---
layout: page
title: "R Interpreter for Apache Zeppelin"
description: "R is a free software environment for statistical computing and graphics."
group: interpreter
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{% include JB/setup %}
# R Interpreter for Apache Zeppelin
<div id="toc"></div>
## Overview
[R](https://www.r-project.org) is a free software environment for statistical computing and graphics.
To run R code and visualize plots in Apache Zeppelin, you need R installed on your Zeppelin server node (or your dev laptop).
+ For CentOS: `yum install R R-devel libcurl-devel openssl-devel`
+ For Ubuntu: `apt-get install r-base`
Validate your installation with a simple R command:
```bash
R -e "print(1+1)"
```
To enjoy plots, install additional libraries with:
+ devtools with
```bash
R -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
```
+ knitr with
```bash
R -e "install.packages('knitr', repos = 'http://cran.us.r-project.org')"
```
+ ggplot2 with
```bash
R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
```
+ Other visualization libraries:
```bash
R -e "install.packages(c('devtools','mplot', 'googleVis'), repos = 'http://cran.us.r-project.org');
require(devtools); install_github('ramnathv/rCharts')"
```
We also recommend installing the following optional R libraries for data analytics:
+ glmnet
+ pROC
+ data.table
+ caret
+ sqldf
+ wordcloud
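The optional packages above can be installed in the same way as the plotting libraries. The following is a minimal sketch, not an official Zeppelin script: `install_r_packages` is a hypothetical helper name, and the CRAN mirror is simply the one used in the commands earlier on this page.

```bash
# Hypothetical helper (not part of Zeppelin): install CRAN packages in one R call.
install_r_packages() {
  # Join the arguments into an R character vector, e.g. c('glmnet','pROC').
  pkgs="c($(printf "'%s'," "$@" | sed 's/,$//'))"
  R -e "install.packages(${pkgs}, repos = 'http://cran.us.r-project.org')"
}

# Usage (requires R on the PATH and network access):
#   install_r_packages glmnet pROC data.table caret sqldf wordcloud
```

Wrapping the call in a function keeps the package list in one place, so the same helper can be reused when you add more libraries later.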
## Supported Interpreters
Zeppelin supports the R language through three interpreters:
<table class="table-configuration">
<tr>
<th>Name</th>
<th>Class</th>
<th>Description</th>
</tr>
<tr>
<td>%r.r</td>
<td>RInterpreter</td>
<td>Vanilla R interpreter with the fewest dependencies: only an R environment and knitr are required.
We recommend always using the fully qualified interpreter name <code>%r.r</code>, because <code>%r</code> is ambiguous:
it resolves to <code>%spark.r</code> when the current note's default interpreter is <code>%spark</code>, and to <code>%r.r</code> when the default interpreter is <code>%r</code>.</td>
</tr>
<tr>
<td>%r.ir</td>
<td>IRInterpreter</td>
<td>Provides a richer R runtime via <a href="https://github.com/IRkernel/IRkernel">IRKernel</a>, giving almost the same experience as using R in Jupyter. It has more prerequisites, but it is the recommended interpreter for using R in Zeppelin.</td>
</tr>
<tr>
<td>%r.shiny</td>
<td>ShinyInterpreter</td>
<td>Runs Shiny apps in Zeppelin</td>
</tr>
</table>
If you want to use R with Spark, the usage is almost the same via `%spark.r`, `%spark.ir` and `%spark.shiny`. Refer to the Spark interpreter docs for more details.
## Configuration
<table class="table-configuration">
<tr>
<th>Property</th>
<th>Default</th>
<th>Description</th>
</tr>
<tr>
<td>zeppelin.R.cmd</td>
<td>R</td>
<td>Path of the installed R binary. Set this property explicitly if R is not in your <code>$PATH</code> (example: <code>/usr/bin/R</code>).
</td>
</tr>
<tr>
<td>zeppelin.R.knitr</td>
<td>true</td>
<td>Whether to use knitr or not. Installing <a href="https://yihui.org/knitr/">knitr</a> is recommended.</td>
</tr>
<tr>
<td>zeppelin.R.image.width</td>
<td>100%</td>
<td>Image width of R plotting</td>
</tr>
<tr>
<td>zeppelin.R.shiny.iframe_width</td>
<td>100%</td>
<td>IFrame width of Shiny App</td>
</tr>
<tr>
<td>zeppelin.R.shiny.iframe_height</td>
<td>500px</td>
<td>IFrame height of Shiny App</td>
</tr>
<tr>
<td>zeppelin.R.shiny.portRange</td>
<td>:</td>
<td>A Shiny app launches a web app on a port. This property restricts that port to a range given in the format <code>start:end</code>, e.g. <code>5000:5001</code>. The default <code>:</code> means any port.</td>
</tr>
<tr>
<td>zeppelin.R.maxResult</td>
<td>1000</td>
<td>Max number of dataframe rows to display when using z.show</td>
</tr>
</table>
## Play R in Zeppelin docker
If you are a beginner, we suggest trying R in the Zeppelin docker image first. The image already has R and many useful R libraries installed, including IRKernel's prerequisites, so `%r.ir` is available.
Without any extra configuration, you can run most of the tutorial notes under the folder `R Tutorial` directly.
```bash
docker run -u $(id -u) -p 8080:8080 -p 6789:6789 --rm --name zeppelin apache/zeppelin:0.10.0
```
After running the above command, you can open `http://localhost:8080` to play R in Zeppelin.
The port `6789` published in the above command is for the R Shiny app. Set the following 2 interpreter properties to make the Shiny app accessible as an iframe in the Zeppelin docker container:
* Set `zeppelin.R.shiny.portRange` to `6789:6789`
* Set `ZEPPELIN_LOCAL_IP` to `0.0.0.0`
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_shiny_app.gif" width="800px"/>
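The two settings have to agree: the ports published by `docker run -p` must cover whatever range `zeppelin.R.shiny.portRange` allows. As an illustrative sketch only (the `publish_flag` helper is hypothetical, and it assumes Docker's `-p start-end:start-end` range syntax), a `start:end` value can be turned into the matching publish flag:

```bash
# Hypothetical helper: convert a Zeppelin portRange ('start:end') into a
# docker publish flag that maps the same ports on host and container.
publish_flag() {
  start="${1%%:*}"   # text before the first ':'
  end="${1##*:}"     # text after the last ':'
  if [ -z "$start" ] || [ -z "$end" ]; then
    # ':' means "any port", which cannot be published ahead of time.
    echo "portRange '$1' is open-ended; choose a fixed range" >&2
    return 1
  fi
  if [ "$start" = "$end" ]; then
    echo "-p ${start}:${end}"
  else
    echo "-p ${start}-${end}:${start}-${end}"
  fi
}

publish_flag "6789:6789"   # prints: -p 6789:6789
publish_flag "5000:5001"   # prints: -p 5000-5001:5000-5001
```

The printed flag can take the place of `-p 6789:6789` in the `docker run` command above when you need more than one Shiny port.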
## Interpreter binding mode
The default [interpreter binding mode](../usage/interpreter/interpreter_binding_mode.html) is `globally shared`, which means all notes share the same R interpreter.
We recommend using `isolated per note` instead, so that each note has its own R interpreter and notes do not affect each other. However, your machine may run out of resources if too many R
interpreters are created. You can [run R in yarn mode](../interpreter/r.html#run-r-in-yarn-cluster) to avoid this problem.
## How to use R Interpreter
There are two different implementations of R interpreters: `%r.r` and `%r.ir`.
* Vanilla R Interpreter (`%r.r`) behaves like an ordinary REPL and uses SparkR to communicate between the R process and the JVM process. It requires `knitr` to be installed.
* IRKernel R Interpreter (`%r.ir`) behaves like IRKernel in a Jupyter notebook. It is based on the [jupyter interpreter](jupyter.html). Besides the jupyter interpreter's prerequisites, [IRkernel](https://github.com/IRkernel/IRkernel) needs to be installed as well.
Take a look at the tutorial note `R Tutorial/1. R Basics` for how to write R code in Zeppelin.
### R basic expressions
R basic expressions are supported in both `%r.r` and `%r.ir`.
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_basic.png" width="800px"/>
### R base plotting
R base plotting is supported in both `%r.r` and `%r.ir`.
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_plotting.png" width="800px"/>
### Other plotting
Besides R base plotting, you can use other visualization libraries in both `%r.r` and `%r.ir`, e.g. `ggplot2` and `googleVis`.
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_ggplot.png" width="800px"/>
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_googlevis.png" width="800px"/>
### z.show
`z.show()` visualizes an R dataframe and is only available in `%r.ir`, e.g.
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_zshow.png" width="800px"/>
By default, `z.show` displays only 1000 rows. You can specify a different limit with `maxRows`, e.g. `z.show(df, maxRows=2000)`.
## Make Shiny App in Zeppelin
[Shiny](https://shiny.rstudio.com/tutorial/) is an R package that makes it easy to build interactive web applications (apps) straight from R.
`%r.shiny` is used for developing R Shiny apps in a Zeppelin notebook. It only works when the IRKernel Interpreter (`%r.ir`) is enabled.
To develop a Shiny app in Zeppelin, you need to write at least 3 paragraphs: a server type paragraph, a ui type paragraph and a run type paragraph.
* Server type R shiny paragraph
```r
%r.shiny(type=server)
# Define server logic to summarize and view selected dataset ----
server <- function(input, output) {
  # Return the requested dataset ----
  datasetInput <- reactive({
    switch(input$dataset,
           "rock" = rock,
           "pressure" = pressure,
           "cars" = cars)
  })
  # Generate a summary of the dataset ----
  output$summary <- renderPrint({
    dataset <- datasetInput()
    summary(dataset)
  })
  # Show the first "n" observations ----
  output$view <- renderTable({
    head(datasetInput(), n = input$obs)
  })
}
```
* UI type R shiny paragraph
```r
%r.shiny(type=ui)
# Define UI for dataset viewer app ----
ui <- fluidPage(
  # App title ----
  titlePanel("Shiny Text"),
  # Sidebar layout with a input and output definitions ----
  sidebarLayout(
    # Sidebar panel for inputs ----
    sidebarPanel(
      # Input: Selector for choosing dataset ----
      selectInput(inputId = "dataset",
                  label = "Choose a dataset:",
                  choices = c("rock", "pressure", "cars")),
      # Input: Numeric entry for number of obs to view ----
      numericInput(inputId = "obs",
                   label = "Number of observations to view:",
                   value = 10)
    ),
    # Main panel for displaying outputs ----
    mainPanel(
      # Output: Verbatim text for data summary ----
      verbatimTextOutput("summary"),
      # Output: HTML table with requested number of observations ----
      tableOutput("view")
    )
  )
)
```
* Run type R shiny paragraph
```r
%r.shiny(type=run)
```
After you execute the run type R shiny paragraph, the Shiny app is launched and embedded as an iframe in the paragraph.
Take a look at the tutorial note `R Tutorial/2. Shiny App` for how to develop R shiny app.
<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_shiny.png" width="800px"/>
### Run multiple shiny apps
If you want to run multiple Shiny apps, set the paragraph local property `app` to tell the apps apart,
e.g.
```r
%r.shiny(type=ui, app=app_1)
```
```r
%r.shiny(type=server, app=app_1)
```
```r
%r.shiny(type=run, app=app_1)
```
## Run R in yarn cluster
Zeppelin supports [running interpreters in a yarn cluster](../quickstart/yarn.html). But there is one critical problem when running R in a yarn cluster: how to manage the R environment in the yarn container.
A yarn cluster is a distributed cluster composed of many nodes, and your R interpreter can start on any node.
It is not practical to manage the R environment on each node.
So in order to run R in a yarn cluster, we suggest using conda to manage your R environment. Zeppelin can ship your
R conda environment to the yarn container, so that each R interpreter can have its own R environment without affecting the others.
Note that you can only run the IRKernel interpreter (`%r.ir`) in a yarn cluster, so make sure you include at least the following prerequisites in the conda env below:
* python
* jupyter
* grpcio
* protobuf
* r-base
* r-essentials
* r-irkernel
`python`, `jupyter`, `grpcio` and `protobuf` are required for the [jupyter interpreter](../interpreter/jupyter.html), because the IRKernel interpreter is based on it. The others are for the R runtime.
The following instructions show how to run R in a yarn cluster. You can find all the code in the tutorial note `R Tutorial/3. R Conda Env in Yarn Mode`.
### Step 1
We suggest using conda pack to create an archive of the conda environment.
Here is an example yaml file that generates a conda environment with R and some useful R libraries.
* Create a yaml file for the conda environment by writing the following content into the file `r_env.yml`
```text
name: r_env
channels:
- conda-forge
- defaults
dependencies:
- python=3.9
- jupyter
- grpcio
- protobuf
- r-base=3
- r-essentials
- r-evaluate
- r-base64enc
- r-knitr
- r-ggplot2
- r-irkernel
- r-shiny
- r-googlevis
```
* Create conda environment via this yaml file using either `conda` or `mamba`
```bash
conda env create -f r_env.yml
```
```bash
mamba env create -f r_env.yml
```
* Pack the conda environment using `conda`
```bash
conda pack -n r_env
```
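Step 2 expects the packed archive at `hdfs:///tmp/r_env.tar.gz`, so the file has to be copied to HDFS after packing. A minimal sketch, assuming the `hdfs` command-line client is available on the machine where you ran `conda pack` and that `conda pack -n r_env` wrote its default output file `r_env.tar.gz` to the current directory:

```bash
ARCHIVE="r_env.tar.gz"   # default output name of `conda pack -n r_env`
HDFS_DIR="/tmp"          # directory used by the example configuration in step 2

if command -v hdfs >/dev/null 2>&1 && [ -f "$ARCHIVE" ]; then
  # -f overwrites an older copy of the archive if one exists.
  hdfs dfs -put -f "$ARCHIVE" "$HDFS_DIR/"
  hdfs dfs -ls "$HDFS_DIR/$ARCHIVE"
else
  echo "Upload skipped: need the hdfs client and $ARCHIVE in the current directory" >&2
fi

# Location to reference from zeppelin.yarn.dist.archives.
ARCHIVE_URI="hdfs://${HDFS_DIR}/${ARCHIVE}"
echo "$ARCHIVE_URI"   # prints: hdfs:///tmp/r_env.tar.gz
```

If your cluster stores shared archives somewhere other than `/tmp`, change `HDFS_DIR` and use the printed URI in the configuration instead.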
### Step 2
Specify the following properties to enable yarn mode for R interpreter via [inline configuration](../usage/interpreter/overview.html#inline-generic-configuration)
```
%r.conf
zeppelin.interpreter.launcher yarn
zeppelin.yarn.dist.archives hdfs:///tmp/r_env.tar.gz#environment
zeppelin.interpreter.conda.env.name environment
```
`zeppelin.yarn.dist.archives` is the R conda environment tar file created in step 1. This tar file is shipped to the yarn container and unpacked in the container's working directory.
`hdfs:///tmp/r_env.tar.gz` is the HDFS location of the R conda archive you created in step 1. `environment` in `hdfs:///tmp/r_env.tar.gz#environment` is the folder name after unpacking.
This folder name should be the same as `zeppelin.interpreter.conda.env.name`.
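The relationship between the two properties can be checked mechanically before you start the interpreter. The sketch below is purely illustrative (the variable names are made up for this example and are not read by Zeppelin): it extracts the folder name after `#` and compares it with the conda environment name.

```bash
DIST_ARCHIVES="hdfs:///tmp/r_env.tar.gz#environment"   # value of zeppelin.yarn.dist.archives
CONDA_ENV_NAME="environment"                           # value of zeppelin.interpreter.conda.env.name

# Everything after the last '#' is the folder the archive is unpacked into.
UNPACK_DIR="${DIST_ARCHIVES##*\#}"

if [ "$UNPACK_DIR" = "$CONDA_ENV_NAME" ]; then
  echo "OK: archive unpacks to '$UNPACK_DIR', matching the conda env name"
else
  echo "Mismatch: '$UNPACK_DIR' vs '$CONDA_ENV_NAME'" >&2
fi
```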
### Step 3
Now you can run the R interpreter in a yarn container and use any R libraries you specified in step 1.