{% include JB/setup %}
R is a free software environment for statistical computing and graphics.
To run R code and visualize plots in Apache Zeppelin, you will need R on your Zeppelin server node (or your dev laptop).

For CentOS/RHEL (yum):

```shell
yum install R R-devel libcurl-devel openssl-devel
```

For Debian/Ubuntu (apt-get):

```shell
apt-get install r-base
```

Validate your installation with a simple R command:

```shell
R -e "print(1+1)"
```
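You may want to run the same check from a provisioning script instead of reading the output yourself. Below is a minimal sketch that assumes `Rscript` (installed together with R) is on your `PATH`. `check_r_install` is a hypothetical helper name and is not part of Zeppelin or R.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if Rscript is on PATH and evaluates 1+1 to 2.
check_r_install() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  local result
  # cat() prints the bare value, without the "[1]" prefix that print() adds.
  result="$(Rscript -e 'cat(1 + 1)')" || return 1
  if [ "$result" != "2" ]; then
    echo "unexpected R output: $result" >&2
    return 1
  fi
  echo "R installation looks OK"
}
```

For example, `check_r_install || exit 1` stops a setup script early when R is missing or broken.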
To enjoy plots, install additional libraries.

Install devtools with:

```shell
R -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
```

Install knitr with:

```shell
R -e "install.packages('knitr', repos = 'http://cran.us.r-project.org')"
```

Install ggplot2 with:

```shell
R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
```

Other visualization libraries:

```shell
R -e "install.packages(c('devtools','mplot', 'googleVis'), repos = 'http://cran.us.r-project.org'); require(devtools); install_github('ramnathv/rCharts')"
```
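Each command above reinstalls its package even when that package is already present. If you would rather install only the missing packages, the sketch below builds a single R expression for `R -e`. It assumes the same CRAN mirror as above. `build_install_expr` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
CRAN_REPO="http://cran.us.r-project.org"

# Hypothetical helper: print an R expression that installs only those of the
# given packages that are not already installed.
build_install_expr() {
  local quoted="" pkg
  for pkg in "$@"; do
    # Append each package as a single-quoted R string, comma-separated.
    quoted="${quoted:+$quoted, }'$pkg'"
  done
  printf "pkgs <- c(%s); missing <- setdiff(pkgs, rownames(installed.packages())); if (length(missing) > 0) install.packages(missing, repos = '%s')" \
    "$quoted" "$CRAN_REPO"
}
```

Usage: `R -e "$(build_install_expr devtools knitr ggplot2)"`. Package names are interpolated as-is, so pass only plain package names.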
We also recommend installing the following optional R libraries for happy data analytics:
Zeppelin supports the R language in 3 interpreters: `%r.r`, `%r.ir` and `%r.shiny`.

If you want to use R with Spark, usage is almost the same via `%spark.r`, `%spark.ir` and `%spark.shiny`. Refer to the Spark interpreter docs for more details.
For beginners, we suggest trying R in the Zeppelin docker image first. The image already has R and many useful R libraries installed, including IRKernel's prerequisites, so `%r.ir` is available. Without any extra configuration, you can run most of the tutorial notes under the folder `R Tutorial` directly.
```shell
docker run -u $(id -u) -p 8080:8080 -p 6789:6789 --rm --name zeppelin apache/zeppelin:0.10.0
```
After running the above command, you can open `http://localhost:8080` to use R in Zeppelin. The port `6789` exposed in the above command is for R shiny apps. To make a shiny app accessible as an iframe from the Zeppelin docker container, set the following 2 interpreter properties:

* `zeppelin.R.shiny.portRange` to `6789:6789`
* `ZEPPELIN_LOCAL_IP` to `0.0.0.0`
The default interpreter binding mode is `globally shared`, which means all notes share the same R interpreter. We therefore recommend using `isolated per note`, so that each note has its own R interpreter and notes do not affect each other. However, your machine may run out of resources if too many R interpreters are created. You can run R in yarn mode to avoid this problem.
There are two different implementations of R interpreters: `%r.r` and `%r.ir`.

* `%r.r` behaves like an ordinary REPL and uses SparkR to communicate between the R process and the JVM process. It requires `knitr` to be installed.
* `%r.ir` behaves like IRKernel in a Jupyter notebook. It is based on the jupyter interpreter. Besides the jupyter interpreter's prerequisites, IRKernel needs to be installed as well.

Take a look at the tutorial note `R Tutorial/1. R Basics` for how to write R code in Zeppelin.
R basic expressions and R base plotting are supported in both `%r.r` and `%r.ir`.

Besides R base plotting, you can use other visualization libraries in both `%r.r` and `%r.ir`, e.g. `ggplot2` and `googleVis`.

`z.show()` is only available in `%r.ir`. Use it to visualize an R data frame, e.g. `z.show(df)`.
By default, `z.show` only displays 1000 rows. You can change this with `maxRows`, e.g. `z.show(df, maxRows=2000)`.
Shiny is an R package that makes it easy to build interactive web applications (apps) straight from R. `%r.shiny` is used for developing R shiny apps in a Zeppelin notebook. It only works when the IRKernel interpreter (`%r.ir`) is enabled. To develop a Shiny app in Zeppelin, you need to write at least 3 paragraphs: a server type paragraph, a ui type paragraph and a run type paragraph.
```r
%r.shiny(type=server)

# Define server logic to summarize and view selected dataset ----
server <- function(input, output) {

  # Return the requested dataset ----
  datasetInput <- reactive({
    switch(input$dataset,
           "rock" = rock,
           "pressure" = pressure,
           "cars" = cars)
  })

  # Generate a summary of the dataset ----
  output$summary <- renderPrint({
    dataset <- datasetInput()
    summary(dataset)
  })

  # Show the first "n" observations ----
  output$view <- renderTable({
    head(datasetInput(), n = input$obs)
  })
}
```
```r
%r.shiny(type=ui)

# Define UI for dataset viewer app ----
ui <- fluidPage(

  # App title ----
  titlePanel("Shiny Text"),

  # Sidebar layout with input and output definitions ----
  sidebarLayout(

    # Sidebar panel for inputs ----
    sidebarPanel(

      # Input: Selector for choosing dataset ----
      selectInput(inputId = "dataset",
                  label = "Choose a dataset:",
                  choices = c("rock", "pressure", "cars")),

      # Input: Numeric entry for number of obs to view ----
      numericInput(inputId = "obs",
                   label = "Number of observations to view:",
                   value = 10)
    ),

    # Main panel for displaying outputs ----
    mainPanel(

      # Output: Verbatim text for data summary ----
      verbatimTextOutput("summary"),

      # Output: HTML table with requested number of observations ----
      tableOutput("view")
    )
  )
)
```
```r
%r.shiny(type=run)
```
After you execute the run type paragraph, the shiny app is launched and embedded as an iframe in the paragraph. Take a look at the tutorial note `R Tutorial/2. Shiny App` for how to develop an R shiny app.
If you want to run multiple shiny apps, you can specify `app` in the paragraph local properties to tell them apart. For example, the three paragraphs of an app named `app_1` begin with the following lines, one per paragraph:

```
%r.shiny(type=ui, app=app_1)
%r.shiny(type=server, app=app_1)
%r.shiny(type=run, app=app_1)
```
Zeppelin supports running interpreters in a yarn cluster, but running R in a yarn cluster raises one critical problem: how to manage the R environment in the yarn container. A yarn cluster is a distributed cluster made up of many nodes, and your R interpreter can start on any of them, so it is not practical to manage the R environment on each node.

To run R in a yarn cluster, we therefore suggest using conda to manage your R environment. Zeppelin can ship your R conda environment to the yarn container, so that each R interpreter has its own R environment without affecting the others.
Note that only the IRKernel interpreter (`%r.ir`) can run in a yarn cluster. Make sure your conda env includes at least the prerequisites used in the example `r_env.yml` below. `python`, `jupyter`, `grpcio` and `protobuf` are required for the jupyter interpreter, because the IRKernel interpreter is based on it. The others are for the R runtime.
The following instructions show how to run R in a yarn cluster. You can find all the code in the tutorial note `R Tutorial/3. R Conda Env in Yarn Mode`.
We suggest using `conda pack` to create an archive of the conda environment.

Step 1. Create an R conda environment. Here is an example YAML file, `r_env.yml`, that generates a conda environment with R and some useful R libraries:

```yaml
name: r_env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - jupyter
  - grpcio
  - protobuf
  - r-base=3
  - r-essentials
  - r-evaluate
  - r-base64enc
  - r-knitr
  - r-ggplot2
  - r-irkernel
  - r-shiny
  - r-googlevis
```
Create the environment with either `conda` or `mamba`:

```shell
# with conda
conda env create -f r_env.yml

# or with mamba
mamba env create -f r_env.yml
```
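Step 2. Pack the conda environment with `conda pack`. By default, this writes `r_env.tar.gz` to the current directory:

```shell
conda pack -n r_env
```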
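Steps 1 and 2 can be scripted, together with uploading the archive to the HDFS path used in step 3. This is a sketch only. It assumes `conda`, `conda-pack` and the `hadoop` CLI are available on the machine where you build the environment, and that the YAML file is named after the environment. The `DRY_RUN` switch and the function names `run` and `ship_r_env` are hypothetical; with `DRY_RUN` set, the script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

ENV_NAME="${ENV_NAME:-r_env}"   # assumed to match the name in r_env.yml
HDFS_DIR="${HDFS_DIR:-/tmp}"    # matches hdfs:///tmp/r_env.tar.gz used in step 3

# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create, pack and upload the R conda environment.
ship_r_env() {
  run conda env create -f "${ENV_NAME}.yml"
  # conda pack writes <env>.tar.gz to the current directory by default;
  # -o only makes the output name explicit.
  run conda pack -n "$ENV_NAME" -o "${ENV_NAME}.tar.gz"
  # -f overwrites an existing archive with the same name on HDFS.
  run hadoop fs -put -f "${ENV_NAME}.tar.gz" "${HDFS_DIR}/"
}
```

Source the file, then call `DRY_RUN=1 ship_r_env` to preview the commands and `ship_r_env` to run them.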
Step 3. Specify the following properties to enable yarn mode for the R interpreter via inline configuration:

```
%r.conf

zeppelin.interpreter.launcher yarn
zeppelin.yarn.dist.archives hdfs:///tmp/r_env.tar.gz#environment
zeppelin.interpreter.conda.env.name environment
```
`zeppelin.yarn.dist.archives` is the R conda environment archive created in step 2. This archive is shipped to the yarn container and extracted in the container's working directory. `hdfs:///tmp/r_env.tar.gz` is the location of that archive after you upload it to HDFS. The `environment` in `hdfs:///tmp/r_env.tar.gz#environment` is the name of the folder the archive is extracted to, and it should be the same as `zeppelin.interpreter.conda.env.name`.
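A mismatch between the `#environment` suffix and `zeppelin.interpreter.conda.env.name` is easy to miss. The sketch below checks a `%r.conf` paragraph saved as plain text. It assumes the simple `key value` layout shown above, with one property per line. `check_r_conf` is a hypothetical helper and is not part of Zeppelin.

```shell
#!/usr/bin/env bash
# Hypothetical checker: reads a %r.conf paragraph on stdin and verifies that
# the folder name after '#' in zeppelin.yarn.dist.archives matches
# zeppelin.interpreter.conda.env.name.
check_r_conf() {
  local key value archives="" env_name=""
  # "|| [ -n "$key" ]" also processes a last line that has no trailing newline.
  while read -r key value || [ -n "$key" ]; do
    case "$key" in
      zeppelin.yarn.dist.archives) archives="$value" ;;
      zeppelin.interpreter.conda.env.name) env_name="$value" ;;
    esac
  done
  if [[ "$archives" != *"#"* ]]; then
    echo "zeppelin.yarn.dist.archives has no '#folder' suffix" >&2
    return 1
  fi
  # Everything after the last '#' is the folder the archive is extracted into.
  local folder="${archives##*#}"
  if [ "$folder" != "$env_name" ]; then
    echo "archive folder '$folder' != conda env name '$env_name'" >&2
    return 1
  fi
  echo "OK: conda env folder is '$folder'"
}
```

For example, `check_r_conf < r_conf.txt` prints an error and returns a non-zero status when the two names differ.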
Now you can run the R interpreter in a yarn container and use any R libraries you specified in step 1.