Presently, the following APIs, which are private in SparkR, are used by the kernel to provide a form of communication suitable for use as an interpreter:
SparkR only has an `init()` method that connects to the backend service for R and creates a SparkContext instance. As far as I am aware, there is currently no other way to use SparkR. Because of this, a new method named `sparkR.connect()` is used that retrieves the existing port from the environment variable `EXISTING_SPARKR_BACKEND_PORT`. This method is located in `sparkR.R` and is exported via the following:
export("sparkR.connect")
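As a rough illustration of the port lookup described above, the following base-R sketch reads and validates `EXISTING_SPARKR_BACKEND_PORT`. The helper name `readBackendPort` and its error messages are hypothetical and are not taken from the fork; the actual `sparkR.connect()` in `sparkR.R` may handle this differently.

```r
# Hypothetical helper, for illustration only; not the fork's implementation.
# Reads the backend port from the environment and checks that it is usable.
readBackendPort <- function(envVar = "EXISTING_SPARKR_BACKEND_PORT") {
  raw <- Sys.getenv(envVar, unset = NA)
  if (is.na(raw) || !nzchar(raw)) {
    stop(sprintf("%s is not set; is the backend running?", envVar),
         call. = FALSE)
  }

  # as.integer() yields NA (with a warning) for non-numeric input.
  port <- suppressWarnings(as.integer(raw))
  if (is.na(port) || port < 1L || port > 65535L) {
    stop(sprintf("%s does not hold a valid port: '%s'", envVar, raw),
         call. = FALSE)
  }
  port
}
```

With the variable set by whatever process started the backend, `readBackendPort()` returns the port as an integer; with it unset, the call fails early with a descriptive error instead of attempting a connection.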
SparkR low-level methods to communicate with the backend were marked private, but are used to communicate with our own bridge. These are now exported in the NAMESPACE file via the following:
export("isInstanceOf")
export("callJMethod")
export("callJStatic")
export("newJObject")
export("removeJObject")
export("isRemoveMethod")
export("invokeJava")
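Once exported, these functions can be called directly from an R session. The sketch below assumes the forked SparkR package is installed, that a backend is already running, and that `EXISTING_SPARKR_BACKEND_PORT` points at it. It also assumes `sparkR.connect()` takes no arguments. It will not run against an unmodified SparkR, where these names are not exported.

```r
# Assumes the SparkR fork with the exports listed above is installed.
library(SparkR)

# Assumption: attaches to the already-running backend using the port in
# EXISTING_SPARKR_BACKEND_PORT rather than starting a new one.
sparkR.connect()

# Call a static JVM method through the backend.
javaHome <- callJStatic("java.lang.System", "getProperty", "java.home")

# Create a JVM object and invoke instance methods on it.
jlist <- newJObject("java.util.ArrayList")
callJMethod(jlist, "add", 42L)
firstItem <- callJMethod(jlist, "get", 0L)
```

The calls follow the `(className, methodName, ...)` and `(objId, methodName, ...)` argument order used by SparkR's backend functions.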
`org.apache.spark.api.r.RBackend` is restricted to the package scope of `org.apache.spark.api.r`.
- To circumvent this, use the reflective wrapper `org.apache.toree.kernel.interpreter.r.ReflectiveRBackend`.
To build the SparkR fork (until these changes appear upstream), you need to run `./package-sparkR.sh`. This will generate the output library and tar it up for use by the kernel.