| <!DOCTYPE html><html><head><title>R: Download and Install Apache Spark to a Local Directory</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" /> |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css"> |
| <script type="text/javascript"> |
| const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"}; |
| function processMathHTML() { |
| var l = document.getElementsByClassName('reqn'); |
| for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); } |
| return; |
| }</script> |
| <script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js" |
| onload="processMathHTML();"></script> |
| <link rel="stylesheet" type="text/css" href="R.css" /> |
| |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> |
| <script>hljs.initHighlightingOnLoad();</script> |
| </head><body><div class="container"> |
| |
| <table style="width: 100%;"><tr><td>install.spark {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> |
| |
| <h2>Download and Install Apache Spark to a Local Directory</h2> |
| |
| <h3>Description</h3> |
| |
| <p><code>install.spark</code> downloads and installs Spark to a local directory if |
| it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is |
| returned. The Spark version we use is the same as the SparkR version. Users can specify a desired |
| Hadoop version, the remote mirror site, and the directory where the package is installed locally. |
| </p> |
| |
| |
| <h3>Usage</h3> |
| |
| <pre><code class='language-R'>install.spark( |
| hadoopVersion = "2.7", |
| mirrorUrl = NULL, |
| localDir = NULL, |
| overwrite = FALSE |
| ) |
| </code></pre> |
| |
| |
| <h3>Arguments</h3> |
| |
| <table> |
| <tr style="vertical-align: top;"><td><code>hadoopVersion</code></td> |
| <td> |
| <p>Version of Hadoop to install. Default is <code>"2.7"</code>. It can take other |
| version number in the format of "x.y" where x and y are integer. |
| If <code>hadoopVersion = "without"</code>, "Hadoop free" build is installed. |
| See |
| <a href="https://spark.apache.org/docs/latest/hadoop-provided.html"> |
| "Hadoop Free" Build</a> for more information. |
| Other patched version names can also be used, e.g. <code>"cdh4"</code></p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>mirrorUrl</code></td> |
| <td> |
| <p>base URL of the repositories to use. The directory layout should follow |
| <a href="https://www.apache.org/dyn/closer.lua/spark/">Apache mirrors</a>.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>localDir</code></td> |
| <td> |
| <p>a local directory where Spark is installed. The directory contains |
| version-specific folders of Spark packages. Default is path to |
| the cache directory: |
| </p> |
| |
| <ul> |
| <li><p> Mac OS X: ‘<span class="file">~/Library/Caches/spark</span>’ |
| </p> |
| </li> |
| <li><p> Unix: <span class="env">$XDG_CACHE_HOME</span> if defined, otherwise ‘<span class="file">~/.cache/spark</span>’ |
| </p> |
| </li> |
| <li><p> Windows: ‘<span class="file">%LOCALAPPDATA%\Apache\Spark\Cache</span>’. |
| </p> |
| </li></ul> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>overwrite</code></td> |
| <td> |
| <p>If <code>TRUE</code>, download and overwrite the existing tar file in localDir |
| and force re-install Spark (in case the local directory or file is corrupted)</p> |
| </td></tr> |
| </table> |
| |
| |
| <h3>Details</h3> |
| |
| <p>The full url of remote file is inferred from <code>mirrorUrl</code> and <code>hadoopVersion</code>. |
| <code>mirrorUrl</code> specifies the remote path to a Spark folder. It is followed by a subfolder |
| named after the Spark version (that corresponds to SparkR), and then the tar filename. |
| The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. |
| For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from |
| <code>http://apache.osuosl.org</code> has path: |
| <code>http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz</code>. |
| For <code>hadoopVersion = "without"</code>, [Hadoop version] in the filename is then |
| <code>without-hadoop</code>. |
| </p> |
| |
| |
| <h3>Value</h3> |
| |
| <p>the (invisible) local directory where Spark is found or installed |
| </p> |
| |
| |
| <h3>Note</h3> |
| |
| <p>install.spark since 2.1.0 |
| </p> |
| |
| |
| <h3>See Also</h3> |
| |
| <p>See available Hadoop versions: |
| <a href="https://spark.apache.org/downloads.html">Apache Spark</a> |
| </p> |
| |
| |
| <h3>Examples</h3> |
| |
| <pre><code class="r">## Not run: |
| ##D install.spark() |
| ## End(Not run) |
| </code></pre> |
| |
| |
| <hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div> |
| </div> |
| </body></html> |