blob: f275e342b1e095149bfc5c3b7dc5f928d7ae7159 [file] [log] [blame]
<!DOCTYPE html><html><head><title>R: Download and Install Apache Spark to a Local Directory</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css">
<script type="text/javascript">
const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"};
function processMathHTML() {
var l = document.getElementsByClassName('reqn');
for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); }
return;
}</script>
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js"
onload="processMathHTML();"></script>
<link rel="stylesheet" type="text/css" href="R.css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head><body><div class="container">
<table style="width: 100%;"><tr><td>install.spark {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
<h2>Download and Install Apache Spark to a Local Directory</h2>
<h3>Description</h3>
<p><code>install.spark</code> downloads and installs Spark to a local directory if
it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
Hadoop version, the remote mirror site, and the directory where the package is installed locally.
</p>
<h3>Usage</h3>
<pre><code class='language-R'>install.spark(
hadoopVersion = "2.7",
mirrorUrl = NULL,
localDir = NULL,
overwrite = FALSE
)
</code></pre>
<h3>Arguments</h3>
<table>
<tr style="vertical-align: top;"><td><code>hadoopVersion</code></td>
<td>
<p>Version of Hadoop to install. Default is <code>"2.7"</code>. It can take other
version number in the format of &quot;x.y&quot; where x and y are integer.
If <code>hadoopVersion = "without"</code>, &quot;Hadoop free&quot; build is installed.
See
<a href="https://spark.apache.org/docs/latest/hadoop-provided.html">
&quot;Hadoop Free&quot; Build</a> for more information.
Other patched version names can also be used, e.g. <code>"cdh4"</code></p>
</td></tr>
<tr style="vertical-align: top;"><td><code>mirrorUrl</code></td>
<td>
<p>base URL of the repositories to use. The directory layout should follow
<a href="https://www.apache.org/dyn/closer.lua/spark/">Apache mirrors</a>.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>localDir</code></td>
<td>
<p>a local directory where Spark is installed. The directory contains
version-specific folders of Spark packages. Default is path to
the cache directory:
</p>
<ul>
<li><p> Mac OS X: &lsquo;<span class="file">~/Library/Caches/spark</span>&rsquo;
</p>
</li>
<li><p> Unix: <span class="env">$XDG_CACHE_HOME</span> if defined, otherwise &lsquo;<span class="file">~/.cache/spark</span>&rsquo;
</p>
</li>
<li><p> Windows: &lsquo;<span class="file">%LOCALAPPDATA%\Apache\Spark\Cache</span>&rsquo;.
</p>
</li></ul>
</td></tr>
<tr style="vertical-align: top;"><td><code>overwrite</code></td>
<td>
<p>If <code>TRUE</code>, download and overwrite the existing tar file in localDir
and force re-install Spark (in case the local directory or file is corrupted)</p>
</td></tr>
</table>
<h3>Details</h3>
<p>The full url of remote file is inferred from <code>mirrorUrl</code> and <code>hadoopVersion</code>.
<code>mirrorUrl</code> specifies the remote path to a Spark folder. It is followed by a subfolder
named after the Spark version (that corresponds to SparkR), and then the tar filename.
The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
<code>http://apache.osuosl.org</code> has path:
<code>http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz</code>.
For <code>hadoopVersion = "without"</code>, [Hadoop version] in the filename is then
<code>without-hadoop</code>.
</p>
<h3>Value</h3>
<p>the (invisible) local directory where Spark is found or installed
</p>
<h3>Note</h3>
<p>install.spark since 2.1.0
</p>
<h3>See Also</h3>
<p>See available Hadoop versions:
<a href="https://spark.apache.org/downloads.html">Apache Spark</a>
</p>
<h3>Examples</h3>
<pre><code class="r">## Not run:
##D install.spark()
## End(Not run)
</code></pre>
<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div>
</div>
</body></html>