blob: 7f8fb29e7b95203c09dbcec0bfff359e5371a535 [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Download and Install Apache Spark to a Local Directory</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="stylesheet" type="text/css" href="R.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head><body>
<table width="100%" summary="page for install.spark {SparkR}"><tr><td>install.spark {SparkR}</td><td align="right">R Documentation</td></tr></table>
<h2>Download and Install Apache Spark to a Local Directory</h2>
<h3>Description</h3>
<p><code>install.spark</code> downloads and installs Spark to a local directory if
it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
Hadoop version, the remote mirror site, and the directory where the package is installed locally.
</p>
<h3>Usage</h3>
<pre>
install.spark(hadoopVersion = "2.7", mirrorUrl = NULL, localDir = NULL,
overwrite = FALSE)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>hadoopVersion</code></td>
<td>
<p>Version of Hadoop to install. Default is <code>"2.7"</code>. It can take other
version number in the format of &quot;x.y&quot; where x and y are integer.
If <code>hadoopVersion = "without"</code>, &quot;Hadoop free&quot; build is installed.
See
<a href="http://spark.apache.org/docs/latest/hadoop-provided.html">
&quot;Hadoop Free&quot; Build</a> for more information.
Other patched version names can also be used, e.g. <code>"cdh4"</code></p>
</td></tr>
<tr valign="top"><td><code>mirrorUrl</code></td>
<td>
<p>base URL of the repositories to use. The directory layout should follow
<a href="http://www.apache.org/dyn/closer.lua/spark/">Apache mirrors</a>.</p>
</td></tr>
<tr valign="top"><td><code>localDir</code></td>
<td>
<p>a local directory where Spark is installed. The directory contains
version-specific folders of Spark packages. Default is path to
the cache directory:
</p>
<ul>
<li><p> Mac OS X: &lsquo;<span class="file">~/Library/Caches/spark</span>&rsquo;
</p>
</li>
<li><p> Unix: <span class="env">$XDG_CACHE_HOME</span> if defined, otherwise &lsquo;<span class="file">~/.cache/spark</span>&rsquo;
</p>
</li>
<li><p> Windows: &lsquo;<span class="file">%LOCALAPPDATA%\Apache\Spark\Cache</span>&rsquo;.
</p>
</li></ul>
</td></tr>
<tr valign="top"><td><code>overwrite</code></td>
<td>
<p>If <code>TRUE</code>, download and overwrite the existing tar file in localDir
and force re-install Spark (in case the local directory or file is corrupted)</p>
</td></tr>
</table>
<h3>Details</h3>
<p>The full url of remote file is inferred from <code>mirrorUrl</code> and <code>hadoopVersion</code>.
<code>mirrorUrl</code> specifies the remote path to a Spark folder. It is followed by a subfolder
named after the Spark version (that corresponds to SparkR), and then the tar filename.
The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
<code>http://apache.osuosl.org</code> has path:
<code>http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz</code>.
For <code>hadoopVersion = "without"</code>, [Hadoop version] in the filename is then
<code>without-hadoop</code>.
</p>
<h3>Value</h3>
<p>the (invisible) local directory where Spark is found or installed
</p>
<h3>Note</h3>
<p>install.spark since 2.1.0
</p>
<h3>See Also</h3>
<p>See available Hadoop versions:
<a href="http://spark.apache.org/downloads.html">Apache Spark</a>
</p>
<h3>Examples</h3>
<pre><code class="r">## Not run:
##D install.spark()
## End(Not run)
</code></pre>
<hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
</body></html>