site/docs/3.2.2/api/R/install.spark.html - spark-website - Git at Google

 <!DOCTYPE html><html><head><title>R: Download and Install Apache Spark to a Local Directory</title>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css">
 <script type="text/javascript">
 const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"};
 function processMathHTML() {
     var l = document.getElementsByClassName('reqn');
     for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); }
     return;
 }</script>
 <script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js"
     onload="processMathHTML();"></script>
 <link rel="stylesheet" type="text/css" href="R.css" />

 <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
 <script>hljs.initHighlightingOnLoad();</script>
 </head><body><div class="container">

 <table style="width: 100%;"><tr><td>install.spark {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>

 <h2>Download and Install Apache Spark to a Local Directory</h2>

 <h3>Description</h3>

 <p><code>install.spark</code> downloads and installs Spark to a local directory if
 it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
 returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
 Hadoop version, the remote mirror site, and the directory where the package is installed locally.
 </p>


 <h3>Usage</h3>

 <pre><code class='language-R'>install.spark(
   hadoopVersion = "2.7",
   mirrorUrl = NULL,
   localDir = NULL,
   overwrite = FALSE
 )
 </code></pre>


 <h3>Arguments</h3>

 <table>
 <tr style="vertical-align: top;"><td><code>hadoopVersion</code></td>
 <td>
 <p>Version of Hadoop to install. Default is <code>"2.7"</code>. It can take other
 version number in the format of &quot;x.y&quot; where x and y are integer.
 If <code>hadoopVersion = "without"</code>, &quot;Hadoop free&quot; build is installed.
 See
 <a href="https://spark.apache.org/docs/latest/hadoop-provided.html">
 &quot;Hadoop Free&quot; Build</a> for more information.
 Other patched version names can also be used, e.g. <code>"cdh4"</code></p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>mirrorUrl</code></td>
 <td>
 <p>base URL of the repositories to use. The directory layout should follow
 <a href="https://www.apache.org/dyn/closer.lua/spark/">Apache mirrors</a>.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>localDir</code></td>
 <td>
 <p>a local directory where Spark is installed. The directory contains
 version-specific folders of Spark packages. Default is path to
 the cache directory:
 </p>

 <ul>
 <li><p> Mac OS X: &lsquo;<span class="file">~/Library/Caches/spark</span>&rsquo;
 </p>
 </li>
 <li><p> Unix: <span class="env">$XDG_CACHE_HOME</span> if defined, otherwise &lsquo;<span class="file">~/.cache/spark</span>&rsquo;
 </p>
 </li>
 <li><p> Windows: &lsquo;<span class="file">%LOCALAPPDATA%\Apache\Spark\Cache</span>&rsquo;.
 </p>
 </li></ul>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>overwrite</code></td>
 <td>
 <p>If <code>TRUE</code>, download and overwrite the existing tar file in localDir
 and force re-install Spark (in case the local directory or file is corrupted)</p>
 </td></tr>
 </table>


 <h3>Details</h3>

 <p>The full url of remote file is inferred from <code>mirrorUrl</code> and <code>hadoopVersion</code>.
 <code>mirrorUrl</code> specifies the remote path to a Spark folder. It is followed by a subfolder
 named after the Spark version (that corresponds to SparkR), and then the tar filename.
 The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
 For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
 <code>http://apache.osuosl.org</code> has path:
 <code>http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz</code>.
 For <code>hadoopVersion = "without"</code>, [Hadoop version] in the filename is then
 <code>without-hadoop</code>.
 </p>


 <h3>Value</h3>

 <p>the (invisible) local directory where Spark is found or installed
 </p>


 <h3>Note</h3>

 <p>install.spark since 2.1.0
 </p>


 <h3>See Also</h3>

 <p>See available Hadoop versions:
 <a href="https://spark.apache.org/downloads.html">Apache Spark</a>
 </p>


 <h3>Examples</h3>

 <pre><code class="r">## Not run:
 ##D install.spark()
 ## End(Not run)
 </code></pre>


 <hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div>
 </div>
 </body></html>
	<!DOCTYPE html><html><head><title>R: Download and Install Apache Spark to a Local Directory</title>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
	<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
	<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css">
	<script type="text/javascript">
	const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"};
	function processMathHTML() {
	var l = document.getElementsByClassName('reqn');
	for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); }
	return;
	}</script>
	<script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js"
	onload="processMathHTML();"></script>
	<link rel="stylesheet" type="text/css" href="R.css" />

	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
	<script>hljs.initHighlightingOnLoad();</script>
	</head><body><div class="container">

	<table style="width: 100%;"><tr><td>install.spark {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>

	<h2>Download and Install Apache Spark to a Local Directory</h2>

	<h3>Description</h3>

	<p><code>install.spark</code> downloads and installs Spark to a local directory if
	it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
	returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
	Hadoop version, the remote mirror site, and the directory where the package is installed locally.
	</p>


	<h3>Usage</h3>

	<pre><code class='language-R'>install.spark(
	hadoopVersion = "2.7",
	mirrorUrl = NULL,
	localDir = NULL,
	overwrite = FALSE
	)
	</code></pre>


	<h3>Arguments</h3>

	<table>
	<tr style="vertical-align: top;"><td><code>hadoopVersion</code></td>
	<td>
	<p>Version of Hadoop to install. Default is <code>"2.7"</code>. It can take other
	version number in the format of "x.y" where x and y are integer.
	If <code>hadoopVersion = "without"</code>, "Hadoop free" build is installed.
	See
	<a href="https://spark.apache.org/docs/latest/hadoop-provided.html">
	"Hadoop Free" Build</a> for more information.
	Other patched version names can also be used, e.g. <code>"cdh4"</code></p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>mirrorUrl</code></td>
	<td>
	<p>base URL of the repositories to use. The directory layout should follow
	<a href="https://www.apache.org/dyn/closer.lua/spark/">Apache mirrors</a>.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>localDir</code></td>
	<td>
	<p>a local directory where Spark is installed. The directory contains
	version-specific folders of Spark packages. Default is path to
	the cache directory:
	</p>

	<ul>
	<li><p> Mac OS X: ‘<span class="file">~/Library/Caches/spark</span>’
	</p>
	</li>
	<li><p> Unix: <span class="env">$XDG_CACHE_HOME</span> if defined, otherwise ‘<span class="file">~/.cache/spark</span>’
	</p>
	</li>
	<li><p> Windows: ‘<span class="file">%LOCALAPPDATA%\Apache\Spark\Cache</span>’.
	</p>
	</li></ul>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>overwrite</code></td>
	<td>
	<p>If <code>TRUE</code>, download and overwrite the existing tar file in localDir
	and force re-install Spark (in case the local directory or file is corrupted)</p>
	</td></tr>
	</table>


	<h3>Details</h3>

	<p>The full url of remote file is inferred from <code>mirrorUrl</code> and <code>hadoopVersion</code>.
	<code>mirrorUrl</code> specifies the remote path to a Spark folder. It is followed by a subfolder
	named after the Spark version (that corresponds to SparkR), and then the tar filename.
	The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
	For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
	<code>http://apache.osuosl.org</code> has path:
	<code>http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz</code>.
	For <code>hadoopVersion = "without"</code>, [Hadoop version] in the filename is then
	<code>without-hadoop</code>.
	</p>


	<h3>Value</h3>

	<p>the (invisible) local directory where Spark is found or installed
	</p>


	<h3>Note</h3>

	<p>install.spark since 2.1.0
	</p>


	<h3>See Also</h3>

	<p>See available Hadoop versions:
	<a href="https://spark.apache.org/downloads.html">Apache Spark</a>
	</p>


	<h3>Examples</h3>

	<pre><code class="r">## Not run:
	##D install.spark()
	## End(Not run)
	</code></pre>


	<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div>
	</div>
	</body></html>