site/docs/3.2.2/api/R/spark.lm.html - spark-website - Git at Google

 <!DOCTYPE html><html><head><title>R: Linear Regression Model</title>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css">
 <script type="text/javascript">
 const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"};
 function processMathHTML() {
     var l = document.getElementsByClassName('reqn');
     for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); }
     return;
 }</script>
 <script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js"
     onload="processMathHTML();"></script>
 <link rel="stylesheet" type="text/css" href="R.css" />

 <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
 <script>hljs.initHighlightingOnLoad();</script>
 </head><body><div class="container">

 <table style="width: 100%;"><tr><td>spark.lm {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>

 <h2>Linear Regression Model</h2>

 <h3>Description</h3>

 <p><code>spark.lm</code> fits a linear regression model against a SparkDataFrame.
 Users can call <code>summary</code> to print a summary of the fitted model,
 <code>predict</code> to make predictions on new data,
 and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
 </p>


 <h3>Usage</h3>

 <pre><code class='language-R'>spark.lm(data, formula, ...)

 ## S4 method for signature 'SparkDataFrame,formula'
 spark.lm(
   data,
   formula,
   maxIter = 100L,
   regParam = 0,
   elasticNetParam = 0,
   tol = 1e-06,
   standardization = TRUE,
   solver = c("auto", "l-bfgs", "normal"),
   weightCol = NULL,
   aggregationDepth = 2L,
   loss = c("squaredError", "huber"),
   epsilon = 1.35,
   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", "alphabetDesc",
     "alphabetAsc")
 )

 ## S4 method for signature 'LinearRegressionModel'
 summary(object)

 ## S4 method for signature 'LinearRegressionModel'
 predict(object, newData)

 ## S4 method for signature 'LinearRegressionModel,character'
 write.ml(object, path, overwrite = FALSE)
 </code></pre>


 <h3>Arguments</h3>

 <table>
 <tr style="vertical-align: top;"><td><code>data</code></td>
 <td>
 <p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>formula</code></td>
 <td>
 <p>a symbolic description of the model to be fitted. Currently only a few formula
 operators are supported, including '~', '.', ':', '+', and '-'.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>...</code></td>
 <td>
 <p>additional arguments passed to the method.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>maxIter</code></td>
 <td>
 <p>maximum iteration number.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>regParam</code></td>
 <td>
 <p>the regularization parameter.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>elasticNetParam</code></td>
 <td>
 <p>the ElasticNet mixing parameter, in range [0, 1].
 For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>tol</code></td>
 <td>
 <p>convergence tolerance of iterations.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>standardization</code></td>
 <td>
 <p>whether to standardize the training features before fitting the model.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>solver</code></td>
 <td>
 <p>The solver algorithm for optimization.
 Supported options: &quot;l-bfgs&quot;, &quot;normal&quot; and &quot;auto&quot;.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>weightCol</code></td>
 <td>
 <p>weight column name.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>aggregationDepth</code></td>
 <td>
 <p>suggested depth for treeAggregate (&gt;= 2).</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>loss</code></td>
 <td>
 <p>the loss function to be optimized. Supported options: &quot;squaredError&quot; and &quot;huber&quot;.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>epsilon</code></td>
 <td>
 <p>the shape parameter to control the amount of robustness.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>stringIndexerOrderType</code></td>
 <td>
 <p>how to order categories of a string feature column. This is used to
 decide the base level of a string feature as the last category
 after ordering is dropped when encoding strings. Supported options
 are &quot;frequencyDesc&quot;, &quot;frequencyAsc&quot;, &quot;alphabetDesc&quot;, and
 &quot;alphabetAsc&quot;. The default value is &quot;frequencyDesc&quot;. When the
 ordering is set to &quot;alphabetDesc&quot;, this drops the same category
 as R when encoding strings.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>object</code></td>
 <td>
 <p>a Linear Regression Model model fitted by <code>spark.lm</code>.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>newData</code></td>
 <td>
 <p>a SparkDataFrame for testing.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>path</code></td>
 <td>
 <p>The directory where the model is saved.</p>
 </td></tr>
 <tr style="vertical-align: top;"><td><code>overwrite</code></td>
 <td>
 <p>Overwrites or not if the output path already exists. Default is FALSE
 which means throw exception if the output path exists.</p>
 </td></tr>
 </table>


 <h3>Value</h3>

 <p><code>spark.lm</code> returns a fitted Linear Regression Model.
 </p>
 <p><code>summary</code> returns summary information of the fitted model, which is a list.
 </p>
 <p><code>predict</code> returns the predicted values based on a LinearRegressionModel.
 </p>


 <h3>Note</h3>

 <p>spark.lm since 3.1.0
 </p>
 <p>summary(LinearRegressionModel) since 3.1.0
 </p>
 <p>predict(LinearRegressionModel) since 3.1.0
 </p>
 <p>write.ml(LinearRegressionModel, character) since 3.1.0
 </p>


 <h3>See Also</h3>

 <p><a href="../../SparkR/help/read.ml.html">read.ml</a>
 </p>


 <h3>Examples</h3>

 <pre><code class="r">## Not run:
 ##D df &lt;- read.df(&quot;data/mllib/sample_linear_regression_data.txt&quot;, source = &quot;libsvm&quot;)
 ##D
 ##D # fit Linear Regression Model
 ##D model &lt;- spark.lm(df, label ~ features, regParam = 0.01, maxIter = 1)
 ##D
 ##D # get the summary of the model
 ##D summary(model)
 ##D
 ##D # make predictions
 ##D predictions &lt;- predict(model, df)
 ##D
 ##D # save and load the model
 ##D path &lt;- &quot;path/to/model&quot;
 ##D write.ml(model, path)
 ##D savedModel &lt;- read.ml(path)
 ##D summary(savedModel)
 ## End(Not run)
 </code></pre>


 <hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div>
 </div>
 </body></html>
	<!DOCTYPE html><html><head><title>R: Linear Regression Model</title>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
	<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
	<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css">
	<script type="text/javascript">
	const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"};
	function processMathHTML() {
	var l = document.getElementsByClassName('reqn');
	for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); }
	return;
	}</script>
	<script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js"
	onload="processMathHTML();"></script>
	<link rel="stylesheet" type="text/css" href="R.css" />

	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
	<script>hljs.initHighlightingOnLoad();</script>
	</head><body><div class="container">

	<table style="width: 100%;"><tr><td>spark.lm {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>

	<h2>Linear Regression Model</h2>

	<h3>Description</h3>

	<p><code>spark.lm</code> fits a linear regression model against a SparkDataFrame.
	Users can call <code>summary</code> to print a summary of the fitted model,
	<code>predict</code> to make predictions on new data,
	and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
	</p>


	<h3>Usage</h3>

	<pre><code class='language-R'>spark.lm(data, formula, ...)

	## S4 method for signature 'SparkDataFrame,formula'
	spark.lm(
	data,
	formula,
	maxIter = 100L,
	regParam = 0,
	elasticNetParam = 0,
	tol = 1e-06,
	standardization = TRUE,
	solver = c("auto", "l-bfgs", "normal"),
	weightCol = NULL,
	aggregationDepth = 2L,
	loss = c("squaredError", "huber"),
	epsilon = 1.35,
	stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", "alphabetDesc",
	"alphabetAsc")
	)

	## S4 method for signature 'LinearRegressionModel'
	summary(object)

	## S4 method for signature 'LinearRegressionModel'
	predict(object, newData)

	## S4 method for signature 'LinearRegressionModel,character'
	write.ml(object, path, overwrite = FALSE)
	</code></pre>


	<h3>Arguments</h3>

	<table>
	<tr style="vertical-align: top;"><td><code>data</code></td>
	<td>
	<p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>formula</code></td>
	<td>
	<p>a symbolic description of the model to be fitted. Currently only a few formula
	operators are supported, including '~', '.', ':', '+', and '-'.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>...</code></td>
	<td>
	<p>additional arguments passed to the method.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>maxIter</code></td>
	<td>
	<p>maximum iteration number.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>regParam</code></td>
	<td>
	<p>the regularization parameter.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>elasticNetParam</code></td>
	<td>
	<p>the ElasticNet mixing parameter, in range [0, 1].
	For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>tol</code></td>
	<td>
	<p>convergence tolerance of iterations.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>standardization</code></td>
	<td>
	<p>whether to standardize the training features before fitting the model.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>solver</code></td>
	<td>
	<p>The solver algorithm for optimization.
	Supported options: "l-bfgs", "normal" and "auto".</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>weightCol</code></td>
	<td>
	<p>weight column name.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>aggregationDepth</code></td>
	<td>
	<p>suggested depth for treeAggregate (>= 2).</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>loss</code></td>
	<td>
	<p>the loss function to be optimized. Supported options: "squaredError" and "huber".</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>epsilon</code></td>
	<td>
	<p>the shape parameter to control the amount of robustness.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>stringIndexerOrderType</code></td>
	<td>
	<p>how to order categories of a string feature column. This is used to
	decide the base level of a string feature as the last category
	after ordering is dropped when encoding strings. Supported options
	are "frequencyDesc", "frequencyAsc", "alphabetDesc", and
	"alphabetAsc". The default value is "frequencyDesc". When the
	ordering is set to "alphabetDesc", this drops the same category
	as R when encoding strings.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>object</code></td>
	<td>
	<p>a Linear Regression Model model fitted by <code>spark.lm</code>.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>newData</code></td>
	<td>
	<p>a SparkDataFrame for testing.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>path</code></td>
	<td>
	<p>The directory where the model is saved.</p>
	</td></tr>
	<tr style="vertical-align: top;"><td><code>overwrite</code></td>
	<td>
	<p>Overwrites or not if the output path already exists. Default is FALSE
	which means throw exception if the output path exists.</p>
	</td></tr>
	</table>


	<h3>Value</h3>

	<p><code>spark.lm</code> returns a fitted Linear Regression Model.
	</p>
	<p><code>summary</code> returns summary information of the fitted model, which is a list.
	</p>
	<p><code>predict</code> returns the predicted values based on a LinearRegressionModel.
	</p>


	<h3>Note</h3>

	<p>spark.lm since 3.1.0
	</p>
	<p>summary(LinearRegressionModel) since 3.1.0
	</p>
	<p>predict(LinearRegressionModel) since 3.1.0
	</p>
	<p>write.ml(LinearRegressionModel, character) since 3.1.0
	</p>


	<h3>See Also</h3>

	<p><a href="../../SparkR/help/read.ml.html">read.ml</a>
	</p>


	<h3>Examples</h3>

	<pre><code class="r">## Not run:
	##D df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
	##D
	##D # fit Linear Regression Model
	##D model <- spark.lm(df, label ~ features, regParam = 0.01, maxIter = 1)
	##D
	##D # get the summary of the model
	##D summary(model)
	##D
	##D # make predictions
	##D predictions <- predict(model, df)
	##D
	##D # save and load the model
	##D path <- "path/to/model"
	##D write.ml(model, path)
	##D savedModel <- read.ml(path)
	##D summary(savedModel)
	## End(Not run)
	</code></pre>


	<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div>
	</div>
	</body></html>