<!DOCTYPE html><html><head><title>R: Linear Regression Model</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css">
<script type="text/javascript">
const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"};
function processMathHTML() {
var l = document.getElementsByClassName('reqn');
for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); }
return;
}</script>
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js"
onload="processMathHTML();"></script>
<link rel="stylesheet" type="text/css" href="R.css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head><body><div class="container">
<table style="width: 100%;"><tr><td>spark.lm {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
<h2>Linear Regression Model</h2>
<h3>Description</h3>
<p><code>spark.lm</code> fits a linear regression model against a SparkDataFrame.
Users can call <code>summary</code> to print a summary of the fitted model,
<code>predict</code> to make predictions on new data,
and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
</p>
<h3>Usage</h3>
<pre><code class='language-R'>spark.lm(data, formula, ...)
## S4 method for signature 'SparkDataFrame,formula'
spark.lm(
data,
formula,
maxIter = 100L,
regParam = 0,
elasticNetParam = 0,
tol = 1e-06,
standardization = TRUE,
solver = c("auto", "l-bfgs", "normal"),
weightCol = NULL,
aggregationDepth = 2L,
loss = c("squaredError", "huber"),
epsilon = 1.35,
stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", "alphabetDesc",
"alphabetAsc")
)
## S4 method for signature 'LinearRegressionModel'
summary(object)
## S4 method for signature 'LinearRegressionModel'
predict(object, newData)
## S4 method for signature 'LinearRegressionModel,character'
write.ml(object, path, overwrite = FALSE)
</code></pre>
<h3>Arguments</h3>
<table>
<tr style="vertical-align: top;"><td><code>data</code></td>
<td>
<p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>formula</code></td>
<td>
<p>a symbolic description of the model to be fitted. Currently only a few formula
operators are supported, including '~', '.', ':', '+', and '-'.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>...</code></td>
<td>
<p>additional arguments passed to the method.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>maxIter</code></td>
<td>
<p>maximum number of iterations.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>regParam</code></td>
<td>
<p>the regularization parameter.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>elasticNetParam</code></td>
<td>
<p>the ElasticNet mixing parameter (alpha), in range [0, 1].
For <code>elasticNetParam = 0</code>, the penalty is an L2 penalty. For <code>elasticNetParam = 1</code>, it is an L1 penalty.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>tol</code></td>
<td>
<p>convergence tolerance of iterations.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>standardization</code></td>
<td>
<p>whether to standardize the training features before fitting the model.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>solver</code></td>
<td>
<p>The solver algorithm for optimization.
Supported options: &quot;l-bfgs&quot;, &quot;normal&quot; and &quot;auto&quot;.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>weightCol</code></td>
<td>
<p>weight column name.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>aggregationDepth</code></td>
<td>
<p>suggested depth for treeAggregate (&gt;= 2).</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>loss</code></td>
<td>
<p>the loss function to be optimized. Supported options: &quot;squaredError&quot; and &quot;huber&quot;.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>epsilon</code></td>
<td>
<p>the shape parameter that controls the amount of robustness. Only used when <code>loss</code> is &quot;huber&quot;.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>stringIndexerOrderType</code></td>
<td>
<p>how to order the categories of a string feature column. The last
category after ordering is dropped when encoding strings, so this
setting determines the base level of a string feature. Supported options
are &quot;frequencyDesc&quot;, &quot;frequencyAsc&quot;, &quot;alphabetDesc&quot;, and
&quot;alphabetAsc&quot;. The default value is &quot;frequencyDesc&quot;. When the
ordering is set to &quot;alphabetDesc&quot;, this drops the same category
as R when encoding strings.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>object</code></td>
<td>
<p>a Linear Regression Model fitted by <code>spark.lm</code>.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>newData</code></td>
<td>
<p>a SparkDataFrame for testing.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>path</code></td>
<td>
<p>The directory where the model is saved.</p>
</td></tr>
<tr style="vertical-align: top;"><td><code>overwrite</code></td>
<td>
<p>Whether to overwrite the output path if it already exists. Default is FALSE,
which means an exception is thrown if the output path exists.</p>
</td></tr>
</table>
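<p>How <code>regParam</code> and <code>elasticNetParam</code> combine can be sketched in plain R.
The helper below is hypothetical and not part of SparkR: it assumes the usual elastic-net form,
<code>regParam * (elasticNetParam * L1 + (1 - elasticNetParam) / 2 * L2^2)</code>, applied to a
coefficient vector <code>w</code>, and is only meant to show how the mixing parameter moves the penalty
between the L2 and L1 cases described above.</p>
<pre><code class="r"># Hypothetical helper (not a SparkR function): elastic-net penalty for coefficients w,
# assuming the form regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
elastic_net_penalty &lt;- function(w, reg_param, elastic_net_param) {
  stopifnot(reg_param &gt;= 0, elastic_net_param &gt;= 0, elastic_net_param &lt;= 1)
  l1 &lt;- sum(abs(w))   # L1 norm of the coefficients
  l2_sq &lt;- sum(w^2)   # squared L2 norm of the coefficients
  reg_param * (elastic_net_param * l1 + (1 - elastic_net_param) / 2 * l2_sq)
}

w &lt;- c(3, -4)
elastic_net_penalty(w, reg_param = 0.1, elastic_net_param = 0)    # pure L2: 1.25
elastic_net_penalty(w, reg_param = 0.1, elastic_net_param = 1)    # pure L1: 0.7
elastic_net_penalty(w, reg_param = 0.1, elastic_net_param = 0.5)  # mixture: 0.975
</code></pre>
<p>With <code>regParam = 0</code> (the default) the penalty vanishes regardless of <code>elasticNetParam</code>.</p>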
<h3>Value</h3>
<p><code>spark.lm</code> returns a fitted Linear Regression Model.
</p>
<p><code>summary</code> returns a list of summary information about the fitted model.
</p>
<p><code>predict</code> returns the predicted values based on a LinearRegressionModel.
</p>
<h3>Note</h3>
<p>spark.lm since 3.1.0
</p>
<p>summary(LinearRegressionModel) since 3.1.0
</p>
<p>predict(LinearRegressionModel) since 3.1.0
</p>
<p>write.ml(LinearRegressionModel, character) since 3.1.0
</p>
<h3>See Also</h3>
<p><a href="../../SparkR/help/read.ml.html">read.ml</a>
</p>
<h3>Examples</h3>
<pre><code class="r">## Not run:
##D df &lt;- read.df(&quot;data/mllib/sample_linear_regression_data.txt&quot;, source = &quot;libsvm&quot;)
##D
##D # fit Linear Regression Model
##D model &lt;- spark.lm(df, label ~ features, regParam = 0.01, maxIter = 1)
##D
##D # get the summary of the model
##D summary(model)
##D
##D # make predictions
##D predictions &lt;- predict(model, df)
##D
##D # save and load the model
##D path &lt;- &quot;path/to/model&quot;
##D write.ml(model, path)
##D savedModel &lt;- read.ml(path)
##D summary(savedModel)
## End(Not run)
</code></pre>
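<p>A further sketch, in the same not-run style, of the <code>loss</code>, <code>epsilon</code> and
<code>stringIndexerOrderType</code> arguments listed under Usage. The <code>iris</code> data set, the
chosen formula and the variable names are assumptions made for illustration and do not come from the
original examples; it also assumes a running Spark session.</p>
<pre><code class="r">## Not run:
##D sparkR.session()
##D
##D # createDataFrame replaces "." in column names, so Sepal.Length becomes Sepal_Length
##D df &lt;- createDataFrame(iris)
##D
##D # fit with Huber loss; "alphabetDesc" drops the same base level of Species as R
##D robust &lt;- spark.lm(df, Sepal_Length ~ Sepal_Width + Species,
##D                    loss = &quot;huber&quot;, epsilon = 1.35,
##D                    stringIndexerOrderType = &quot;alphabetDesc&quot;)
##D
##D # inspect the fitted model and the first few predictions
##D summary(robust)
##D head(predict(robust, df))
## End(Not run)
</code></pre>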
<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div>
</div>
</body></html>