| <!DOCTYPE html><html><head><title>R: Linear Regression Model</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" /> |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css"> |
| <script type="text/javascript"> |
| const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"}; |
| function processMathHTML() { |
| var l = document.getElementsByClassName('reqn'); |
| for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); } |
| return; |
| }</script> |
| <script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js" |
| onload="processMathHTML();"></script> |
| <link rel="stylesheet" type="text/css" href="R.css" /> |
| |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> |
| <script>hljs.initHighlightingOnLoad();</script> |
| </head><body><div class="container"> |
| |
| <table style="width: 100%;"><tr><td>spark.lm {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> |
| |
| <h2>Linear Regression Model</h2> |
| |
| <h3>Description</h3> |
| |
| <p><code>spark.lm</code> fits a linear regression model against a SparkDataFrame. |
| Users can call <code>summary</code> to print a summary of the fitted model, |
| <code>predict</code> to make predictions on new data, |
| and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. |
| </p> |
| |
| |
| <h3>Usage</h3> |
| |
| <pre><code class='language-R'>spark.lm(data, formula, ...) |
| |
| ## S4 method for signature 'SparkDataFrame,formula' |
| spark.lm( |
| data, |
| formula, |
| maxIter = 100L, |
| regParam = 0, |
| elasticNetParam = 0, |
| tol = 1e-06, |
| standardization = TRUE, |
| solver = c("auto", "l-bfgs", "normal"), |
| weightCol = NULL, |
| aggregationDepth = 2L, |
| loss = c("squaredError", "huber"), |
| epsilon = 1.35, |
| stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", "alphabetDesc", |
| "alphabetAsc") |
| ) |
| |
| ## S4 method for signature 'LinearRegressionModel' |
| summary(object) |
| |
| ## S4 method for signature 'LinearRegressionModel' |
| predict(object, newData) |
| |
| ## S4 method for signature 'LinearRegressionModel,character' |
| write.ml(object, path, overwrite = FALSE) |
| </code></pre> |
| |
| |
| <h3>Arguments</h3> |
| |
| <table> |
| <tr style="vertical-align: top;"><td><code>data</code></td> |
| <td> |
| <p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>formula</code></td> |
| <td> |
| <p>a symbolic description of the model to be fitted. Currently only a few formula |
| operators are supported, including '~', '.', ':', '+', and '-'.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>...</code></td> |
| <td> |
| <p>additional arguments passed to the method.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>maxIter</code></td> |
| <td> |
| <p>maximum number of iterations.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>regParam</code></td> |
| <td> |
| <p>the regularization parameter.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>elasticNetParam</code></td> |
| <td> |
| <p>the ElasticNet mixing parameter (alpha), in the range [0, 1]. |
| For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>tol</code></td> |
| <td> |
| <p>convergence tolerance of iterations.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>standardization</code></td> |
| <td> |
| <p>whether to standardize the training features before fitting the model.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>solver</code></td> |
| <td> |
| <p>the solver algorithm used for optimization. |
| Supported options: "auto" (the default), "l-bfgs", and "normal".</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>weightCol</code></td> |
| <td> |
| <p>weight column name.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>aggregationDepth</code></td> |
| <td> |
| <p>suggested depth for treeAggregate (>= 2).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>loss</code></td> |
| <td> |
| <p>the loss function to be optimized. Supported options: "squaredError" and "huber".</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>epsilon</code></td> |
| <td> |
| <p>the shape parameter that controls the amount of robustness. It must be greater than 1.0 |
| and is used only when <code>loss</code> is "huber".</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>stringIndexerOrderType</code></td> |
| <td> |
| <p>how to order the categories of a string feature column. The ordering |
| determines the base level of a string feature, because the last category |
| after ordering is dropped when strings are encoded. Supported options |
| are "frequencyDesc", "frequencyAsc", "alphabetDesc", and |
| "alphabetAsc". The default value is "frequencyDesc". When the |
| ordering is set to "alphabetDesc", the dropped category is the same one |
| that R drops when encoding strings.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>object</code></td> |
| <td> |
| <p>a Linear Regression Model fitted by <code>spark.lm</code>.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>newData</code></td> |
| <td> |
| <p>a SparkDataFrame for testing.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>path</code></td> |
| <td> |
| <p>The directory where the model is saved.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>overwrite</code></td> |
| <td> |
| <p>whether to overwrite the output path if it already exists. The default is FALSE, |
| which means an exception is thrown if the output path exists.</p> |
| </td></tr> |
| </table> |
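| |
| <p>As a rough illustration of how <code>regParam</code> and <code>elasticNetParam</code> interact, |
| the sketch below computes an elastic-net penalty in plain R. It assumes the commonly used form |
| regParam * (alpha * L1 + (1 - alpha) / 2 * L2^2); the helper <code>elasticNetPenalty</code> and the |
| coefficient vector <code>w</code> are hypothetical and are not part of SparkR.</p> |
| <pre><code class="r"># Hypothetical helper for illustration only; not a SparkR function. |
| # Assumes the penalty regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2). |
| elasticNetPenalty <- function(w, regParam, alpha) { |
|   stopifnot(alpha >= 0, alpha <= 1, regParam >= 0) |
|   l1 <- sum(abs(w))   # L1 norm of the coefficients |
|   l2sq <- sum(w^2)    # squared L2 norm of the coefficients |
|   regParam * (alpha * l1 + (1 - alpha) / 2 * l2sq) |
| } |
| |
| w <- c(0.5, -2, 1)    # made-up coefficients |
| ridge <- elasticNetPenalty(w, regParam = 0.1, alpha = 0)   # pure L2 penalty |
| lasso <- elasticNetPenalty(w, regParam = 0.1, alpha = 1)   # pure L1 penalty |
| mixed <- elasticNetPenalty(w, regParam = 0.1, alpha = 0.5) # equal mix of both |
| </code></pre> |
| <p>Values of <code>elasticNetParam</code> strictly between 0 and 1 blend the two penalties, |
| while <code>regParam</code> scales the overall strength.</p> |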
| |
| |
| <h3>Value</h3> |
| |
| <p><code>spark.lm</code> returns a fitted Linear Regression Model. |
| </p> |
| <p><code>summary</code> returns summary information of the fitted model, which is a list. |
| </p> |
| <p><code>predict</code> returns the predicted values based on a LinearRegressionModel. |
| </p> |
| |
| |
| <h3>Note</h3> |
| |
| <p>spark.lm since 3.1.0 |
| </p> |
| <p>summary(LinearRegressionModel) since 3.1.0 |
| </p> |
| <p>predict(LinearRegressionModel) since 3.1.0 |
| </p> |
| <p>write.ml(LinearRegressionModel, character) since 3.1.0 |
| </p> |
| |
| |
| <h3>See Also</h3> |
| |
| <p><a href="../../SparkR/help/read.ml.html">read.ml</a> |
| </p> |
| |
| |
| <h3>Examples</h3> |
| |
| <pre><code class="r">## Not run: |
| ##D df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm") |
| ##D |
| ##D # fit Linear Regression Model |
| ##D model <- spark.lm(df, label ~ features, regParam = 0.01, maxIter = 1) |
| ##D |
| ##D # get the summary of the model |
| ##D summary(model) |
| ##D |
| ##D # make predictions |
| ##D predictions <- predict(model, df) |
| ##D |
| ##D # save and load the model |
| ##D path <- "path/to/model" |
| ##D write.ml(model, path) |
| ##D savedModel <- read.ml(path) |
| ##D summary(savedModel) |
| ## End(Not run) |
| </code></pre> |
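| |
| <p>A further sketch, under the same assumptions as the example above (a running Spark session |
| and the sample data file), showing the robust-loss and elastic-net arguments and how to overwrite |
| a previously saved model. The path is a placeholder, and the code is not run here.</p> |
| <pre><code class="r"># Requires a working Spark installation; shown as a sketch only. |
| library(SparkR) |
| sparkR.session() |
| |
| df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm") |
| |
| # Huber loss; epsilon is only relevant when loss = "huber" |
| robust <- spark.lm(df, label ~ features, loss = "huber", epsilon = 1.35) |
| |
| # Elastic-net regularization: an equal mix of L1 and L2 penalties |
| enet <- spark.lm(df, label ~ features, regParam = 0.1, elasticNetParam = 0.5) |
| |
| # predict returns predicted values; inspect the first rows |
| head(predict(robust, df)) |
| |
| # summary returns a list; str() shows its structure without assuming element names |
| str(summary(enet)) |
| |
| # overwrite = TRUE replaces an existing model at the same path instead of failing |
| path <- "path/to/model" |
| write.ml(enet, path, overwrite = TRUE) |
| </code></pre> |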
| |
| |
| <hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div> |
| </div> |
| </body></html> |