site/docs/2.1.1/api/R/spark.glm.html - spark-website - Git at Google

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html><head><title>R: Generalized Linear Models</title>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <link rel="stylesheet" type="text/css" href="R.css">

 <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
 <script>hljs.initHighlightingOnLoad();</script>
 </head><body>

 <table width="100%" summary="page for spark.glm {SparkR}"><tr><td>spark.glm {SparkR}</td><td align="right">R Documentation</td></tr></table>

 <h2>Generalized Linear Models</h2>

 <h3>Description</h3>

 <p>Fits generalized linear model against a SparkDataFrame.
 Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make
 predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
 </p>


 <h3>Usage</h3>

 <pre>
 spark.glm(data, formula, ...)

 ## S4 method for signature 'SparkDataFrame,formula'
 spark.glm(data, formula, family = gaussian,
   tol = 1e-06, maxIter = 25, weightCol = NULL, regParam = 0)

 ## S4 method for signature 'GeneralizedLinearRegressionModel'
 summary(object)

 ## S3 method for class 'summary.GeneralizedLinearRegressionModel'
 print(x, ...)

 ## S4 method for signature 'GeneralizedLinearRegressionModel'
 predict(object, newData)

 ## S4 method for signature 'GeneralizedLinearRegressionModel,character'
 write.ml(object, path,
   overwrite = FALSE)
 </pre>


 <h3>Arguments</h3>

 <table summary="R argblock">
 <tr valign="top"><td><code>data</code></td>
 <td>
 <p>a SparkDataFrame for training.</p>
 </td></tr>
 <tr valign="top"><td><code>formula</code></td>
 <td>
 <p>a symbolic description of the model to be fitted. Currently only a few formula
 operators are supported, including '~', '.', ':', '+', and '-'.</p>
 </td></tr>
 <tr valign="top"><td><code>...</code></td>
 <td>
 <p>additional arguments passed to the method.</p>
 </td></tr>
 <tr valign="top"><td><code>family</code></td>
 <td>
 <p>a description of the error distribution and link function to be used in the model.
 This can be a character string naming a family function, a family function or
 the result of a call to a family function. Refer R family at
 <a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>.
 Currently these families are supported: <code>binomial</code>, <code>gaussian</code>,
 <code>Gamma</code>, and <code>poisson</code>.</p>
 </td></tr>
 <tr valign="top"><td><code>tol</code></td>
 <td>
 <p>positive convergence tolerance of iterations.</p>
 </td></tr>
 <tr valign="top"><td><code>maxIter</code></td>
 <td>
 <p>integer giving the maximal number of IRLS iterations.</p>
 </td></tr>
 <tr valign="top"><td><code>weightCol</code></td>
 <td>
 <p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance
 weights as 1.0.</p>
 </td></tr>
 <tr valign="top"><td><code>regParam</code></td>
 <td>
 <p>regularization parameter for L2 regularization.</p>
 </td></tr>
 <tr valign="top"><td><code>object</code></td>
 <td>
 <p>a fitted generalized linear model.</p>
 </td></tr>
 <tr valign="top"><td><code>x</code></td>
 <td>
 <p>summary object of fitted generalized linear model returned by <code>summary</code> function.</p>
 </td></tr>
 <tr valign="top"><td><code>newData</code></td>
 <td>
 <p>a SparkDataFrame for testing.</p>
 </td></tr>
 <tr valign="top"><td><code>path</code></td>
 <td>
 <p>the directory where the model is saved.</p>
 </td></tr>
 <tr valign="top"><td><code>overwrite</code></td>
 <td>
 <p>overwrites or not if the output path already exists. Default is FALSE
 which means throw exception if the output path exists.</p>
 </td></tr>
 </table>


 <h3>Value</h3>

 <p><code>spark.glm</code> returns a fitted generalized linear model.
 </p>
 <p><code>summary</code> returns summary information of the fitted model, which is a list.
 The list of components includes at least the <code>coefficients</code> (coefficients matrix, which includes
 coefficients, standard error of coefficients, t value and p value),
 <code>null.deviance</code> (null/residual degrees of freedom), <code>aic</code> (AIC)
 and <code>iter</code> (number of iterations IRLS takes). If there are collinear columns in the data,
 the coefficients matrix only provides coefficients.
 </p>
 <p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
 &quot;prediction&quot;.
 </p>


 <h3>Note</h3>

 <p>spark.glm since 2.0.0
 </p>
 <p>summary(GeneralizedLinearRegressionModel) since 2.0.0
 </p>
 <p>print.summary.GeneralizedLinearRegressionModel since 2.0.0
 </p>
 <p>predict(GeneralizedLinearRegressionModel) since 1.5.0
 </p>
 <p>write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0
 </p>


 <h3>See Also</h3>

 <p><a href="glm.html">glm</a>, <a href="read.ml.html">read.ml</a>
 </p>


 <h3>Examples</h3>

 <pre><code class="r">## Not run:
 ##D sparkR.session()
 ##D data(iris)
 ##D df &lt;- createDataFrame(iris)
 ##D model &lt;- spark.glm(df, Sepal_Length ~ Sepal_Width, family = &quot;gaussian&quot;)
 ##D summary(model)
 ##D
 ##D # fitted values on training data
 ##D fitted &lt;- predict(model, df)
 ##D head(select(fitted, &quot;Sepal_Length&quot;, &quot;prediction&quot;))
 ##D
 ##D # save fitted model to input path
 ##D path &lt;- &quot;path/to/model&quot;
 ##D write.ml(model, path)
 ##D
 ##D # can also read back the saved model and print
 ##D savedModel &lt;- read.ml(path)
 ##D summary(savedModel)
 ## End(Not run)
 </code></pre>


 <hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
 </body></html>
	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<html><head><title>R: Generalized Linear Models</title>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<link rel="stylesheet" type="text/css" href="R.css">

	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
	<script>hljs.initHighlightingOnLoad();</script>
	</head><body>

	<table width="100%" summary="page for spark.glm {SparkR}"><tr><td>spark.glm {SparkR}</td><td align="right">R Documentation</td></tr></table>

	<h2>Generalized Linear Models</h2>

	<h3>Description</h3>

	<p>Fits generalized linear model against a SparkDataFrame.
	Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make
	predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
	</p>


	<h3>Usage</h3>

	<pre>
	spark.glm(data, formula, ...)

	## S4 method for signature 'SparkDataFrame,formula'
	spark.glm(data, formula, family = gaussian,
	tol = 1e-06, maxIter = 25, weightCol = NULL, regParam = 0)

	## S4 method for signature 'GeneralizedLinearRegressionModel'
	summary(object)

	## S3 method for class 'summary.GeneralizedLinearRegressionModel'
	print(x, ...)

	## S4 method for signature 'GeneralizedLinearRegressionModel'
	predict(object, newData)

	## S4 method for signature 'GeneralizedLinearRegressionModel,character'
	write.ml(object, path,
	overwrite = FALSE)
	</pre>


	<h3>Arguments</h3>

	<table summary="R argblock">
	<tr valign="top"><td><code>data</code></td>
	<td>
	<p>a SparkDataFrame for training.</p>
	</td></tr>
	<tr valign="top"><td><code>formula</code></td>
	<td>
	<p>a symbolic description of the model to be fitted. Currently only a few formula
	operators are supported, including '~', '.', ':', '+', and '-'.</p>
	</td></tr>
	<tr valign="top"><td><code>...</code></td>
	<td>
	<p>additional arguments passed to the method.</p>
	</td></tr>
	<tr valign="top"><td><code>family</code></td>
	<td>
	<p>a description of the error distribution and link function to be used in the model.
	This can be a character string naming a family function, a family function or
	the result of a call to a family function. Refer R family at
	<a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>.
	Currently these families are supported: <code>binomial</code>, <code>gaussian</code>,
	<code>Gamma</code>, and <code>poisson</code>.</p>
	</td></tr>
	<tr valign="top"><td><code>tol</code></td>
	<td>
	<p>positive convergence tolerance of iterations.</p>
	</td></tr>
	<tr valign="top"><td><code>maxIter</code></td>
	<td>
	<p>integer giving the maximal number of IRLS iterations.</p>
	</td></tr>
	<tr valign="top"><td><code>weightCol</code></td>
	<td>
	<p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance
	weights as 1.0.</p>
	</td></tr>
	<tr valign="top"><td><code>regParam</code></td>
	<td>
	<p>regularization parameter for L2 regularization.</p>
	</td></tr>
	<tr valign="top"><td><code>object</code></td>
	<td>
	<p>a fitted generalized linear model.</p>
	</td></tr>
	<tr valign="top"><td><code>x</code></td>
	<td>
	<p>summary object of fitted generalized linear model returned by <code>summary</code> function.</p>
	</td></tr>
	<tr valign="top"><td><code>newData</code></td>
	<td>
	<p>a SparkDataFrame for testing.</p>
	</td></tr>
	<tr valign="top"><td><code>path</code></td>
	<td>
	<p>the directory where the model is saved.</p>
	</td></tr>
	<tr valign="top"><td><code>overwrite</code></td>
	<td>
	<p>overwrites or not if the output path already exists. Default is FALSE
	which means throw exception if the output path exists.</p>
	</td></tr>
	</table>


	<h3>Value</h3>

	<p><code>spark.glm</code> returns a fitted generalized linear model.
	</p>
	<p><code>summary</code> returns summary information of the fitted model, which is a list.
	The list of components includes at least the <code>coefficients</code> (coefficients matrix, which includes
	coefficients, standard error of coefficients, t value and p value),
	<code>null.deviance</code> (null/residual degrees of freedom), <code>aic</code> (AIC)
	and <code>iter</code> (number of iterations IRLS takes). If there are collinear columns in the data,
	the coefficients matrix only provides coefficients.
	</p>
	<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
	"prediction".
	</p>


	<h3>Note</h3>

	<p>spark.glm since 2.0.0
	</p>
	<p>summary(GeneralizedLinearRegressionModel) since 2.0.0
	</p>
	<p>print.summary.GeneralizedLinearRegressionModel since 2.0.0
	</p>
	<p>predict(GeneralizedLinearRegressionModel) since 1.5.0
	</p>
	<p>write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0
	</p>


	<h3>See Also</h3>

	<p><a href="glm.html">glm</a>, <a href="read.ml.html">read.ml</a>
	</p>


	<h3>Examples</h3>

	<pre><code class="r">## Not run:
	##D sparkR.session()
	##D data(iris)
	##D df <- createDataFrame(iris)
	##D model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
	##D summary(model)
	##D
	##D # fitted values on training data
	##D fitted <- predict(model, df)
	##D head(select(fitted, "Sepal_Length", "prediction"))
	##D
	##D # save fitted model to input path
	##D path <- "path/to/model"
	##D write.ml(model, path)
	##D
	##D # can also read back the saved model and print
	##D savedModel <- read.ml(path)
	##D summary(savedModel)
	## End(Not run)
	</code></pre>


	<hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
	</body></html>