site/docs/2.1.1/api/R/glm.html - spark-website - Git at Google

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html><head><title>R: Generalized Linear Models (R-compliant)</title>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <link rel="stylesheet" type="text/css" href="R.css">

 <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
 <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
 <script>hljs.initHighlightingOnLoad();</script>
 </head><body>

 <table width="100%" summary="page for glm {SparkR}"><tr><td>glm {SparkR}</td><td align="right">R Documentation</td></tr></table>

 <h2>Generalized Linear Models (R-compliant)</h2>

 <h3>Description</h3>

 <p>Fits a generalized linear model, similarly to R's glm().
 </p>


 <h3>Usage</h3>

 <pre>
 glm(formula, family = gaussian, data, weights, subset, na.action,
   start = NULL, etastart, mustart, offset, control = list(...),
   model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
   contrasts = NULL, ...)

 ## S4 method for signature 'formula,ANY,SparkDataFrame'
 glm(formula, family = gaussian, data,
   epsilon = 1e-06, maxit = 25, weightCol = NULL)
 </pre>


 <h3>Arguments</h3>

 <table summary="R argblock">
 <tr valign="top"><td><code>formula</code></td>
 <td>
 <p>a symbolic description of the model to be fitted. Currently only a few formula
 operators are supported, including '~', '.', ':', '+', and '-'.</p>
 </td></tr>
 <tr valign="top"><td><code>family</code></td>
 <td>
 <p>a description of the error distribution and link function to be used in the model.
 This can be a character string naming a family function, a family function or
 the result of a call to a family function. Refer R family at
 <a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>.
 Currently these families are supported: <code>binomial</code>, <code>gaussian</code>,
 <code>Gamma</code>, and <code>poisson</code>.</p>
 </td></tr>
 <tr valign="top"><td><code>data</code></td>
 <td>
 <p>a SparkDataFrame or R's glm data for training.</p>
 </td></tr>
 <tr valign="top"><td><code>weights</code></td>
 <td>
 <p>an optional vector of &lsquo;prior weights&rsquo; to be used
 in the fitting process.  Should be <code>NULL</code> or a numeric vector.</p>
 </td></tr>
 <tr valign="top"><td><code>subset</code></td>
 <td>
 <p>an optional vector specifying a subset of observations
 to be used in the fitting process.</p>
 </td></tr>
 <tr valign="top"><td><code>na.action</code></td>
 <td>
 <p>a function which indicates what should happen
 when the data contain <code>NA</code>s.  The default is set by
 the <code>na.action</code> setting of <code><a href="../../base/html/options.html">options</a></code>, and is
 <code><a href="../../stats/html/na.fail.html">na.fail</a></code> if that is unset.  The &lsquo;factory-fresh&rsquo;
 default is <code><a href="nafunctions.html">na.omit</a></code>.  Another possible value is
 <code>NULL</code>, no action.  Value <code><a href="../../stats/html/na.fail.html">na.exclude</a></code> can be useful.</p>
 </td></tr>
 <tr valign="top"><td><code>start</code></td>
 <td>
 <p>starting values for the parameters in the linear predictor.</p>
 </td></tr>
 <tr valign="top"><td><code>etastart</code></td>
 <td>
 <p>starting values for the linear predictor.</p>
 </td></tr>
 <tr valign="top"><td><code>mustart</code></td>
 <td>
 <p>starting values for the vector of means.</p>
 </td></tr>
 <tr valign="top"><td><code>offset</code></td>
 <td>
 <p>this can be used to specify an <EM>a priori</EM> known
 component to be included in the linear predictor during fitting.
 This should be <code>NULL</code> or a numeric vector of length equal to
 the number of cases.  One or more <code><a href="../../stats/html/offset.html">offset</a></code> terms can be
 included in the formula instead or as well, and if more than one is
 specified their sum is used.  See <code><a href="../../stats/html/model.extract.html">model.offset</a></code>.</p>
 </td></tr>
 <tr valign="top"><td><code>control</code></td>
 <td>
 <p>a list of parameters for controlling the fitting
 process.  For <code>glm.fit</code> this is passed to
 <code><a href="../../stats/html/glm.control.html">glm.control</a></code>.</p>
 </td></tr>
 <tr valign="top"><td><code>model</code></td>
 <td>
 <p>a logical value indicating whether <EM>model frame</EM>
 should be included as a component of the returned value.</p>
 </td></tr>
 <tr valign="top"><td><code>method</code></td>
 <td>
 <p>the method to be used in fitting the model.  The default
 method <code>"glm.fit"</code> uses iteratively reweighted least squares
 (IWLS): the alternative <code>"model.frame"</code> returns the model frame
 and does no fitting.
 </p>
 <p>User-supplied fitting functions can be supplied either as a function
 or a character string naming a function, with a function which takes
 the same arguments as <code>glm.fit</code>.  If specified as a character
 string it is looked up from within the <span class="pkg">stats</span> namespace.
 </p>
 </td></tr>
 <tr valign="top"><td><code>x,y</code></td>
 <td>
 <p>For <code>glm</code>: logical values indicating whether the response vector
 and model matrix used in the fitting process should be returned as
 components of the returned value.</p>
 </td></tr>
 <tr valign="top"><td><code>contrasts</code></td>
 <td>
 <p>an optional list. See the <code>contrasts.arg</code>
 of <code>model.matrix.default</code>.</p>
 </td></tr>
 <tr valign="top"><td><code>...</code></td>
 <td>

 <p>For <code>glm</code>: arguments to be used to form the default
 <code>control</code> argument if it is not supplied directly.
 </p>
 <p>For <code>weights</code>: further arguments passed to or from other methods.
 </p>
 </td></tr>
 <tr valign="top"><td><code>epsilon</code></td>
 <td>
 <p>positive convergence tolerance of iterations.</p>
 </td></tr>
 <tr valign="top"><td><code>maxit</code></td>
 <td>
 <p>integer giving the maximal number of IRLS iterations.</p>
 </td></tr>
 <tr valign="top"><td><code>weightCol</code></td>
 <td>
 <p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance
 weights as 1.0.</p>
 </td></tr>
 </table>


 <h3>Value</h3>

 <p><code>glm</code> returns a fitted generalized linear model.
 </p>


 <h3>Note</h3>

 <p>glm since 1.5.0
 </p>


 <h3>See Also</h3>

 <p><a href="spark.glm.html">spark.glm</a>
 </p>


 <h3>Examples</h3>

 <pre><code class="r">## Not run:
 ##D sparkR.session()
 ##D data(iris)
 ##D df &lt;- createDataFrame(iris)
 ##D model &lt;- glm(Sepal_Length ~ Sepal_Width, df, family = &quot;gaussian&quot;)
 ##D summary(model)
 ## End(Not run)
 </code></pre>


 <hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
 </body></html>
	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<html><head><title>R: Generalized Linear Models (R-compliant)</title>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<link rel="stylesheet" type="text/css" href="R.css">

	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
	<script>hljs.initHighlightingOnLoad();</script>
	</head><body>

	<table width="100%" summary="page for glm {SparkR}"><tr><td>glm {SparkR}</td><td align="right">R Documentation</td></tr></table>

	<h2>Generalized Linear Models (R-compliant)</h2>

	<h3>Description</h3>

	<p>Fits a generalized linear model, similarly to R's glm().
	</p>


	<h3>Usage</h3>

	<pre>
	glm(formula, family = gaussian, data, weights, subset, na.action,
	start = NULL, etastart, mustart, offset, control = list(...),
	model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
	contrasts = NULL, ...)

	## S4 method for signature 'formula,ANY,SparkDataFrame'
	glm(formula, family = gaussian, data,
	epsilon = 1e-06, maxit = 25, weightCol = NULL)
	</pre>


	<h3>Arguments</h3>

	<table summary="R argblock">
	<tr valign="top"><td><code>formula</code></td>
	<td>
	<p>a symbolic description of the model to be fitted. Currently only a few formula
	operators are supported, including '~', '.', ':', '+', and '-'.</p>
	</td></tr>
	<tr valign="top"><td><code>family</code></td>
	<td>
	<p>a description of the error distribution and link function to be used in the model.
	This can be a character string naming a family function, a family function or
	the result of a call to a family function. Refer R family at
	<a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>.
	Currently these families are supported: <code>binomial</code>, <code>gaussian</code>,
	<code>Gamma</code>, and <code>poisson</code>.</p>
	</td></tr>
	<tr valign="top"><td><code>data</code></td>
	<td>
	<p>a SparkDataFrame or R's glm data for training.</p>
	</td></tr>
	<tr valign="top"><td><code>weights</code></td>
	<td>
	<p>an optional vector of ‘prior weights’ to be used
	in the fitting process. Should be <code>NULL</code> or a numeric vector.</p>
	</td></tr>
	<tr valign="top"><td><code>subset</code></td>
	<td>
	<p>an optional vector specifying a subset of observations
	to be used in the fitting process.</p>
	</td></tr>
	<tr valign="top"><td><code>na.action</code></td>
	<td>
	<p>a function which indicates what should happen
	when the data contain <code>NA</code>s. The default is set by
	the <code>na.action</code> setting of <code><a href="../../base/html/options.html">options</a></code>, and is
	<code><a href="../../stats/html/na.fail.html">na.fail</a></code> if that is unset. The ‘factory-fresh’
	default is <code><a href="nafunctions.html">na.omit</a></code>. Another possible value is
	<code>NULL</code>, no action. Value <code><a href="../../stats/html/na.fail.html">na.exclude</a></code> can be useful.</p>
	</td></tr>
	<tr valign="top"><td><code>start</code></td>
	<td>
	<p>starting values for the parameters in the linear predictor.</p>
	</td></tr>
	<tr valign="top"><td><code>etastart</code></td>
	<td>
	<p>starting values for the linear predictor.</p>
	</td></tr>
	<tr valign="top"><td><code>mustart</code></td>
	<td>
	<p>starting values for the vector of means.</p>
	</td></tr>
	<tr valign="top"><td><code>offset</code></td>
	<td>
	<p>this can be used to specify an <EM>a priori</EM> known
	component to be included in the linear predictor during fitting.
	This should be <code>NULL</code> or a numeric vector of length equal to
	the number of cases. One or more <code><a href="../../stats/html/offset.html">offset</a></code> terms can be
	included in the formula instead or as well, and if more than one is
	specified their sum is used. See <code><a href="../../stats/html/model.extract.html">model.offset</a></code>.</p>
	</td></tr>
	<tr valign="top"><td><code>control</code></td>
	<td>
	<p>a list of parameters for controlling the fitting
	process. For <code>glm.fit</code> this is passed to
	<code><a href="../../stats/html/glm.control.html">glm.control</a></code>.</p>
	</td></tr>
	<tr valign="top"><td><code>model</code></td>
	<td>
	<p>a logical value indicating whether <EM>model frame</EM>
	should be included as a component of the returned value.</p>
	</td></tr>
	<tr valign="top"><td><code>method</code></td>
	<td>
	<p>the method to be used in fitting the model. The default
	method <code>"glm.fit"</code> uses iteratively reweighted least squares
	(IWLS): the alternative <code>"model.frame"</code> returns the model frame
	and does no fitting.
	</p>
	<p>User-supplied fitting functions can be supplied either as a function
	or a character string naming a function, with a function which takes
	the same arguments as <code>glm.fit</code>. If specified as a character
	string it is looked up from within the <span class="pkg">stats</span> namespace.
	</p>
	</td></tr>
	<tr valign="top"><td><code>x,y</code></td>
	<td>
	<p>For <code>glm</code>: logical values indicating whether the response vector
	and model matrix used in the fitting process should be returned as
	components of the returned value.</p>
	</td></tr>
	<tr valign="top"><td><code>contrasts</code></td>
	<td>
	<p>an optional list. See the <code>contrasts.arg</code>
	of <code>model.matrix.default</code>.</p>
	</td></tr>
	<tr valign="top"><td><code>...</code></td>
	<td>

	<p>For <code>glm</code>: arguments to be used to form the default
	<code>control</code> argument if it is not supplied directly.
	</p>
	<p>For <code>weights</code>: further arguments passed to or from other methods.
	</p>
	</td></tr>
	<tr valign="top"><td><code>epsilon</code></td>
	<td>
	<p>positive convergence tolerance of iterations.</p>
	</td></tr>
	<tr valign="top"><td><code>maxit</code></td>
	<td>
	<p>integer giving the maximal number of IRLS iterations.</p>
	</td></tr>
	<tr valign="top"><td><code>weightCol</code></td>
	<td>
	<p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance
	weights as 1.0.</p>
	</td></tr>
	</table>


	<h3>Value</h3>

	<p><code>glm</code> returns a fitted generalized linear model.
	</p>


	<h3>Note</h3>

	<p>glm since 1.5.0
	</p>


	<h3>See Also</h3>

	<p><a href="spark.glm.html">spark.glm</a>
	</p>


	<h3>Examples</h3>

	<pre><code class="r">## Not run:
	##D sparkR.session()
	##D data(iris)
	##D df <- createDataFrame(iris)
	##D model <- glm(Sepal_Length ~ Sepal_Width, df, family = "gaussian")
	##D summary(model)
	## End(Not run)
	</code></pre>


	<hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
	</body></html>