blob: 7b9aae9e995294dde4c96d40aba7498ed2bd30d5 [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Generalized Linear Models (R-compliant)</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="stylesheet" type="text/css" href="R.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head><body>
<table width="100%" summary="page for glm {SparkR}"><tr><td>glm {SparkR}</td><td align="right">R Documentation</td></tr></table>
<h2>Generalized Linear Models (R-compliant)</h2>
<h3>Description</h3>
<p>Fits a generalized linear model, similarly to R's glm().
</p>
<h3>Usage</h3>
<pre>
glm(formula, family = gaussian, data, weights, subset, na.action,
start = NULL, etastart, mustart, offset, control = list(...),
model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
contrasts = NULL, ...)
## S4 method for signature 'formula,ANY,SparkDataFrame'
glm(formula, family = gaussian, data,
epsilon = 1e-06, maxit = 25, weightCol = NULL)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>formula</code></td>
<td>
<p>a symbolic description of the model to be fitted. Currently only a few formula
operators are supported, including '~', '.', ':', '+', and '-'.</p>
</td></tr>
<tr valign="top"><td><code>family</code></td>
<td>
<p>a description of the error distribution and link function to be used in the model.
This can be a character string naming a family function, a family function or
the result of a call to a family function. Refer R family at
<a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>.
Currently these families are supported: <code>binomial</code>, <code>gaussian</code>,
<code>Gamma</code>, and <code>poisson</code>.</p>
</td></tr>
<tr valign="top"><td><code>data</code></td>
<td>
<p>a SparkDataFrame or R's glm data for training.</p>
</td></tr>
<tr valign="top"><td><code>weights</code></td>
<td>
<p>an optional vector of &lsquo;prior weights&rsquo; to be used
in the fitting process. Should be <code>NULL</code> or a numeric vector.</p>
</td></tr>
<tr valign="top"><td><code>subset</code></td>
<td>
<p>an optional vector specifying a subset of observations
to be used in the fitting process.</p>
</td></tr>
<tr valign="top"><td><code>na.action</code></td>
<td>
<p>a function which indicates what should happen
when the data contain <code>NA</code>s. The default is set by
the <code>na.action</code> setting of <code><a href="../../base/html/options.html">options</a></code>, and is
<code><a href="../../stats/html/na.fail.html">na.fail</a></code> if that is unset. The &lsquo;factory-fresh&rsquo;
default is <code><a href="nafunctions.html">na.omit</a></code>. Another possible value is
<code>NULL</code>, no action. Value <code><a href="../../stats/html/na.fail.html">na.exclude</a></code> can be useful.</p>
</td></tr>
<tr valign="top"><td><code>start</code></td>
<td>
<p>starting values for the parameters in the linear predictor.</p>
</td></tr>
<tr valign="top"><td><code>etastart</code></td>
<td>
<p>starting values for the linear predictor.</p>
</td></tr>
<tr valign="top"><td><code>mustart</code></td>
<td>
<p>starting values for the vector of means.</p>
</td></tr>
<tr valign="top"><td><code>offset</code></td>
<td>
<p>this can be used to specify an <EM>a priori</EM> known
component to be included in the linear predictor during fitting.
This should be <code>NULL</code> or a numeric vector of length equal to
the number of cases. One or more <code><a href="../../stats/html/offset.html">offset</a></code> terms can be
included in the formula instead or as well, and if more than one is
specified their sum is used. See <code><a href="../../stats/html/model.extract.html">model.offset</a></code>.</p>
</td></tr>
<tr valign="top"><td><code>control</code></td>
<td>
<p>a list of parameters for controlling the fitting
process. For <code>glm.fit</code> this is passed to
<code><a href="../../stats/html/glm.control.html">glm.control</a></code>.</p>
</td></tr>
<tr valign="top"><td><code>model</code></td>
<td>
<p>a logical value indicating whether <EM>model frame</EM>
should be included as a component of the returned value.</p>
</td></tr>
<tr valign="top"><td><code>method</code></td>
<td>
<p>the method to be used in fitting the model. The default
method <code>"glm.fit"</code> uses iteratively reweighted least squares
(IWLS): the alternative <code>"model.frame"</code> returns the model frame
and does no fitting.
</p>
<p>User-supplied fitting functions can be supplied either as a function
or a character string naming a function, with a function which takes
the same arguments as <code>glm.fit</code>. If specified as a character
string it is looked up from within the <span class="pkg">stats</span> namespace.
</p>
</td></tr>
<tr valign="top"><td><code>x,y</code></td>
<td>
<p>For <code>glm</code>: logical values indicating whether the response vector
and model matrix used in the fitting process should be returned as
components of the returned value.</p>
</td></tr>
<tr valign="top"><td><code>contrasts</code></td>
<td>
<p>an optional list. See the <code>contrasts.arg</code>
of <code>model.matrix.default</code>.</p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p>For <code>glm</code>: arguments to be used to form the default
<code>control</code> argument if it is not supplied directly.
</p>
<p>For <code>weights</code>: further arguments passed to or from other methods.
</p>
</td></tr>
<tr valign="top"><td><code>epsilon</code></td>
<td>
<p>positive convergence tolerance of iterations.</p>
</td></tr>
<tr valign="top"><td><code>maxit</code></td>
<td>
<p>integer giving the maximal number of IRLS iterations.</p>
</td></tr>
<tr valign="top"><td><code>weightCol</code></td>
<td>
<p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance
weights as 1.0.</p>
</td></tr>
</table>
<h3>Value</h3>
<p><code>glm</code> returns a fitted generalized linear model.
</p>
<h3>Note</h3>
<p>glm since 1.5.0
</p>
<h3>See Also</h3>
<p><a href="spark.glm.html">spark.glm</a>
</p>
<h3>Examples</h3>
<pre><code class="r">## Not run:
##D sparkR.session()
##D data(iris)
##D df &lt;- createDataFrame(iris)
##D model &lt;- glm(Sepal_Length ~ Sepal_Width, df, family = &quot;gaussian&quot;)
##D summary(model)
## End(Not run)
</code></pre>
<hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
</body></html>