<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Calculates the approximate quantiles of a numerical column of...</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="stylesheet" type="text/css" href="R.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head><body>
<table width="100%" summary="page for approxQuantile {SparkR}"><tr><td>approxQuantile {SparkR}</td><td align="right">R Documentation</td></tr></table>
<h2>Calculates the approximate quantiles of a numerical column of a SparkDataFrame</h2>
<h3>Description</h3>
<p>Calculates the approximate quantiles of a numerical column of a SparkDataFrame.
The result of this algorithm has the following deterministic bound:
if the SparkDataFrame has N rows and the quantile at probability p is requested up to
error err, the algorithm returns a value x from the SparkDataFrame whose
<em>exact</em> rank is close to (p * N). More precisely,
<code>floor((p - err) * N) &lt;= rank(x) &lt;= ceil((p + err) * N)</code>.
This method implements a variation of the Greenwald-Khanna algorithm (with some speed
optimizations). The algorithm was first presented in
<a href="http://dx.doi.org/10.1145/375663.375670">Space-efficient Online Computation of Quantile Summaries</a>
by Greenwald and Khanna.
</p>
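<p>As an illustration of the bound, here is a minimal base-R sketch. It is not part of SparkR and does not call Spark. The hypothetical helpers <code>rankWindow</code> and <code>satisfiesBound</code> compute the admissible rank interval for a probability <code>p</code> and error <code>err</code>, and check whether a candidate value falls inside it. They assume distinct values and take rank(x) to be the 1-based position of x in sorted order, that is, the number of elements not greater than x.</p>
<pre><code class="r"># Hypothetical helper (not part of SparkR): admissible ranks for probability p,
# relative error err and N rows, following
# floor((p - err) * N) ... ceil((p + err) * N).
rankWindow = function(p, err, N) {
  c(lower = floor((p - err) * N), upper = ceiling((p + err) * N))
}

# Hypothetical helper: TRUE if value x lies within the admissible rank window.
# Assumes distinct values; rank(x) = number of elements not greater than x.
satisfiesBound = function(values, x, p, err) {
  w = rankWindow(p, err, length(values))
  r = sum(x >= values)
  all(r >= w[["lower"]], w[["upper"]] >= r)
}

values = as.numeric(1:1000)
rankWindow(0.5, 0.01, length(values))    # lower = 490, upper = 510
satisfiesBound(values, 495, 0.5, 0.01)   # TRUE: rank 495 is inside [490, 510]
satisfiesBound(values, 600, 0.5, 0.01)   # FALSE: rank 600 is outside
</code></pre>
<p>With 1000 rows and err = 0.01, the bound accepts any of the 21 values ranked 490 to 510 as the median, so the returned value need not equal the exact median.</p>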
<h3>Usage</h3>
<pre>
## S4 method for signature 'SparkDataFrame,character,numeric,numeric'
approxQuantile(x, col, probabilities, relativeError)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
<p>A SparkDataFrame.</p>
</td></tr>
<tr valign="top"><td><code>col</code></td>
<td>
<p>The name of the numerical column.</p>
</td></tr>
<tr valign="top"><td><code>probabilities</code></td>
<td>
<p>A numeric vector of quantile probabilities. Each value must lie in [0, 1];
for example, 0 is the minimum, 0.5 is the median, and 1 is the maximum.</p>
</td></tr>
<tr valign="top"><td><code>relativeError</code></td>
<td>
<p>The relative target precision to achieve (&gt;= 0). If set to zero,
the exact quantiles are computed, which can be very expensive.
Values greater than 1 are accepted but give the same result as 1.</p>
</td></tr>
</table>
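<p>One way to see why a <code>relativeError</code> above 1 cannot change the result is to substitute err = 1 into the rank bound <code>floor((p - err) * N) &lt;= rank(x) &lt;= ceil((p + err) * N)</code>. For any p in [0, 1], floor((p - 1) * N) is at most 0 and ceil((p + 1) * N) is at least N, so every rank from 1 to N is already admissible. The base-R sketch below checks this numerically. <code>allowedRanks</code> is a hypothetical helper, not a SparkR function.</p>
<pre><code class="r"># Hypothetical helper (not part of SparkR): every integer rank admitted by
# floor((p - err) * N) ... ceil((p + err) * N).
allowedRanks = function(p, err, N) {
  floor((p - err) * N):ceiling((p + err) * N)
}

N = 100
probabilities = c(0, 0.5, 1)
# For err = 1, every rank 1..N lies in the window for each probability.
sapply(probabilities, function(p) all(1:N %in% allowedRanks(p, 1, N)))
# TRUE TRUE TRUE

# For err = 0, the window shrinks to the exact rank(s) around p * N.
allowedRanks(0.5, 0, N)  # 50
</code></pre>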
<h3>Value</h3>
<p>The approximate quantiles at the given probabilities.
</p>
<h3>Note</h3>
<p>approxQuantile since 2.0.0
</p>
<h3>See Also</h3>
<p>Other stat functions: <code><a href="corr.html">corr</a></code>,
<code><a href="corr.html">corr,Column-method</a></code>,
<code><a href="corr.html">corr,SparkDataFrame-method</a></code>;
<code><a href="cov.html">cov</a></code>,
<code><a href="cov.html">cov,SparkDataFrame-method</a></code>,
<code><a href="cov.html">cov,characterOrColumn-method</a></code>,
<code><a href="cov.html">covar_samp</a></code>,
<code><a href="cov.html">covar_samp,characterOrColumn,characterOrColumn-method</a></code>;
<code><a href="crosstab.html">crosstab</a></code>,
<code><a href="crosstab.html">crosstab,SparkDataFrame,character,character-method</a></code>;
<code><a href="freqItems.html">freqItems</a></code>,
<code><a href="freqItems.html">freqItems,SparkDataFrame,character-method</a></code>;
<code><a href="sampleBy.html">sampleBy</a></code>,
<code><a href="sampleBy.html">sampleBy,SparkDataFrame,character,list,numeric-method</a></code>
</p>
<h3>Examples</h3>
<pre><code class="r">## Not run:
##D df &lt;- read.json(&quot;/path/to/file.json&quot;)
##D quantiles &lt;- approxQuantile(df, &quot;key&quot;, c(0.5, 0.8), 0.0)
## End(Not run)
</code></pre>
<hr><div align="center">[Package <em>SparkR</em> version 2.1.1 <a href="00Index.html">Index</a>]</div>
</body></html>