
 <table class="table">
   <thead>
     <tr>
       <th style="width:25%">Function</th>
       <th>Description</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <td>any(expr)</td>
       <td>Returns true if at least one value of `expr` is true.</td>
     </tr>
     <tr>
       <td>approx_count_distinct(expr[, relativeSD])</td>
       <td>Returns the estimated cardinality by HyperLogLog++.
       `relativeSD` defines the maximum relative standard deviation allowed.</td>
     </tr>
     <tr>
       <td>approx_percentile(col, percentage [, accuracy])</td>
       <td>Returns the approximate percentile value of numeric
       column `col` at the given percentage. The value of percentage must be between 0.0
       and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
       controls approximation accuracy at the cost of memory. A higher value of `accuracy` yields
       better accuracy; `1.0/accuracy` is the relative error of the approximation.
       When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
       In this case, returns the approximate percentile array of column `col` at the given
       percentage array.</td>
     </tr>
     <tr>
       <td>avg(expr)</td>
       <td>Returns the mean calculated from values of a group.</td>
     </tr>
     <tr>
       <td>bit_or(expr)</td>
       <td>Returns the bitwise OR of all non-null input values, or null if none.</td>
     </tr>
     <tr>
       <td>bit_xor(expr)</td>
       <td>Returns the bitwise XOR of all non-null input values, or null if none.</td>
     </tr>
     <tr>
       <td>bool_and(expr)</td>
       <td>Returns true if all values of `expr` are true.</td>
     </tr>
     <tr>
       <td>bool_or(expr)</td>
       <td>Returns true if at least one value of `expr` is true.</td>
     </tr>
     <tr>
       <td>collect_list(expr)</td>
       <td>Collects and returns a list of non-unique elements.</td>
     </tr>
     <tr>
       <td>collect_set(expr)</td>
       <td>Collects and returns a set of unique elements.</td>
     </tr>
     <tr>
       <td>corr(expr1, expr2)</td>
       <td>Returns the Pearson correlation coefficient between a set of number pairs.</td>
     </tr>
     <tr>
       <td>count(*)</td>
       <td>Returns the total number of retrieved rows, including rows containing null.</td>
     </tr>
     <tr>
       <td>count(expr[, expr...])</td>
       <td>Returns the number of rows for which the supplied expression(s) are all non-null.</td>
     </tr>
     <tr>
       <td>count(DISTINCT expr[, expr...])</td>
       <td>Returns the number of rows for which the supplied expression(s) are unique and non-null.</td>
     </tr>
     <tr>
       <td>count_if(expr)</td>
       <td>Returns the number of `TRUE` values for the expression.</td>
     </tr>
     <tr>
       <td>count_min_sketch(col, eps, confidence, seed)</td>
       <td>Returns a count-min sketch of a column with the given eps,
       confidence and seed. The result is an array of bytes, which can be deserialized to a
       `CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for
       frequency estimation using sub-linear space.</td>
     </tr>
     <tr>
       <td>covar_pop(expr1, expr2)</td>
       <td>Returns the population covariance of a set of number pairs.</td>
     </tr>
     <tr>
       <td>covar_samp(expr1, expr2)</td>
       <td>Returns the sample covariance of a set of number pairs.</td>
     </tr>
     <tr>
       <td>every(expr)</td>
       <td>Returns true if all values of `expr` are true.</td>
     </tr>
     <tr>
       <td>first(expr[, isIgnoreNull])</td>
       <td>Returns the first value of `expr` for a group of rows.
       If `isIgnoreNull` is true, returns only non-null values.</td>
     </tr>
     <tr>
       <td>first_value(expr[, isIgnoreNull])</td>
       <td>Returns the first value of `expr` for a group of rows.
       If `isIgnoreNull` is true, returns only non-null values.</td>
     </tr>
     <tr>
       <td>kurtosis(expr)</td>
       <td>Returns the kurtosis value calculated from values of a group.</td>
     </tr>
     <tr>
       <td>last(expr[, isIgnoreNull])</td>
       <td>Returns the last value of `expr` for a group of rows.
       If `isIgnoreNull` is true, returns only non-null values.</td>
     </tr>
     <tr>
       <td>last_value(expr[, isIgnoreNull])</td>
       <td>Returns the last value of `expr` for a group of rows.
       If `isIgnoreNull` is true, returns only non-null values.</td>
     </tr>
     <tr>
       <td>max(expr)</td>
       <td>Returns the maximum value of `expr`.</td>
     </tr>
     <tr>
       <td>max_by(x, y)</td>
       <td>Returns the value of `x` associated with the maximum value of `y`.</td>
     </tr>
     <tr>
       <td>mean(expr)</td>
       <td>Returns the mean calculated from values of a group.</td>
     </tr>
     <tr>
       <td>min(expr)</td>
       <td>Returns the minimum value of `expr`.</td>
     </tr>
     <tr>
       <td>min_by(x, y)</td>
       <td>Returns the value of `x` associated with the minimum value of `y`.</td>
     </tr>
     <tr>
       <td>percentile(col, percentage [, frequency])</td>
       <td>Returns the exact percentile value of numeric column
        `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The
        value of frequency should be a positive integer.</td>
     </tr>
     <tr>
       <td>percentile(col, array(percentage1 [, percentage2]...) [, frequency])</td>
       <td>Returns the exact
       percentile value array of numeric column `col` at the given percentage(s). Each value
       of the percentage array must be between 0.0 and 1.0. The value of frequency should be
       a positive integer.</td>
     </tr>
     <tr>
       <td>percentile_approx(col, percentage [, accuracy])</td>
       <td>Returns the approximate percentile value of numeric
       column `col` at the given percentage. The value of percentage must be between 0.0
       and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
       controls approximation accuracy at the cost of memory. A higher value of `accuracy` yields
       better accuracy; `1.0/accuracy` is the relative error of the approximation.
       When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
       In this case, returns the approximate percentile array of column `col` at the given
       percentage array.</td>
     </tr>
     <tr>
       <td>skewness(expr)</td>
       <td>Returns the skewness value calculated from values of a group.</td>
     </tr>
     <tr>
       <td>some(expr)</td>
       <td>Returns true if at least one value of `expr` is true.</td>
     </tr>
     <tr>
       <td>std(expr)</td>
       <td>Returns the sample standard deviation calculated from values of a group.</td>
     </tr>
     <tr>
       <td>stddev(expr)</td>
       <td>Returns the sample standard deviation calculated from values of a group.</td>
     </tr>
     <tr>
       <td>stddev_pop(expr)</td>
       <td>Returns the population standard deviation calculated from values of a group.</td>
     </tr>
     <tr>
       <td>stddev_samp(expr)</td>
       <td>Returns the sample standard deviation calculated from values of a group.</td>
     </tr>
     <tr>
       <td>sum(expr)</td>
       <td>Returns the sum calculated from values of a group.</td>
     </tr>
     <tr>
       <td>var_pop(expr)</td>
       <td>Returns the population variance calculated from values of a group.</td>
     </tr>
     <tr>
       <td>var_samp(expr)</td>
       <td>Returns the sample variance calculated from values of a group.</td>
     </tr>
     <tr>
       <td>variance(expr)</td>
       <td>Returns the sample variance calculated from values of a group.</td>
     </tr>
   </tbody>
 </table>
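 <p>Several rows in the table differ only in how they treat nulls or which divisor they use. The
 following is a minimal pure-Python sketch of those distinctions, not Spark's implementation. The
 sample data and variable names are hypothetical, `None` stands in for SQL NULL, and skipping
 nulls in the variance and standard deviation calculations is an assumption of this sketch.</p>

```python
import statistics

# Hypothetical sample column; None stands in for SQL NULL.
values = [2.0, 4.0, 4.0, None, 6.0]
non_null = [v for v in values if v is not None]

# count(*) counts every row; count(expr) counts only non-null rows;
# count(DISTINCT expr) counts unique non-null values.
count_star = len(values)
count_expr = len(non_null)
count_distinct = len(set(non_null))

# Sample statistics (var_samp, variance, stddev_samp, stddev, std) divide by n - 1.
var_samp = statistics.variance(non_null)
stddev_samp = statistics.stdev(non_null)

# Population statistics (var_pop, stddev_pop) divide by n.
var_pop = statistics.pvariance(non_null)
stddev_pop = statistics.pstdev(non_null)

# max_by(x, y) / min_by(x, y): the x value paired with the largest / smallest y.
rows = [("a", 10), ("b", 50), ("c", 20)]  # hypothetical (x, y) pairs
max_by_x = max(rows, key=lambda r: r[1])[0]
min_by_x = min(rows, key=lambda r: r[1])[0]

print(count_star, count_expr, count_distinct)
print(var_samp, var_pop)
print(max_by_x, min_by_x)
```

 <p>For four non-null values, the sample variance exceeds the population variance by a factor of
 4/3, because the same sum of squared deviations is divided by 3 instead of 4.</p>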