VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP

variance() function variance_samp() function variance_pop() function var_samp() function var_pop() function An aggregate function that returns the variance of a set of numbers. This is a mathematical property that signifies how far the values spread apart from the mean. The return value can be zero (if the input is a single value, or a set of identical values), or a positive number otherwise.

{ VARIANCE | VAR[IANCE]_SAMP | VAR[IANCE]_POP } ([DISTINCT | ALL] expression)

This function works with any numeric data type.

This function is typically used in mathematical formulas related to probability distributions.

The VARIANCE_SAMP() and VARIANCE_POP() functions compute the sample variance and population variance, respectively, of the input values. (VARIANCE() is an alias for VARIANCE_SAMP().) Both functions evaluate all input rows matched by the query. The difference is that STDDEV_SAMP() is scaled by 1/(N-1) while STDDEV_POP() is scaled by 1/N.

The functions VAR_SAMP() and VAR_POP() are the same as VARIANCE_SAMP() and VARIANCE_POP(), respectively. These aliases are available in Impala 2.0 and later.

If no input rows match the query, the result of any of these functions is NULL. If a single input row matches the query, the result of any of these functions is "0.0".

This example demonstrates how VARIANCE() and VARIANCE_SAMP() return the same result, while VARIANCE_POP() uses a slightly different calculation to reflect that the input data is considered part of a larger population.

[localhost:21000] > select variance(score) from test_scores; +-----------------+ | variance(score) | +-----------------+ | 812.25 | +-----------------+ [localhost:21000] > select variance_samp(score) from test_scores; +----------------------+ | variance_samp(score) | +----------------------+ | 812.25 | +----------------------+ [localhost:21000] > select variance_pop(score) from test_scores; +---------------------+ | variance_pop(score) | +---------------------+ | 811.438 | +---------------------+

This example demonstrates that, because the return value of these aggregate functions is a STRING, you convert the result with CAST if you need to do further calculations as a numeric value.

[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores; +-------------------+ | summary | +-------------------+ | Inserted 1 row(s) | +-------------------+ [localhost:21000] > desc score_stats; +--------------------+--------------+---------+ | name | type | comment | +--------------------+--------------+---------+ | standard_deviation | decimal(7,4) | | | variance | decimal(7,4) | | +--------------------+--------------+---------+

The STDDEV(), STDDEV_POP(), and STDDEV_SAMP() functions compute the standard deviation (square root of the variance) based on the results of VARIANCE(), VARIANCE_POP(), and VARIANCE_SAMP() respectively. See for details about the standard deviation property.