Aggregate functions operate on a set of values to compute a single result.
Aggregate functions support the SQL FILTER (WHERE ...) clause to restrict which input rows contribute to the aggregate result.
function([exprs]) FILTER (WHERE condition)
Example:
SELECT sum(salary) FILTER (WHERE salary > 0) AS sum_positive_salaries, count(*) FILTER (WHERE active) AS active_count FROM employees;
Note: When no rows pass the filter, COUNT returns 0 while SUM/AVG/MIN/MAX return NULL.
array_aggReturns an array created from the expression elements. If ordering is required, elements are inserted in the specified order. This aggregation function can only mix DISTINCT and ORDER BY if the ordering expression is exactly the same as the argument expression.
array_agg(expression [ORDER BY expression])
> SELECT array_agg(column_name ORDER BY other_column) FROM table_name; +-----------------------------------------------+ | array_agg(column_name ORDER BY other_column) | +-----------------------------------------------+ | [element1, element2, element3] | +-----------------------------------------------+ > SELECT array_agg(DISTINCT column_name ORDER BY column_name) FROM table_name; +--------------------------------------------------------+ | array_agg(DISTINCT column_name ORDER BY column_name) | +--------------------------------------------------------+ | [element1, element2, element3] | +--------------------------------------------------------+
avgReturns the average of numeric values in the specified column.
avg(expression)
> SELECT avg(column_name) FROM table_name; +---------------------------+ | avg(column_name) | +---------------------------+ | 42.75 | +---------------------------+
bit_andComputes the bitwise AND of all non-null input values.
bit_and(expression)
bit_orComputes the bitwise OR of all non-null input values.
bit_or(expression)
bit_xorComputes the bitwise exclusive OR of all non-null input values.
bit_xor(expression)
bool_andReturns true if all non-null input values are true, otherwise false.
bool_and(expression)
> SELECT bool_and(column_name) FROM table_name; +----------------------------+ | bool_and(column_name) | +----------------------------+ | true | +----------------------------+
bool_orReturns true if all non-null input values are true, otherwise false.
bool_and(expression)
> SELECT bool_and(column_name) FROM table_name; +----------------------------+ | bool_and(column_name) | +----------------------------+ | true | +----------------------------+
countReturns the number of non-null values in the specified column. To include null values in the total count, use count(*).
count(expression)
> SELECT count(column_name) FROM table_name; +-----------------------+ | count(column_name) | +-----------------------+ | 100 | +-----------------------+ > SELECT count(*) FROM table_name; +------------------+ | count(*) | +------------------+ | 120 | +------------------+
first_valueReturns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
first_value(expression [ORDER BY expression])
> SELECT first_value(column_name ORDER BY other_column) FROM table_name; +-----------------------------------------------+ | first_value(column_name ORDER BY other_column)| +-----------------------------------------------+ | first_element | +-----------------------------------------------+
groupingReturns 1 if the data is aggregated across the specified column, or 0 if it is not aggregated in the result set.
grouping(expression)
> SELECT column_name, GROUPING(column_name) AS group_column FROM table_name GROUP BY GROUPING SETS ((column_name), ()); +-------------+-------------+ | column_name | group_column | +-------------+-------------+ | value1 | 0 | | value2 | 0 | | NULL | 1 | +-------------+-------------+
last_valueReturns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
last_value(expression [ORDER BY expression])
> SELECT last_value(column_name ORDER BY other_column) FROM table_name; +-----------------------------------------------+ | last_value(column_name ORDER BY other_column) | +-----------------------------------------------+ | last_element | +-----------------------------------------------+
maxReturns the maximum value in the specified column.
max(expression)
> SELECT max(column_name) FROM table_name; +----------------------+ | max(column_name) | +----------------------+ | 150 | +----------------------+
meanAlias of avg.
medianReturns the median value in the specified column.
median(expression)
> SELECT median(column_name) FROM table_name; +----------------------+ | median(column_name) | +----------------------+ | 45.5 | +----------------------+
minReturns the minimum value in the specified column.
min(expression)
> SELECT min(column_name) FROM table_name; +----------------------+ | min(column_name) | +----------------------+ | 12 | +----------------------+
percentile_contReturns the exact percentile of input values, interpolating between values if needed.
percentile_cont(percentile) WITHIN GROUP (ORDER BY expression)
> SELECT percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) FROM table_name; +----------------------------------------------------------+ | percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) | +----------------------------------------------------------+ | 45.5 | +----------------------------------------------------------+
An alternate syntax is also supported:
> SELECT percentile_cont(column_name, 0.75) FROM table_name; +---------------------------------------+ | percentile_cont(column_name, 0.75) | +---------------------------------------+ | 45.5 | +---------------------------------------+
quantile_contAlias of percentile_cont.
string_aggConcatenates the values of string expressions and places separator values between them. If ordering is required, strings are concatenated in the specified order. This aggregation function can only mix DISTINCT and ORDER BY if the ordering expression is exactly the same as the first argument expression.
string_agg([DISTINCT] expression, delimiter [ORDER BY expression])
> SELECT string_agg(name, ', ') AS names_list FROM employee; +--------------------------+ | names_list | +--------------------------+ | Alice, Bob, Bob, Charlie | +--------------------------+ > SELECT string_agg(name, ', ' ORDER BY name DESC) AS names_list FROM employee; +--------------------------+ | names_list | +--------------------------+ | Charlie, Bob, Bob, Alice | +--------------------------+ > SELECT string_agg(DISTINCT name, ', ' ORDER BY name DESC) AS names_list FROM employee; +--------------------------+ | names_list | +--------------------------+ | Charlie, Bob, Alice | +--------------------------+
sumReturns the sum of all values in the specified column.
sum(expression)
> SELECT sum(column_name) FROM table_name; +-----------------------+ | sum(column_name) | +-----------------------+ | 12345 | +-----------------------+
varReturns the statistical sample variance of a set of numbers.
var(expression)
var_popReturns the statistical population variance of a set of numbers.
var_pop(expression)
var_populationAlias of var_pop.
var_sampAlias of var.
var_sampleAlias of var.
corrReturns the coefficient of correlation between two numeric values.
corr(expression1, expression2)
> SELECT corr(column1, column2) FROM table_name; +--------------------------------+ | corr(column1, column2) | +--------------------------------+ | 0.85 | +--------------------------------+
covarAlias of covar_samp.
covar_popReturns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
> SELECT covar_samp(column1, column2) FROM table_name; +-----------------------------------+ | covar_samp(column1, column2) | +-----------------------------------+ | 8.25 | +-----------------------------------+
covar_sampReturns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
> SELECT covar_samp(column1, column2) FROM table_name; +-----------------------------------+ | covar_samp(column1, column2) | +-----------------------------------+ | 8.25 | +-----------------------------------+
nth_valueReturns the nth value in a group of values.
nth_value(expression, n ORDER BY expression)
> SELECT dept_id, salary, NTH_VALUE(salary, 2) OVER (PARTITION BY dept_id ORDER BY salary ASC) AS second_salary_by_dept FROM employee; +---------+--------+-------------------------+ | dept_id | salary | second_salary_by_dept | +---------+--------+-------------------------+ | 1 | 30000 | NULL | | 1 | 40000 | 40000 | | 1 | 50000 | 40000 | | 2 | 35000 | NULL | | 2 | 45000 | 45000 | +---------+--------+-------------------------+
regr_avgxComputes the average of the independent variable (input) expression_x for the non-null paired data points.
regr_avgx(expression_y, expression_x)
create table daily_sales(day int, total_sales int) as values (1,100), (2,150), (3,200), (4,NULL), (5,250); select * from daily_sales; +-----+-------------+ | day | total_sales | | --- | ----------- | | 1 | 100 | | 2 | 150 | | 3 | 200 | | 4 | NULL | | 5 | 250 | +-----+-------------+ SELECT regr_avgx(total_sales, day) AS avg_day FROM daily_sales; +----------+ | avg_day | +----------+ | 2.75 | +----------+
regr_avgyComputes the average of the dependent variable (output) expression_y for the non-null paired data points.
regr_avgy(expression_y, expression_x)
create table daily_temperature(day int, temperature int) as values (1,30), (2,32), (3, NULL), (4,35), (5,36); select * from daily_temperature; +-----+-------------+ | day | temperature | | --- | ----------- | | 1 | 30 | | 2 | 32 | | 3 | NULL | | 4 | 35 | | 5 | 36 | +-----+-------------+ -- temperature as Dependent Variable(Y), day as Independent Variable(X) SELECT regr_avgy(temperature, day) AS avg_temperature FROM daily_temperature; +-----------------+ | avg_temperature | +-----------------+ | 33.25 | +-----------------+
regr_countCounts the number of non-null paired data points.
regr_count(expression_y, expression_x)
create table daily_metrics(day int, user_signups int) as values (1,100), (2,120), (3, NULL), (4,110), (5,NULL); select * from daily_metrics; +-----+---------------+ | day | user_signups | | --- | ------------- | | 1 | 100 | | 2 | 120 | | 3 | NULL | | 4 | 110 | | 5 | NULL | +-----+---------------+ SELECT regr_count(user_signups, day) AS valid_pairs FROM daily_metrics; +-------------+ | valid_pairs | +-------------+ | 3 | +-------------+
regr_interceptComputes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.
regr_intercept(expression_y, expression_x)
create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3, 70), (4,75), (5,80); select * from weekly_performance; +------+---------------------+ | week | productivity_score | | ---- | ------------------- | | 1 | 60 | | 2 | 65 | | 3 | 70 | | 4 | 75 | | 5 | 80 | +------+---------------------+ SELECT regr_intercept(productivity_score, week) AS intercept FROM weekly_performance; +----------+ |intercept| |intercept | +----------+ | 55 | +----------+
regr_r2Computes the square of the correlation coefficient between the independent and dependent variables.
regr_r2(expression_y, expression_x)
create table weekly_performance(day int ,user_signups int) as values (1,60), (2,65), (3, 70), (4,75), (5,80); select * from weekly_performance; +-----+--------------+ | day | user_signups | +-----+--------------+ | 1 | 60 | | 2 | 65 | | 3 | 70 | | 4 | 75 | | 5 | 80 | +-----+--------------+ SELECT regr_r2(user_signups, day) AS r_squared FROM weekly_performance; +---------+ |r_squared| +---------+ | 1.0 | +---------+
regr_slopeReturns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.
regr_slope(expression_y, expression_x)
create table weekly_performance(day int, user_signups int) as values (1,60), (2,65), (3, 70), (4,75), (5,80); select * from weekly_performance; +-----+--------------+ | day | user_signups | +-----+--------------+ | 1 | 60 | | 2 | 65 | | 3 | 70 | | 4 | 75 | | 5 | 80 | +-----+--------------+ SELECT regr_slope(user_signups, day) AS slope FROM weekly_performance; +--------+ | slope | +--------+ | 5.0 | +--------+
regr_sxxComputes the sum of squares of the independent variable.
regr_sxx(expression_y, expression_x)
create table study_hours(student_id int, hours int, test_score int) as values (1,2,55), (2,4,65), (3,6,75), (4,8,85), (5,10,95); select * from study_hours; +------------+-------+------------+ | student_id | hours | test_score | +------------+-------+------------+ | 1 | 2 | 55 | | 2 | 4 | 65 | | 3 | 6 | 75 | | 4 | 8 | 85 | | 5 | 10 | 95 | +------------+-------+------------+ SELECT regr_sxx(test_score, hours) AS sxx FROM study_hours; +------+ | sxx | +------+ | 40.0 | +------+
regr_sxyComputes the sum of products of paired data points.
regr_sxy(expression_y, expression_x)
create table employee_productivity(week int, productivity_score int) as values(1,60), (2,65), (3,70); select * from employee_productivity; +------+--------------------+ | week | productivity_score | +------+--------------------+ | 1 | 60 | | 2 | 65 | | 3 | 70 | +------+--------------------+ SELECT regr_sxy(productivity_score, week) AS sum_product_deviations FROM employee_productivity; +------------------------+ | sum_product_deviations | +------------------------+ | 10.0 | +------------------------+
regr_syyComputes the sum of squares of the dependent variable.
regr_syy(expression_y, expression_x)
create table employee_productivity(week int, productivity_score int) as values (1,60), (2,65), (3,70); select * from employee_productivity; +------+--------------------+ | week | productivity_score | +------+--------------------+ | 1 | 60 | | 2 | 65 | | 3 | 70 | +------+--------------------+ SELECT regr_syy(productivity_score, week) AS sum_squares_y FROM employee_productivity; +---------------+ | sum_squares_y | +---------------+ | 50.0 | +---------------+
stddevReturns the standard deviation of a set of numbers.
stddev(expression)
> SELECT stddev(column_name) FROM table_name; +----------------------+ | stddev(column_name) | +----------------------+ | 12.34 | +----------------------+
stddev_popReturns the population standard deviation of a set of numbers.
stddev_pop(expression)
> SELECT stddev_pop(column_name) FROM table_name; +--------------------------+ | stddev_pop(column_name) | +--------------------------+ | 10.56 | +--------------------------+
stddev_sampAlias of stddev.
approx_distinctReturns the approximate number of distinct input values calculated using the HyperLogLog algorithm.
approx_distinct(expression)
> SELECT approx_distinct(column_name) FROM table_name; +-----------------------------------+ | approx_distinct(column_name) | +-----------------------------------+ | 42 | +-----------------------------------+
approx_medianReturns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(0.5) WITHIN GROUP (ORDER BY x).
approx_median(expression)
> SELECT approx_median(column_name) FROM table_name; +-----------------------------------+ | approx_median(column_name) | +-----------------------------------+ | 23.5 | +-----------------------------------+
approx_percentile_contReturns the approximate percentile of input values using the t-digest algorithm.
approx_percentile_cont(percentile [, centroids]) WITHIN GROUP (ORDER BY expression)
> SELECT approx_percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) FROM table_name; +------------------------------------------------------------------+ | approx_percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) | +------------------------------------------------------------------+ | 65.0 | +------------------------------------------------------------------+ > SELECT approx_percentile_cont(0.75, 100) WITHIN GROUP (ORDER BY column_name) FROM table_name; +-----------------------------------------------------------------------+ | approx_percentile_cont(0.75, 100) WITHIN GROUP (ORDER BY column_name) | +-----------------------------------------------------------------------+ | 65.0 | +-----------------------------------------------------------------------+
An alternate syntax is also supported:
> SELECT approx_percentile_cont(column_name, 0.75) FROM table_name; +-----------------------------------------------+ | approx_percentile_cont(column_name, 0.75) | +-----------------------------------------------+ | 65.0 | +-----------------------------------------------+ > SELECT approx_percentile_cont(column_name, 0.75, 100) FROM table_name; +----------------------------------------------------------+ | approx_percentile_cont(column_name, 0.75, 100) | +----------------------------------------------------------+ | 65.0 | +----------------------------------------------------------+
approx_percentile_cont_with_weightReturns the weighted approximate percentile of input values using the t-digest algorithm.
approx_percentile_cont_with_weight(weight, percentile [, centroids]) WITHIN GROUP (ORDER BY expression)
> SELECT approx_percentile_cont_with_weight(weight_column, 0.90) WITHIN GROUP (ORDER BY column_name) FROM table_name; +---------------------------------------------------------------------------------------------+ | approx_percentile_cont_with_weight(weight_column, 0.90) WITHIN GROUP (ORDER BY column_name) | +---------------------------------------------------------------------------------------------+ | 78.5 | +---------------------------------------------------------------------------------------------+ > SELECT approx_percentile_cont_with_weight(weight_column, 0.90, 100) WITHIN GROUP (ORDER BY column_name) FROM table_name; +--------------------------------------------------------------------------------------------------+ | approx_percentile_cont_with_weight(weight_column, 0.90, 100) WITHIN GROUP (ORDER BY column_name) | +--------------------------------------------------------------------------------------------------+ | 78.5 | +--------------------------------------------------------------------------------------------------+
An alternative syntax is also supported:
> SELECT approx_percentile_cont_with_weight(column_name, weight_column, 0.90) FROM table_name; +--------------------------------------------------+ | approx_percentile_cont_with_weight(column_name, weight_column, 0.90) | +--------------------------------------------------+ | 78.5 | +--------------------------------------------------+