<?xml version="1.0" encoding="UTF-8"?> | |

<!-- | |

Licensed to the Apache Software Foundation (ASF) under one | |

or more contributor license agreements. See the NOTICE file | |

distributed with this work for additional information | |

regarding copyright ownership. The ASF licenses this file | |

to you under the Apache License, Version 2.0 (the | |

"License"); you may not use this file except in compliance | |

with the License. You may obtain a copy of the License at | |

http://www.apache.org/licenses/LICENSE-2.0 | |

Unless required by applicable law or agreed to in writing, | |

software distributed under the License is distributed on an | |

"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |

KIND, either express or implied. See the License for the | |

specific language governing permissions and limitations | |

under the License. | |

--> | |

<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> | |

<concept id="count"> | |

<title>COUNT Function</title> | |

<titlealts audience="PDF"><navtitle>COUNT</navtitle></titlealts> | |

<prolog> | |

<metadata> | |

<data name="Category" value="Impala"/> | |

<data name="Category" value="SQL"/> | |

<data name="Category" value="Impala Functions"/> | |

<data name="Category" value="Analytic Functions"/> | |

<data name="Category" value="Aggregate Functions"/> | |

<data name="Category" value="Querying"/> | |

<data name="Category" value="Developers"/> | |

<data name="Category" value="Data Analysts"/> | |

</metadata> | |

</prolog> | |

<conbody> | |

<p> | |

<indexterm audience="hidden">count() function</indexterm> | |

An aggregate function that returns the number of rows, or the number of non-<codeph>NULL</codeph> rows. | |

</p> | |

<p conref="../shared/impala_common.xml#common/syntax_blurb"/> | |

<codeblock>COUNT([DISTINCT | ALL] <varname>expression</varname>) [OVER (<varname>analytic_clause</varname>)]</codeblock> | |

<p> | |

Depending on the argument, <codeph>COUNT()</codeph> considers rows that meet certain conditions: | |

</p> | |

<ul> | |

<li> | |

The notation <codeph>COUNT(*)</codeph> includes <codeph>NULL</codeph> values in the total. | |

</li> | |

<li> | |

The notation <codeph>COUNT(<varname>column_name</varname>)</codeph> only considers rows where the column | |

contains a non-<codeph>NULL</codeph> value. | |

</li> | |

<li> | |

You can also combine <codeph>COUNT</codeph> with the <codeph>DISTINCT</codeph> operator to eliminate | |

duplicates before counting, and to count the combinations of values across multiple columns. | |

</li> | |

</ul> | |

<p> | |

When the query contains a <codeph>GROUP BY</codeph> clause, returns one value for each combination of | |

grouping values. | |

</p> | |

<p> | |

<b>Return type:</b> <codeph>BIGINT</codeph> | |

</p> | |

<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> | |

<p conref="../shared/impala_common.xml#common/partition_key_optimization"/> | |

<p conref="../shared/impala_common.xml#common/complex_types_blurb"/> | |

<p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> | |

<p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> | |

<p conref="../shared/impala_common.xml#common/example_blurb"/> | |

<codeblock>-- How many rows total are in the table, regardless of NULL values? | |

select count(*) from t1; | |

-- How many rows are in the table with non-NULL values for a column? | |

select count(c1) from t1; | |

-- Count the rows that meet certain conditions. | |

-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col). | |

select count(*) from t1 where x > 10; | |

select count(c1) from t1 where x > 10; | |

-- Can also be used in combination with DISTINCT and/or GROUP BY. | |

-- Combine COUNT and DISTINCT to find the number of unique values. | |

-- Must use column names rather than * with COUNT(DISTINCT ...) syntax. | |

-- Rows with NULL values are not counted. | |

select count(distinct c1) from t1; | |

-- Rows with a NULL value in _either_ column are not counted. | |

select count(distinct c1, c2) from t1; | |

-- Return more than one result. | |

select month, year, count(distinct visitor_id) from web_stats group by month, year; | |

</codeblock> | |

<p rev="2.0.0"> | |

The following examples show how to use <codeph>COUNT()</codeph> in an analytic context. They use a table | |

containing integers from 1 to 10. Notice how the <codeph>COUNT()</codeph> is reported for each input value, as | |

opposed to the <codeph>GROUP BY</codeph> clause which condenses the result set. | |

<codeblock>select x, property, count(x) over (partition by property) as count from int_t where property in ('odd','even'); | |

+----+----------+-------+ | |

| x | property | count | | |

+----+----------+-------+ | |

| 2 | even | 5 | | |

| 4 | even | 5 | | |

| 6 | even | 5 | | |

| 8 | even | 5 | | |

| 10 | even | 5 | | |

| 1 | odd | 5 | | |

| 3 | odd | 5 | | |

| 5 | odd | 5 | | |

| 7 | odd | 5 | | |

| 9 | odd | 5 | | |

+----+----------+-------+ | |

</codeblock> | |

Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results that are cumulative or apply to a moving | |

set of rows (the <q>window</q>). The following examples use <codeph>COUNT()</codeph> in an analytic context | |

(that is, with an <codeph>OVER()</codeph> clause) to produce a running count of all the even values, | |

then a running count of all the odd values. The basic <codeph>ORDER BY x</codeph> clause implicitly | |

activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, | |

which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, | |

therefore all of these examples produce the same results: | |

<codeblock>select x, property, | |

count(x) over (partition by property <b>order by x</b>) as 'cumulative count' | |

from int_t where property in ('odd','even'); | |

+----+----------+------------------+ | |

| x | property | cumulative count | | |

+----+----------+------------------+ | |

| 2 | even | 1 | | |

| 4 | even | 2 | | |

| 6 | even | 3 | | |

| 8 | even | 4 | | |

| 10 | even | 5 | | |

| 1 | odd | 1 | | |

| 3 | odd | 2 | | |

| 5 | odd | 3 | | |

| 7 | odd | 4 | | |

| 9 | odd | 5 | | |

+----+----------+------------------+ | |

select x, property, | |

count(x) over | |

( | |

partition by property | |

<b>order by x</b> | |

<b>range between unbounded preceding and current row</b> | |

) as 'cumulative total' | |

from int_t where property in ('odd','even'); | |

+----+----------+------------------+ | |

| x | property | cumulative count | | |

+----+----------+------------------+ | |

| 2 | even | 1 | | |

| 4 | even | 2 | | |

| 6 | even | 3 | | |

| 8 | even | 4 | | |

| 10 | even | 5 | | |

| 1 | odd | 1 | | |

| 3 | odd | 2 | | |

| 5 | odd | 3 | | |

| 7 | odd | 4 | | |

| 9 | odd | 5 | | |

+----+----------+------------------+ | |

select x, property, | |

count(x) over | |

( | |

partition by property | |

<b>order by x</b> | |

<b>rows between unbounded preceding and current row</b> | |

) as 'cumulative total' | |

from int_t where property in ('odd','even'); | |

+----+----------+------------------+ | |

| x | property | cumulative count | | |

+----+----------+------------------+ | |

| 2 | even | 1 | | |

| 4 | even | 2 | | |

| 6 | even | 3 | | |

| 8 | even | 4 | | |

| 10 | even | 5 | | |

| 1 | odd | 1 | | |

| 3 | odd | 2 | | |

| 5 | odd | 3 | | |

| 7 | odd | 4 | | |

| 9 | odd | 5 | | |

+----+----------+------------------+ | |

</codeblock> | |

The following examples show how to construct a moving window, with a running count taking into account 1 row before | |

and 1 row after the current row, within the same partition (all the even values or all the odd values). | |

Therefore, the count is consistently 3 for rows in the middle of the window, and 2 for | |

rows near the ends of the window, where there is no preceding or no following row in the partition. | |

Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this type of | |

moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but not the <codeph>RANGE BETWEEN</codeph> | |

clause: | |

<codeblock>select x, property, | |

count(x) over | |

( | |

partition by property | |

<b>order by x</b> | |

<b>rows between 1 preceding and 1 following</b> | |

) as 'moving total' | |

from int_t where property in ('odd','even'); | |

+----+----------+--------------+ | |

| x | property | moving total | | |

+----+----------+--------------+ | |

| 2 | even | 2 | | |

| 4 | even | 3 | | |

| 6 | even | 3 | | |

| 8 | even | 3 | | |

| 10 | even | 2 | | |

| 1 | odd | 2 | | |

| 3 | odd | 3 | | |

| 5 | odd | 3 | | |

| 7 | odd | 3 | | |

| 9 | odd | 2 | | |

+----+----------+--------------+ | |

-- Doesn't work because of syntax restriction on RANGE clause. | |

select x, property, | |

count(x) over | |

( | |

partition by property | |

<b>order by x</b> | |

<b>range between 1 preceding and 1 following</b> | |

) as 'moving total' | |

from int_t where property in ('odd','even'); | |

ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. | |

</codeblock> | |

</p> | |

<p conref="../shared/impala_common.xml#common/related_info"/> | |

<p> | |

<xref href="impala_analytic_functions.xml#analytic_functions"/> | |

</p> | |

</conbody> | |

</concept> |