geode-docs/developing/query_select/aggregates.html.md.erb - geode - Git at Google

 ---
 title:  OQL Aggregate Functions
 ---

 <!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->

 The aggregate functions
 `MIN`, `MAX`,
 `COUNT`, `COUNT` over a `DISTINCT` expression,
 `SUM`, `SUM` over a `DISTINCT` expression,
 `AVG`, `AVG` over a `DISTINCT` expression, and
 `GROUP BY` are supported.

 The following sections provide descriptions and limitations (if any) about the aggregate functions, how do they work and how they can be used.

 ## <a id="groupBy"></a>GROUP BY
 You can collect data across multiple entries and group the results by one or more columns through the usage of the `GROUP BY` statement.
 It's important to note some facts about its usage:

 * It can group results by one or more fields.
 * It returns a single record / entry per group.
 * It must always be placed before the `ORDER BY` clause, if any.
 * It can be used in conjunction with other aggregate functions: `MIN`, `MAX`, `COUNT`, `SUM` and `AVG`.
 * It groups records using the selected fields if and only if the fields have identical data across entries.
 * It is required, whenever an aggregate function is used within a query with other selected fields, to also use `GROUP BY`.
 * If there are no other aggregate functions within the query, all fields included within a GROUP BY clause must also be part of the original projection list, and all fields included within the projection list must also be part of the `GROUP BY` clause.

 The following are example `GROUP BY` queries.

 The following `GROUP BY` query returns the maximum amount of sales per employee.

 ``` pre
 SELECT ID, MAX(e.sales)
 FROM /employees e
 GROUP BY ID
 ```

 The following `GROUP BY` query returns the minimum, maximum, total count, average and summation of IDs grouped by status.

 ``` pre
 SELECT pf.status, MIN(pf.ID), MAX(pf.ID), COUNT(pf.ID), AVG(pf.ID), SUM(pf.ID)
 FROM /portfolio pf
 GROUP BY pf.status
 ```

 ## <a id="min"></a>MIN

 The `MIN` keyword returns the minimum or smallest value from the selected expression. The expression itself must always evaluate to `java.lang.Comparable`. The `MIN` statement returns the actual type of the selected element as its result.

 The following are example `MIN` queries that return region entries (the entries implement the `java.lang.Comparable` interface).

 ``` pre
 SELECT MIN(pf)
 FROM /exampleRegion pf
 ```

 ``` pre
 SELECT MIN(pf)
 FROM /exampleRegion
 pf WHERE pf.ID > 0
 ```

 ``` pre
 SELECT MIN(pf)
 FROM /exampleRegion pf
 WHERE pf.ID > 10 LIMIT 50
 ```

 ``` pre
 SELECT MIN(pf)
 FROM /exampleRegion pf
 WHERE pf.ID > 0 AND pf.status LIKE 'act%'
 ```

 The following `MIN` query returns the lowest entry ID that matches the query's selection criteria.

 ``` pre
 SELECT MIN(pf.ID)
 FROM /exampleRegion pf, pf.positions.values pos
 WHERE pf.ID > 0 AND pos.secId = 'IBM'
 ```

 The following `MIN` query returns the lowest positive ID grouped by status.

 ``` pre
 SELECT pf.status, MIN(pf.ID)
 FROM /exampleRegion pf
 WHERE pf.ID > 0
 GROUP BY pf.status
 ```

 ## <a id="max"></a>MAX

 The `MAX` keyword returns the maximum or highest value from the selected expression. The expression itself must always evaluate to `java.lang.Comparable`. The `MAX` statement returns the type of the selected element as its result.

 The following are example `MAX` queries that return region entries (the entries implement the `java.lang.Comparable` interface).

 ``` pre
 SELECT MAX(pf)
 FROM /exampleRegion pf
 ```

 ``` pre
 SELECT MAX(pf)
 FROM /exampleRegion
 pf WHERE pf.ID > 0
 ```

 ``` pre
 SELECT MAX(pf)
 FROM /exampleRegion pf
 WHERE pf.ID > 10 LIMIT 50
 ```

 ``` pre
 SELECT MAX(pf)
 FROM /exampleRegion pf
 WHERE pf.ID > 0 AND pf.status LIKE 'act%'
 ```

 The following `MAX` query returns the highest entry ID that matches the query's selection criteria.

 ``` pre
 SELECT MAX(pf.ID)
 FROM /exampleRegion pf, pf.positions.values pos
 WHERE pf.ID > 0 AND pos.secId = 'IBM'
 ```

 The following `MAX` query returns the highest positive IDs grouped by status.

 ``` pre
 SELECT pf.status, MAX(pf.ID)
 FROM /exampleRegion pf
 WHERE pf.ID > 0
 GROUP BY pf.status
 ```

 ## <a id="count"></a>COUNT

 The `COUNT` keyword returns the number of results that match the query selection conditions specified in the `WHERE` clause. Using `COUNT` allows you to determine the size of a results set.
 The `COUNT` statement always returns a `java.lang.Integer` or `java.lang.Long` as the result (depending on how big the value is); you should take this into consideration when executing the query: if an overflow occurs while computing the `COUNT` function because the value is higher than `Long.MAX_VALUE` (2<sup>63</sup> - 1), the result will be incorrect.

 The following queries are example `COUNT` queries that return region entries:

 ``` pre
 SELECT COUNT(*)
 FROM /exampleRegion
 ```

 ``` pre
 SELECT COUNT(*)
 FROM /exampleRegion
 WHERE ID > 0
 ```

 ``` pre
 SELECT COUNT(*)
 FROM /exampleRegion
 WHERE ID > 0 LIMIT 50
 ```

 ``` pre
 SELECT COUNT(*)
 FROM /exampleRegion
 WHERE ID > 0 AND status LIKE 'act%'
 ```

 ``` pre
 SELECT COUNT(*)
 FROM /exampleRegion
 WHERE ID IN SET(1,2,3,4,5)
 ```

 The following `COUNT` query returns the total number of StructTypes that match the query's selection criteria.

 ``` pre
 SELECT COUNT(*)
 FROM /exampleRegion p, p.positions.values pos
 WHERE p.ID > 0 AND pos.secId 'IBM'
 ```

 The following `COUNT` query uses the `DISTINCT` keyword and eliminates duplicates from the number of results.

 ``` pre
 SELECT DISTINCT COUNT(*)
 FROM /exampleRegion p, p.positions.values pos
 WHERE p.ID > 0 OR p.status = 'active' OR pos.secId = 'IBM'
 ```

 ## <a id="sum"></a>SUM

 The `SUM` keyword returns the summation of all results that match the query selection conditions specified in the `WHERE` clause. Using `SUM` allows you to aggregate specific numeric values within a results set.
 For partitioned regions, each node's buckets compute a sum over that node and return the result to the coordinator node executing the query, which then aggregates the sums across all nodes.

 The `SUM` function where the `DISTINCT` modifier is applied to the expression returns the summation over the set of unique (distinct) values.
 For partitioned regions, the distinct values in a node's buckets are returned to the coordinator node, which can then calculate the sum over the values that are unique across nodes, after eliminating duplicate values that come from separate nodes.

 The actual expression used to calculate the aggregation should be an instance of `java.lang.Number`.
 The `SUM` statement always returns a `java.lang.Number` as the result and, depending on how big the value is and whether it has a decimal component or not, the returned type could be an instance of `java.lang.Integer`, `java.lang.Long`, `java.lang.Float` or `java.lang.Double`; you should take this into consideration when executing the query: if an overflow occurs while computing the `SUM` function because the value is higher than `Double.MAX_VALUE` ((2 - 2<sup>-52</sup>) * 2<sup>1023</sup>), the result will be incorrect.

 The following are example `SUM` queries that return the summation of the entries ID.

 ``` pre
 SELECT SUM(ID)
 FROM /exampleRegion
 ```

 ``` pre
 SELECT SUM(ID)
 FROM /exampleRegion
 WHERE ID > 0
 ```

 ``` pre
 SELECT SUM(ID)
 FROM /exampleRegion
 WHERE ID > 0 LIMIT 50
 ```

 ``` pre
 SELECT SUM(ID)
 FROM /exampleRegion
 WHERE ID > 0 AND status LIKE 'act%'
 ```

 ``` pre
 SELECT SUM(ID)
 FROM /exampleRegion
 WHERE ID IN SET(1,2,3,4,5)
 ```

 The following `SUM` query returns the total summation of positive IDs grouped by status.

 ``` pre
 SELECT pf.status, SUM(pf.ID)
 FROM /exampleRegion pf
 WHERE pf.ID > 0
 GROUP BY pf.status
 ```

 The following `SUM` query uses the DISTINCT keyword and eliminates duplicates from the aggregation.

 ``` pre
 SELECT SUM(DISTINCT pf.ID)
 FROM /exampleRegion pf, pf.positions.values pos
 WHERE pf.ID > 0 OR pf.status = 'active' OR pos.secId = 'IBM'
 ```

 The following `SUM` query returns the total aggregation of positive IDs grouped by status and sorted by the aggregation result in descending order.

 ``` pre
 SELECT pf.status, SUM(pf.ID) as sm
 FROM /exampleRegion pf
 WHERE pf.ID > 0
 GROUP BY pf.status
 ORDER BY sm DESC
 ```

 ## <a id="avg"></a>AVG

 The `AVG` keyword returns the arithmetic mean of the set formed by the selected expression.
 For partitioned regions, each node's buckets provide both a sum and the number of elements to the node executing the query (coordinator), such that a correct average may be computed.

 The `AVG` keyword where the `DISTINCT` modifier is applied to the expression returns the arithmetic mean of the set of unique (distinct) values.
 For partitioned regions, the distinct values in a node's buckets are returned to the coordinator node, which can then calculate the average over the values that are unique across nodes, after eliminating duplicate values that come from separate nodes.

 The actual expression used to calculate the aggregation should be an instance of `java.lang.Number`.
 The `AVG` statement always returns a `java.lang.Number` as the result and, depending on how big the value is and whether it has a decimal component or not, the returned type could be an instance of `java.lang.Integer`, `java.lang.Long`, `java.lang.Float` or `java.lang.Double`; you should take this into consideration when executing the query: if an overflow occurs while computing the `AVG` function because the value is higher than `Double.MAX_VALUE` ((2 - 2<sup>-52</sup>) * 2<sup>1023</sup>), or if an overflow occurs while computing the intermediate count because the amount of elements is higher than `Long.MAX_VALUE` (2<sup>63</sup> - 1), the result will be incorrect.

 The following are example `AVG` queries that calculate the average of the entries ID.

 ``` pre
 SELECT AVG(ID)
 FROM /exampleRegion
 ```

 ``` pre
 SELECT AVG(ID)
 FROM /exampleRegion
 WHERE ID > 0
 ```

 ``` pre
 SELECT AVG(ID)
 FROM /exampleRegion
 WHERE ID > 0 LIMIT 50
 ```

 ``` pre
 SELECT AVG(ID)
 FROM /exampleRegion
 WHERE ID > 0 AND status LIKE 'act%'
 ```

 ``` pre
 SELECT AVG(ID)
 FROM /exampleRegion
 WHERE ID IN SET(1,2,3,4,5)
 ```

 The following `AVG` query returns the average of positive IDs grouped by status.

 ``` pre
 SELECT pf.status, AVG(pf.ID)
 FROM /exampleRegion pf
 WHERE pf.ID > 0
 GROUP BY pf.status
 ```

 The following `AVG` query uses the `DISTINCT` keyword and eliminates duplicates from the aggregation.

 ``` pre
 SELECT AVG(DISTINCT pf.ID)
 FROM /exampleRegion pf, pf.positions.values pos
 WHERE pf.ID > 0 OR pf.status = 'active' OR pos.secId = 'IBM'
 ```

 The following `AVG` query returns the average of positive IDs grouped by status and sorted by the calculation result in descending order.

 ``` pre
 SELECT pf.status, AVG(pf.ID) as sm
 FROM /exampleRegion pf
 WHERE pf.ID > 0
 GROUP BY pf.status
 ORDER BY sm DESC
 ```
	---
	title: OQL Aggregate Functions
	---

	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	The aggregate functions
	`MIN`, `MAX`,
	`COUNT`, `COUNT` over a `DISTINCT` expression,
	`SUM`, `SUM` over a `DISTINCT` expression,
	`AVG`, `AVG` over a `DISTINCT` expression, and
	`GROUP BY` are supported.

	The following sections provide descriptions and limitations (if any) about the aggregate functions, how do they work and how they can be used.

	## <a id="groupBy"></a>GROUP BY
	You can collect data across multiple entries and group the results by one or more columns through the usage of the `GROUP BY` statement.
	It's important to note some facts about its usage:

	* It can group results by one or more fields.
	* It returns a single record / entry per group.
	* It must always be placed before the `ORDER BY` clause, if any.
	* It can be used in conjunction with other aggregate functions: `MIN`, `MAX`, `COUNT`, `SUM` and `AVG`.
	* It groups records using the selected fields if and only if the fields have identical data across entries.
	* It is required, whenever an aggregate function is used within a query with other selected fields, to also use `GROUP BY`.
	* If there are no other aggregate functions within the query, all fields included within a GROUP BY clause must also be part of the original projection list, and all fields included within the projection list must also be part of the `GROUP BY` clause.

	The following are example `GROUP BY` queries.

	The following `GROUP BY` query returns the maximum amount of sales per employee.

	``` pre
	SELECT ID, MAX(e.sales)
	FROM /employees e
	GROUP BY ID
	```

	The following `GROUP BY` query returns the minimum, maximum, total count, average and summation of IDs grouped by status.

	``` pre
	SELECT pf.status, MIN(pf.ID), MAX(pf.ID), COUNT(pf.ID), AVG(pf.ID), SUM(pf.ID)
	FROM /portfolio pf
	GROUP BY pf.status
	```

	## <a id="min"></a>MIN

	The `MIN` keyword returns the minimum or smallest value from the selected expression. The expression itself must always evaluate to `java.lang.Comparable`. The `MIN` statement returns the actual type of the selected element as its result.

	The following are example `MIN` queries that return region entries (the entries implement the `java.lang.Comparable` interface).

	``` pre
	SELECT MIN(pf)
	FROM /exampleRegion pf
	```

	``` pre
	SELECT MIN(pf)
	FROM /exampleRegion
	pf WHERE pf.ID > 0
	```

	``` pre
	SELECT MIN(pf)
	FROM /exampleRegion pf
	WHERE pf.ID > 10 LIMIT 50
	```

	``` pre
	SELECT MIN(pf)
	FROM /exampleRegion pf
	WHERE pf.ID > 0 AND pf.status LIKE 'act%'
	```

	The following `MIN` query returns the lowest entry ID that matches the query's selection criteria.

	``` pre
	SELECT MIN(pf.ID)
	FROM /exampleRegion pf, pf.positions.values pos
	WHERE pf.ID > 0 AND pos.secId = 'IBM'
	```

	The following `MIN` query returns the lowest positive ID grouped by status.

	``` pre
	SELECT pf.status, MIN(pf.ID)
	FROM /exampleRegion pf
	WHERE pf.ID > 0
	GROUP BY pf.status
	```

	## <a id="max"></a>MAX

	The `MAX` keyword returns the maximum or highest value from the selected expression. The expression itself must always evaluate to `java.lang.Comparable`. The `MAX` statement returns the type of the selected element as its result.

	The following are example `MAX` queries that return region entries (the entries implement the `java.lang.Comparable` interface).

	``` pre
	SELECT MAX(pf)
	FROM /exampleRegion pf
	```

	``` pre
	SELECT MAX(pf)
	FROM /exampleRegion
	pf WHERE pf.ID > 0
	```

	``` pre
	SELECT MAX(pf)
	FROM /exampleRegion pf
	WHERE pf.ID > 10 LIMIT 50
	```

	``` pre
	SELECT MAX(pf)
	FROM /exampleRegion pf
	WHERE pf.ID > 0 AND pf.status LIKE 'act%'
	```

	The following `MAX` query returns the highest entry ID that matches the query's selection criteria.

	``` pre
	SELECT MAX(pf.ID)
	FROM /exampleRegion pf, pf.positions.values pos
	WHERE pf.ID > 0 AND pos.secId = 'IBM'
	```

	The following `MAX` query returns the highest positive IDs grouped by status.

	``` pre
	SELECT pf.status, MAX(pf.ID)
	FROM /exampleRegion pf
	WHERE pf.ID > 0
	GROUP BY pf.status
	```

	## <a id="count"></a>COUNT

	The `COUNT` keyword returns the number of results that match the query selection conditions specified in the `WHERE` clause. Using `COUNT` allows you to determine the size of a results set.
	The `COUNT` statement always returns a `java.lang.Integer` or `java.lang.Long` as the result (depending on how big the value is); you should take this into consideration when executing the query: if an overflow occurs while computing the `COUNT` function because the value is higher than `Long.MAX_VALUE` (2<sup>63</sup> - 1), the result will be incorrect.

	The following queries are example `COUNT` queries that return region entries:

	``` pre
	SELECT COUNT(*)
	FROM /exampleRegion
	```

	``` pre
	SELECT COUNT(*)
	FROM /exampleRegion
	WHERE ID > 0
	```

	``` pre
	SELECT COUNT(*)
	FROM /exampleRegion
	WHERE ID > 0 LIMIT 50
	```

	``` pre
	SELECT COUNT(*)
	FROM /exampleRegion
	WHERE ID > 0 AND status LIKE 'act%'
	```

	``` pre
	SELECT COUNT(*)
	FROM /exampleRegion
	WHERE ID IN SET(1,2,3,4,5)
	```

	The following `COUNT` query returns the total number of StructTypes that match the query's selection criteria.

	``` pre
	SELECT COUNT(*)
	FROM /exampleRegion p, p.positions.values pos
	WHERE p.ID > 0 AND pos.secId 'IBM'
	```

	The following `COUNT` query uses the `DISTINCT` keyword and eliminates duplicates from the number of results.

	``` pre
	SELECT DISTINCT COUNT(*)
	FROM /exampleRegion p, p.positions.values pos
	WHERE p.ID > 0 OR p.status = 'active' OR pos.secId = 'IBM'
	```

	## <a id="sum"></a>SUM

	The `SUM` keyword returns the summation of all results that match the query selection conditions specified in the `WHERE` clause. Using `SUM` allows you to aggregate specific numeric values within a results set.
	For partitioned regions, each node's buckets compute a sum over that node and return the result to the coordinator node executing the query, which then aggregates the sums across all nodes.

	The `SUM` function where the `DISTINCT` modifier is applied to the expression returns the summation over the set of unique (distinct) values.
	For partitioned regions, the distinct values in a node's buckets are returned to the coordinator node, which can then calculate the sum over the values that are unique across nodes, after eliminating duplicate values that come from separate nodes.

	The actual expression used to calculate the aggregation should be an instance of `java.lang.Number`.
	The `SUM` statement always returns a `java.lang.Number` as the result and, depending on how big the value is and whether it has a decimal component or not, the returned type could be an instance of `java.lang.Integer`, `java.lang.Long`, `java.lang.Float` or `java.lang.Double`; you should take this into consideration when executing the query: if an overflow occurs while computing the `SUM` function because the value is higher than `Double.MAX_VALUE` ((2 - 2<sup>-52</sup>) * 2<sup>1023</sup>), the result will be incorrect.

	The following are example `SUM` queries that return the summation of the entries ID.

	``` pre
	SELECT SUM(ID)
	FROM /exampleRegion
	```

	``` pre
	SELECT SUM(ID)
	FROM /exampleRegion
	WHERE ID > 0
	```

	``` pre
	SELECT SUM(ID)
	FROM /exampleRegion
	WHERE ID > 0 LIMIT 50
	```

	``` pre
	SELECT SUM(ID)
	FROM /exampleRegion
	WHERE ID > 0 AND status LIKE 'act%'
	```

	``` pre
	SELECT SUM(ID)
	FROM /exampleRegion
	WHERE ID IN SET(1,2,3,4,5)
	```

	The following `SUM` query returns the total summation of positive IDs grouped by status.

	``` pre
	SELECT pf.status, SUM(pf.ID)
	FROM /exampleRegion pf
	WHERE pf.ID > 0
	GROUP BY pf.status
	```

	The following `SUM` query uses the DISTINCT keyword and eliminates duplicates from the aggregation.

	``` pre
	SELECT SUM(DISTINCT pf.ID)
	FROM /exampleRegion pf, pf.positions.values pos
	WHERE pf.ID > 0 OR pf.status = 'active' OR pos.secId = 'IBM'
	```

	The following `SUM` query returns the total aggregation of positive IDs grouped by status and sorted by the aggregation result in descending order.

	``` pre
	SELECT pf.status, SUM(pf.ID) as sm
	FROM /exampleRegion pf
	WHERE pf.ID > 0
	GROUP BY pf.status
	ORDER BY sm DESC
	```

	## <a id="avg"></a>AVG

	The `AVG` keyword returns the arithmetic mean of the set formed by the selected expression.
	For partitioned regions, each node's buckets provide both a sum and the number of elements to the node executing the query (coordinator), such that a correct average may be computed.

	The `AVG` keyword where the `DISTINCT` modifier is applied to the expression returns the arithmetic mean of the set of unique (distinct) values.
	For partitioned regions, the distinct values in a node's buckets are returned to the coordinator node, which can then calculate the average over the values that are unique across nodes, after eliminating duplicate values that come from separate nodes.

	The actual expression used to calculate the aggregation should be an instance of `java.lang.Number`.
	The `AVG` statement always returns a `java.lang.Number` as the result and, depending on how big the value is and whether it has a decimal component or not, the returned type could be an instance of `java.lang.Integer`, `java.lang.Long`, `java.lang.Float` or `java.lang.Double`; you should take this into consideration when executing the query: if an overflow occurs while computing the `AVG` function because the value is higher than `Double.MAX_VALUE` ((2 - 2<sup>-52</sup>) * 2<sup>1023</sup>), or if an overflow occurs while computing the intermediate count because the amount of elements is higher than `Long.MAX_VALUE` (2<sup>63</sup> - 1), the result will be incorrect.

	The following are example `AVG` queries that calculate the average of the entries ID.

	``` pre
	SELECT AVG(ID)
	FROM /exampleRegion
	```

	``` pre
	SELECT AVG(ID)
	FROM /exampleRegion
	WHERE ID > 0
	```

	``` pre
	SELECT AVG(ID)
	FROM /exampleRegion
	WHERE ID > 0 LIMIT 50
	```

	``` pre
	SELECT AVG(ID)
	FROM /exampleRegion
	WHERE ID > 0 AND status LIKE 'act%'
	```

	``` pre
	SELECT AVG(ID)
	FROM /exampleRegion
	WHERE ID IN SET(1,2,3,4,5)
	```

	The following `AVG` query returns the average of positive IDs grouped by status.

	``` pre
	SELECT pf.status, AVG(pf.ID)
	FROM /exampleRegion pf
	WHERE pf.ID > 0
	GROUP BY pf.status
	```

	The following `AVG` query uses the `DISTINCT` keyword and eliminates duplicates from the aggregation.

	``` pre
	SELECT AVG(DISTINCT pf.ID)
	FROM /exampleRegion pf, pf.positions.values pos
	WHERE pf.ID > 0 OR pf.status = 'active' OR pos.secId = 'IBM'
	```

	The following `AVG` query returns the average of positive IDs grouped by status and sorted by the calculation result in descending order.

	``` pre
	SELECT pf.status, AVG(pf.ID) as sm
	FROM /exampleRegion pf
	WHERE pf.ID > 0
	GROUP BY pf.status
	ORDER BY sm DESC
	```