blob: f32b7dc4be5d3462c17d5a626c83b5f110e2761d [file] [log] [blame] [view]
---
{
"title": "GROUPING_ID",
"language": "en"
}
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## GROUPING_ID
### Name
GROUPING_ID
### Description
Is a function that computes the level of grouping. `GROUPING_ID` can be used only in the `SELECT <select> list`, `HAVING`, or `ORDER BY` clauses when `GROUP BY` is specified.
#### Syntax
```sql
GROUPING_ID ( <column_expression>[ ,...n ] )
```
#### Arguments
`<column_expression>`
Is a `column_expression` in a GROUP BY clause.
#### Return Type
BIGINT
#### Remarks
The GROUPING_ID's `<column_expression>` must exactly match the expression in the `GROUP BY` list. For example, if you are grouping by `user_id`, use `GROUPING_ID (user_id)`; or if you are grouping by `name`, use `GROUPING_ID (name)`.
#### Comparing GROUPING_ID() to GROUPING()
`GROUPING_ID(<column_expression> [ ,...n ])` inputs the equivalent of the `GROUPING(<column_expression>)` return for each column in its column list in each output row as a string of ones and zeros. GROUPING_ID interprets that string as a base-2 number and returns the equivalent integer. For example consider the following statement: `SELECT a, b, c, SUM(d), GROUPING_ID(a,b,c) FROM T GROUP BY <group by list>`. The following table shows the GROUPING_ID() input and output values.
| Columns aggregated | GROUPING_ID (a, b, c) input = GROUPING(a) + GROUPING(b) + GROUPING(c) | GROUPING_ID () output |
| ------------------ | ------------------------------------------------------------ | --------------------- |
| `a` | `100` | `4` |
| `b` | `010` | `2` |
| `c` | `001` | `1` |
| `ab` | `110` | `6` |
| `ac` | `101` | `5` |
| `bc` | `011` | `3` |
| `abc` | `111` | `7` |
#### Technical Definition of GROUPING_ID()
Each GROUPING_ID argument must be an element of the GROUP BY list. GROUPING_ID() returns an integer bitmap whose lowest N bits may be lit. A lit bit indicates the corresponding argument is not a grouping column for the given output row. The lowest-order bit corresponds to argument N, and the (N-1)ᵗʰ lowest-order bit corresponds to argument 1.
#### GROUPING_ID() Equivalents
For a single grouping query, `GROUPING (<column_expression>)` is equivalent to `GROUPING_ID(<column_expression>)`, and both return 0.
For example, the following statements are equivalent:
Statement A:
```sql
SELECT GROUPING_ID(A,B)
FROM T
GROUP BY CUBE(A,B)
```
Statement B:
```sql
SELECT 3 FROM T GROUP BY ()
UNION ALL
SELECT 1 FROM T GROUP BY A
UNION ALL
SELECT 2 FROM T GROUP BY B
UNION ALL
SELECT 0 FROM T GROUP BY A,B
```
### Example
Before starting our example, We first prepare the following data.
```sql
CREATE TABLE employee (
uid INT,
name VARCHAR(32),
level VARCHAR(32),
title VARCHAR(32),
department VARCHAR(32),
hiredate DATE
)
UNIQUE KEY(uid)
DISTRIBUTED BY HASH(uid) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO employee VALUES
(1, 'Abby', 'Senior', 'President', 'Board of Directors', '1999-11-13'),
(2, 'Bob', 'Senior', 'Vice-President', 'Board of Directors', '1999-11-13'),
(3, 'Candy', 'Senior', 'System Engineer', 'Technology', '2005-3-7'),
(4, 'Devere', 'Senior', 'Hardware Engineer', 'Technology', '2006-7-9'),
(5, 'Emilie', 'Senior', 'System Analyst', 'Technology', '2003-8-28'),
(6, 'Fredrick', 'Senior', 'Sales Manager', 'Sales', '2004-9-7'),
(7, 'Gitel', 'Assistant', 'Business Executive', 'Sales', '2003-3-19'),
(8, 'Haden', 'Trainee', 'Sales Assistant', 'Sales', '2007-6-30'),
(9, 'Irene', 'Assistant', 'Business Executive', 'Sales', '2005-10-20'),
(10, 'Jankin', 'Senior', 'Marketing Supervisor', 'Marketing', '2001-4-13'),
(11, 'Louis', 'Trainee', 'Marketing Assistant', 'Marketing', '2007-8-2'),
(12, 'Martin', 'Trainee', 'Marketing Assistant', 'Marketing', '2007-7-1'),
(13, 'Nasir', 'Assistant', 'Marketing Executive', 'Marketing', '2004-9-3');
```
Here is the result.
```text
+------+----------+-----------+----------------------+--------------------+------------+
| uid | name | level | title | department | hiredate |
+------+----------+-----------+----------------------+--------------------+------------+
| 1 | Abby | Senior | President | Board of Directors | 1999-11-13 |
| 2 | Bob | Senior | Vice-President | Board of Directors | 1999-11-13 |
| 3 | Candy | Senior | System Engineer | Technology | 2005-03-07 |
| 4 | Devere | Senior | Hardware Engineer | Technology | 2006-07-09 |
| 5 | Emilie | Senior | System Analyst | Technology | 2003-08-28 |
| 6 | Fredrick | Senior | Sales Manager | Sales | 2004-09-07 |
| 7 | Gitel | Assistant | Business Executive | Sales | 2003-03-19 |
| 8 | Haden | Trainee | Sales Assistant | Sales | 2007-06-30 |
| 9 | Irene | Assistant | Business Executive | Sales | 2005-10-20 |
| 10 | Jankin | Senior | Marketing Supervisor | Marketing | 2001-04-13 |
| 11 | Louis | Trainee | Marketing Assistant | Marketing | 2007-08-02 |
| 12 | Martin | Trainee | Marketing Assistant | Marketing | 2007-07-01 |
| 13 | Nasir | Assistant | Marketing Executive | Marketing | 2004-09-03 |
+------+----------+-----------+----------------------+--------------------+------------+
13 rows in set (0.01 sec)
```
#### A. Using GROUPING_ID to identify grouping levels
The following example returns the count of employees by `department` and `level`. GROUPING_ID() is used to create a value for each row in the `Job Title` column that identifies its level of aggregation.
```sql
SELECT
department,
CASE
WHEN GROUPING_ID(department, level) = 0 THEN level
WHEN GROUPING_ID(department, level) = 1 THEN CONCAT('Total: ', department)
WHEN GROUPING_ID(department, level) = 3 THEN 'Total: Company'
ELSE 'Unknown'
END AS 'Job Title',
COUNT(uid) AS 'Employee Count'
FROM employee
GROUP BY ROLLUP(department, level)
ORDER BY GROUPING_ID(department, level) ASC;
```
Here is the result.
```text
+--------------------+---------------------------+----------------+
| department | Job Title | Employee Count |
+--------------------+---------------------------+----------------+
| Board of Directors | Senior | 2 |
| Technology | Senior | 3 |
| Sales | Senior | 1 |
| Sales | Assistant | 2 |
| Sales | Trainee | 1 |
| Marketing | Senior | 1 |
| Marketing | Trainee | 2 |
| Marketing | Assistant | 1 |
| Board of Directors | Total: Board of Directors | 2 |
| Technology | Total: Technology | 3 |
| Sales | Total: Sales | 4 |
| Marketing | Total: Marketing | 4 |
| NULL | Total: Company | 13 |
+--------------------+---------------------------+----------------+
13 rows in set (0.01 sec)
```
#### B. Using GROUPING_ID to filter a result set
In the following code, to return only the rows that have the count of senior in department.
```sql
SELECT
department,
CASE
WHEN GROUPING_ID(department, level) = 0 THEN level
WHEN GROUPING_ID(department, level) = 1 THEN CONCAT('Total: ', department)
WHEN GROUPING_ID(department, level) = 3 THEN 'Total: Company'
ELSE 'Unknown'
END AS 'Job Title',
COUNT(uid)
FROM employee
GROUP BY ROLLUP(department, level)
HAVING `Job Title` IN ('Senior');
```
Here is the result.
```text
+--------------------+-----------+--------------+
| department | Job Title | count(`uid`) |
+--------------------+-----------+--------------+
| Board of Directors | Senior | 2 |
| Technology | Senior | 3 |
| Sales | Senior | 1 |
| Marketing | Senior | 1 |
+--------------------+-----------+--------------+
5 rows in set (0.01 sec)
```
### Keywords
GROUPING_ID
### Best Practice
For more information, see also:
- [GROUPING](./grouping.md)