docs/query-data/multi-dimensional-analytics.md - doris-website - Git at Google

 ---
 {
     "title": "Multi-Dimensional Analytics",
     "language": "en"
 }
 ---

 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->

 In a database, ROLLUP, CUBE, and GROUPING SETS are advanced SQL statements used for multidimensional data aggregation. These features significantly enhance the capabilities of the GROUP BY clause, enabling users to obtain multiple levels of summary results in a single query, which is semantically equivalent to using UNION ALL to connect multiple aggregation statements.

 - **ROLLUP**: ROLLUP is an operation used to generate hierarchical summaries. It aggregates data according to the specified column order, gradually summarizing from the finest granularity to the highest level. For example, in sales data, ROLLUP can be used to summarize by region and time, providing sales for each region per month, total sales for each region, and overall total sales. ROLLUP is suitable for scenarios requiring step-by-step summaries.

 - **CUBE**: CUBE is a more powerful aggregation operation that generates all possible summary combinations. Unlike ROLLUP, CUBE calculates subsets for all dimensions. For instance, for sales data aggregated by product and region, CUBE will compute sales for each product in each region, total sales for each product, total sales for each region, and overall total sales. CUBE is applicable to scenarios requiring comprehensive multidimensional analysis, such as business analysis and market research.

 - **GROUPING SETS**: GROUPING SETS offer flexibility in aggregating specific grouping sets. It allows users to specify a set of column combinations for independent aggregation, rather than generating all possible combinations as in ROLLUP and CUBE. For example, one can define summaries for specific combinations of region and time without needing all combinations of each dimension. GROUPING SETS are suitable for scenarios requiring customized summaries, providing flexible aggregation control.

 ROLLUP, CUBE, and GROUPING SETS provide powerful multidimensional data summary functions, catering to various data analysis and reporting needs, and making complex aggregation calculations simpler and more efficient. The following sections will detail the usage scenarios, syntax, and examples of these features.

 ## ROLLUP

 ### Use Case

 ROLLUP is particularly useful for summarizing data along hierarchical dimensions such as time, geography, and category. For instance, queries can specify `ROLLUP(year, month, day)` or `(country, Province, city)`.

 ### Syntax and Example

 The syntax for ROLLUP is as follows:

 ```sql
 SELECT … GROUP BY ROLLUP(grouping_column_reference_list)
 ```

 Here's an example query that analyzes sales sums by year and month:

 ```sql
 SELECT
         YEAR(d_date),
         MONTH(d_date),
         SUM(ss_net_paid) AS total_sum
 FROM
         store_sales,
         date_dim d1
 WHERE
         d1.d_date_sk = ss_sold_date_sk
         AND YEAR(d_date) IN (2001, 2002)
         AND MONTH(d_date) IN (1, 2, 3)
 GROUP BY
         ROLLUP(YEAR(d_date), MONTH(d_date))
 ORDER BY
         YEAR(d_date), MONTH(d_date);
 ```

 This query summarizes data by time, calculating subtotals for sales by year, sales by month within each year, and the grand total of sales. The query result is as follows:

 ```sql
 +--------------+---------------+-------------+
 | YEAR(d_date) | MONTH(d_date) | total_sum   |
 +--------------+---------------+-------------+
 |         NULL |          NULL | 54262669.17 |
 |         2001 |          NULL | 26640320.46 |
 |         2001 |             1 |  9982165.83 |
 |         2001 |             2 |  8454915.34 |
 |         2001 |             3 |  8203239.29 |
 |         2002 |          NULL | 27622348.71 |
 |         2002 |             1 | 11260654.35 |
 |         2002 |             2 |  7722750.61 |
 |         2002 |             3 |  8638943.75 |
 +--------------+---------------+-------------+
 9 rows in set (0.08 sec)
 ```

 ## CUBE

 ### Use Case

 CUBE is best suited for queries involving columns from multiple independent dimensions, rather than columns representing different levels of a single dimension. For example, a common usage scenario is summarizing all combinations of month, region, and product. These are three independent dimensions, and it is common to analyze all possible subtotal combinations. By contrast, cross-tabulating all possible combinations of year, month, and day would include several unnecessary values due to the natural hierarchy in the time dimension. In most analyses, subtotals like profits calculated by month and day are unnecessary. Relatively few users need to ask, "What is the total sales for the 16th of each month throughout the year?"

 ### Syntax and Example

 The syntax for CUBE is as follows:

 ```sql
 SELECT … GROUP BY CUBE(grouping_column_reference_list)
 ```

 Example usage:

 ```sql
 SELECT
         YEAR(d_date),
         i_category,
         ca_state,
         SUM(ss_net_paid) AS total_sum
 FROM
         store_sales,
         date_dim d1,
         item,
         customer_address ca
 WHERE
         d1.d_date_sk = ss_sold_date_sk
         AND i_item_sk = ss_item_sk
         AND ss_addr_sk = ca_address_sk
         AND i_category IN ("Books", "Electronics")
         AND YEAR(d_date) IN (1998, 1999)
         AND ca_state IN ("LA", "AK")
 GROUP BY CUBE(YEAR(d_date), i_category, ca_state)
 ORDER BY YEAR(d_date), i_category, ca_state;
 ```

 The query result is as follows, calculating:

 - The total sales sum;

 - Subtotals for sales by year, by product category, and by state;

 - Subtotals for sales by product category within each year, by state for each product, by state within each year, and by product category within each state and year.

 ```sql
 +--------------+-------------+----------+------------+
 | YEAR(d_date) | i_category  | ca_state | total_sum  |
 +--------------+-------------+----------+------------+
 |         NULL | NULL        | NULL     | 8690374.60 |
 |         NULL | NULL        | AK       | 2675198.33 |
 |         NULL | NULL        | LA       | 6015176.27 |
 |         NULL | Books       | NULL     | 4238177.69 |
 |         NULL | Books       | AK       | 1310791.36 |
 |         NULL | Books       | LA       | 2927386.33 |
 |         NULL | Electronics | NULL     | 4452196.91 |
 |         NULL | Electronics | AK       | 1364406.97 |
 |         NULL | Electronics | LA       | 3087789.94 |
 |         1998 | NULL        | NULL     | 4369656.14 |
 |         1998 | NULL        | AK       | 1402539.19 |
 |         1998 | NULL        | LA       | 2967116.95 |
 |         1998 | Books       | NULL     | 2213703.82 |
 |         1998 | Books       | AK       |  719911.29 |
 |         1998 | Books       | LA       | 1493792.53 |
 |         1998 | Electronics | NULL     | 2155952.32 |
 |         1998 | Electronics | AK       |  682627.90 |
 |         1998 | Electronics | LA       | 1473324.42 |
 |         1999 | NULL        | NULL     | 4320718.46 |
 |         1999 | NULL        | AK       | 1272659.14 |
 |         1999 | NULL        | LA       | 3048059.32 |
 |         1999 | Books       | NULL     | 2024473.87 |
 |         1999 | Books       | AK       |  590880.07 |
 |         1999 | Books       | LA       | 1433593.80 |
 |         1999 | Electronics | NULL     | 2296244.59 |
 |         1999 | Electronics | AK       |  681779.07 |
 |         1999 | Electronics | LA       | 1614465.52 |
 +--------------+-------------+----------+------------+
 27 rows in set (0.21 sec)
 ```

 ## GROUPING FUNCTION

 This section introduces how to address two challenges when using ROLLUP and CUBE:

 1. How to programmatically identify which rows in the result set represent subtotals and accurately determine the aggregation level corresponding to a given subtotal. Since subtotals are often needed for calculations such as percentage of totals, we require a convenient method to identify these subtotal rows.

 2. When the query results contain both actual stored NULL values and "NULL" values generated by ROLLUP or CUBE operations, another problem arises: how to distinguish between these two types of NULL values?

 GROUPING, GROUPING_ID, and GROUPING SETS can effectively solve the aforementioned challenges.

 ### GROUPING

 **1. Principles**

 GROUPING uses a single column as a parameter and returns 1 when encountering a NULL value created by ROLLUP or CUBE operations, indicating that the row is a subtotal. Any other type of value (including NULLs inherently present in the table data) returns 0.

 Example:

 ```sql
 select
         year(d_date),
         month(d_date),
         sum(ss_net_paid) as total_sum,
         grouping(year(d_date)),
         grouping(month(d_date))
 from
         store_sales,
         date_dim d1
 where
         d1.d_date_sk = ss_sold_date_sk
         and year(d_date) in (2001, 2002)
         and month(d_date) in (1, 2, 3)
 group by
         rollup(year(d_date), month(d_date))
 order by
         year(d_date), month(d_date);
 ```

 - The GROUPING function result for the (YEAR(d_date), MONTH(d_date)) group is (0,0) for aggregation by year and month.

 - The GROUPING function result for the (YEAR(d_date)) group is (0,1) for aggregation by year.

 - The GROUPING function result for the () group is (1,1) for the total aggregation.

 Query result:

 ```Plain
 +--------------+---------------+-------------+------------------------+-------------------------+
 | year(d_date) | month(d_date) | total_sum   | Grouping(year(d_date)) | Grouping(month(d_date)) |
 +--------------+---------------+-------------+------------------------+-------------------------+
 |         NULL |          NULL | 54262669.17 |                      1 |                       1 |
 |         2001 |          NULL | 26640320.46 |                      0 |                       1 |
 |         2001 |             1 |  9982165.83 |                      0 |                       0 |
 |         2001 |             2 |  8454915.34 |                      0 |                       0 |
 |         2001 |             3 |  8203239.29 |                      0 |                       0 |
 |         2002 |          NULL | 27622348.71 |                      0 |                       1 |
 |         2002 |             1 | 11260654.35 |                      0 |                       0 |
 |         2002 |             2 |  7722750.61 |                      0 |                       0 |
 |         2002 |             3 |  8638943.75 |                      0 |                       0 |
 +--------------+---------------+-------------+------------------------+-------------------------+
 9 rows in set (0.06 sec)
 ```

 **2. Usage Scenarios, Syntax, and Examples**

 The GROUPING function can be used to filter results. Example:

 ```sql
 select
         year(d_date),
         i_category,
         ca_state,
         sum(ss_net_paid) as total_sum
 from
         store_sales,
         date_dim d1,
         item,
         customer_address ca
 where
         d1.d_date_sk = ss_sold_date_sk
         and i_item_sk = ss_item_sk
         and ss_addr_sk=ca_address_sk
         and i_category in ("Books", "Electronics")
         and year(d_date) in(1998, 1999)
         and ca_state in ("LA", "AK")
 group by cube(year(d_date), i_category, ca_state)
 having grouping(year(d_date))=1 and grouping(i_category)=1 and grouping(ca_state)=1
 or grouping(year(d_date))=0 and grouping(i_category)=1 and grouping(ca_state)=1
 or grouping(year(d_date))=1 and grouping(i_category)=1 and grouping(ca_state)=0
 order by year(d_date), i_category, ca_state;
 ```

 Using the GROUPING function in the HAVING clause retains only the total sales, sales summarized by year, and sales summarized by region. Query result:

 ```Plain
 +---------------------+------------+----------+------------+
 | year(`d1`.`d_date`) | i_category | ca_state | total_sum  |
 +---------------------+------------+----------+------------+
 |                NULL | NULL       | NULL     | 8690374.60 |
 |                NULL | NULL       | AK       | 2675198.33 |
 |                NULL | NULL       | LA       | 6015176.27 |
 |                1998 | NULL       | NULL     | 4369656.14 |
 |                1999 | NULL       | NULL     | 4320718.46 |
 +---------------------+------------+----------+------------+
 5 rows in set (0.13 sec)
 ```

 You can also use the GROUPING function with the IF function to enhance query readability. Example:

 ```sql
 select
         if(grouping(year(d_date)) = 1, "Multi-year sum", year(d_date)) as year,
         if(grouping(i_category) = 1, "Multi-category sum", i_category) as category,
         sum(ss_net_paid) as total_sum
 from
         store_sales,
         date_dim d1,
         item,
         customer_address ca
 where
         d1.d_date_sk = ss_sold_date_sk
         and i_item_sk = ss_item_sk
         and ss_addr_sk = ca_address_sk
         and i_category in ("Books", "Electronics")
         and year(d_date) in (1998, 1999)
         and ca_state in ("LA", "AK")
 group by cube(year(d_date), i_category)
 ```

 Query result:

 ```sql
 +----------------+--------------------+------------+
 | year           | category           | total_sum  |
 +----------------+--------------------+------------+
 | 1998           | Books              | 2213703.82 |
 | 1998           | Electronics        | 2155952.32 |
 | 1999           | Electronics        | 2296244.59 |
 | 1999           | Books              | 2024473.87 |
 | 1998           | Multi-category sum | 4369656.14 |
 | 1999           | Multi-category sum | 4320718.46 |
 | Multi-year sum | Books              | 4238177.69 |
 | Multi-year sum | Electronics        | 4452196.91 |
 | Multi-year sum | Multi-category sum | 8690374.60 |
 +----------------+--------------------+------------+
 9 rows in set (0.09 sec)
 ```

 ### GROUPING_ID

 **1. Usage Scenarios**

 In a database, both GROUPING_ID and GROUPING functions serve as auxiliary functions for handling multidimensional data aggregation queries, such as ROLLUP and CUBE, aiding users in distinguishing between different levels of aggregation results. If you wish to determine the aggregation level of a particular row, you need to use the GROUPING function to compute all GROUP BY columns, as the computation result of a single column alone is insufficient.

 The GROUPING_ID function is more powerful than GROUPING because it can detect multiple columns simultaneously. The GROUPING_ID function accepts multiple columns as parameters and returns an integer that represents the aggregation status of these columns through binary bits. When using tables or materialized views to store computation results, using GROUPING to represent different levels of aggregation can consume considerable storage space. In such scenarios, GROUPING_ID is more appropriate.

 Taking CUBE(a, b) as an example, its GROUPING_ID can be represented as follows:

 | Aggregation Level | Bit Vector | GROUPING_ID | GROUPING(a) | GROUPING(b) |
 | ----------------- | ---------- | ----------- | ----------- | ----------- |
 | a,b               | 0 0        | 0           | 0           | 0           |
 | a                 | 0 1        | 1           | 0           | 1           |
 | b                 | 1 0        | 2           | 1           | 0           |
 | Grand Total       | 1 1        | 3           | 1           | 1           |

 **2. Syntax and Example**

 Here is an example SQL query:

 ```sql
 SELECT
     year(d_date),
     i_category,
     SUM(ss_net_paid) AS total_sum,
     GROUPING(year(d_date)),
     GROUPING(i_category),
     GROUPING_ID(year(d_date), i_category)
 FROM
     store_sales,
     date_dim d1,
     item,
     customer_address ca
 WHERE
     d1.d_date_sk = ss_sold_date_sk
     AND i_item_sk = ss_item_sk
     AND ss_addr_sk = ca_address_sk
     AND i_category IN ('Books', 'Electronics')
     AND year(d_date) IN (1998, 1999)
     AND ca_state IN ('LA', 'AK')
 GROUP BY CUBE(year(d_date), i_category);
 ```

 The query results are as follows:

 ```sql
 +--------------+-------------+------------+------------------------+----------------------+---------------------------------------+
 | year(d_date) | i_category  | total_sum  | GROUPING(year(d_date)) | GROUPING(i_category) | GROUPING_ID(year(d_date), i_category) |
 +--------------+-------------+------------+------------------------+----------------------+---------------------------------------+
 | 1998         | Electronics | 2155952.32 | 0                      | 0                    | 0                                     |
 | 1998         | Books       | 2213703.82 | 0                      | 0                    | 0                                     |
 | 1999         | Electronics | 2296244.59 | 0                      | 0                    | 0                                     |
 | 1999         | Books       | 2024473.87 | 0                      | 0                    | 0                                     |
 | 1998         | NULL        | 4369656.14 | 0                      | 1                    | 1                                     |
 | 1999         | NULL        | 4320718.46 | 0                      | 1                    | 1                                     |
 | NULL         | Electronics | 4452196.91 | 1                      | 0                    | 2                                     |
 | NULL         | Books       | 4238177.69 | 1                      | 0                    | 2                                     |
 | NULL         | NULL        | 8690374.60 | 1                      | 1                    | 3                                     |
 +--------------+-------------+------------+------------------------+----------------------+---------------------------------------+
 9 rows in set (0.12 sec)
 ```

 ### GROUPING SETS

 **1. Usage Scenarios**

 When there is a need to selectively specify the group sets to create, the `GROUPING SETS` expression can be used in the `GROUP BY` clause. This method allows users to precisely specify across multiple dimensions without computing the entire CUBE.

 Since CUBE queries typically consume significant resources, using `GROUPING SETS` can enhance query execution efficiency when only a few dimensions are of interest.

 **2. Syntax and Examples**

 The syntax for `GROUPING SETS` is as follows:

 ```sql
 SELECT … GROUP BY GROUPING SETS(grouping_column_reference_list)
 ```

 If you need:

 - Subtotals of sales for each product category per year

 - Subtotals of sales for each state per year

 - Subtotals of sales for each product in each state per year

 You can use `GROUPING SETS` to specify these dimensions and perform the aggregation. Here is an example:

 ```sql
 SELECT
     YEAR(d_date),
     i_category,
     ca_state,
     SUM(ss_net_paid) AS total_sum
 FROM
     store_sales,
     date_dim d1,
     item,
     customer_address ca
 WHERE
     d1.d_date_sk = ss_sold_date_sk
     AND i_item_sk = ss_item_sk
     AND ss_addr_sk = ca_address_sk
     AND i_category IN ('Books', 'Electronics')
     AND YEAR(d_date) IN (1998, 1999)
     AND ca_state IN ('LA', 'AK')
 GROUP BY GROUPING SETS(
     (YEAR(d_date), i_category),
     (YEAR(d_date), ca_state),
     (YEAR(d_date), ca_state, i_category)
 )
 ORDER BY YEAR(d_date), i_category, ca_state;
 ```

 Query Result:

 ```sql
 +--------------+-------------+----------+------------+
 | YEAR(d_date) | i_category  | ca_state | total_sum  |
 +--------------+-------------+----------+------------+
 | 1998         | NULL        | AK       | 1402539.19 |
 | 1998         | NULL        | LA       | 2967116.95 |
 | 1998         | Books       | NULL     | 2213703.82 |
 | 1998         | Books       | AK       |  719911.29 |
 | 1998         | Books       | LA       | 1493792.53 |
 | 1998         | Electronics | NULL     | 2155952.32 |
 | 1998         | Electronics | AK       |  682627.90 |
 | 1998         | Electronics | LA       | 1473324.42 |
 | 1999         | NULL        | AK       | 1272659.14 |
 | 1999         | NULL        | LA       | 3048059.32 |
 | 1999         | Books       | NULL     | 2024473.87 |
 | 1999         | Books       | AK       |  590880.07 |
 | 1999         | Books       | LA       | 1433593.80 |
 | 1999         | Electronics | NULL     | 2296244.59 |
 | 1999         | Electronics | AK       |  681779.07 |
 | 1999         | Electronics | LA       | 1614465.52 |
 +--------------+-------------+----------+------------+
 16 rows in set (0.11 sec)
 ```

 The above approach is equivalent to using CUBE but specifies concrete `grouping_id`, thereby reducing unnecessary calculations:

 ```sql
 SELECT
     SUM(ss_net_paid) AS total_sum,
     YEAR(d_date),
     i_category,
     ca_state
 FROM
     store_sales,
     date_dim d1,
     item,
     customer_address ca
 WHERE
     d1.d_date_sk = ss_sold_date_sk
     AND i_item_sk = ss_item_sk
     AND ss_addr_sk = ca_address_sk
     AND i_category IN ('Books', 'Electronics')
     AND YEAR(d_date) IN (1998, 1999)
     AND ca_state IN ('LA', 'AK')
 GROUP BY CUBE(YEAR(d_date), ca_state, i_category)
 HAVING grouping_id(YEAR(d_date), ca_state, i_category) = 0
     OR grouping_id(YEAR(d_date), ca_state, i_category) = 2
     OR grouping_id(YEAR(d_date), ca_state, i_category) = 1;
 ```

 :::info Note

 Using `CUBE` computes all possible aggregation levels (eight in this case), but in practice, you may only be interested in a few of them.

 :::

 **3. Semantic Equivalents**

 - GROUPING SETS vs. GROUP BY UNION ALL

   The `GROUPING SETS` statement:

   ```sql
   SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ((k1, k2), (k1), (k2), ());
   ```

   Is equivalent in query results to multiple `GROUP BY` queries connected with `UNION ALL`:

   ```sql
   SELECT k1, k2, SUM(k3) FROM t GROUP BY k1, k2
   UNION ALL
   SELECT k1, NULL, SUM(k3) FROM t GROUP BY k1
   UNION ALL
   SELECT NULL, k2, SUM(k3) FROM t GROUP BY k2
   UNION ALL
   SELECT NULL, NULL, SUM(k3) FROM t;
   ```

   Using `UNION ALL` results in a longer query and requires multiple scans of the base table, making it less efficient in both writing and execution.

 - GROUPING SETS vs. ROLLUP

   `ROLLUP` is an extension of `GROUPING SETS`. For example:

   ```sql
   SELECT a, b, c, SUM(d) FROM tab1 GROUP BY ROLLUP(a, b, c);
   ```

   This `ROLLUP` is equivalent to the following `GROUPING SETS`:

   ```sql
   GROUPING SETS (
       (a, b, c),
       (a, b),
       (a),
       ()
   );
   ```

 - GROUPING SETS vs. CUBE

   `CUBE(a, b, c)` is equivalent to the following `GROUPING SETS`:

   ```sql
   GROUPING SETS (
       (a, b, c),
       (a, b),
       (a, c),
       (a),
       (b, c),
       (b),
       (c),
       ()
   );
   ```

 ## Design Document

 For detailed information, please refer to [SQL for Aggregation in Data Warehouses](https://www.oracle.com/path-to-reference-documentation) on the Oracle official website.
	---
	{
	"title": "Multi-Dimensional Analytics",
	"language": "en"
	}
	---

	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	In a database, ROLLUP, CUBE, and GROUPING SETS are advanced SQL statements used for multidimensional data aggregation. These features significantly enhance the capabilities of the GROUP BY clause, enabling users to obtain multiple levels of summary results in a single query, which is semantically equivalent to using UNION ALL to connect multiple aggregation statements.

	- ROLLUP: ROLLUP is an operation used to generate hierarchical summaries. It aggregates data according to the specified column order, gradually summarizing from the finest granularity to the highest level. For example, in sales data, ROLLUP can be used to summarize by region and time, providing sales for each region per month, total sales for each region, and overall total sales. ROLLUP is suitable for scenarios requiring step-by-step summaries.

	- CUBE: CUBE is a more powerful aggregation operation that generates all possible summary combinations. Unlike ROLLUP, CUBE calculates subsets for all dimensions. For instance, for sales data aggregated by product and region, CUBE will compute sales for each product in each region, total sales for each product, total sales for each region, and overall total sales. CUBE is applicable to scenarios requiring comprehensive multidimensional analysis, such as business analysis and market research.

	- GROUPING SETS: GROUPING SETS offer flexibility in aggregating specific grouping sets. It allows users to specify a set of column combinations for independent aggregation, rather than generating all possible combinations as in ROLLUP and CUBE. For example, one can define summaries for specific combinations of region and time without needing all combinations of each dimension. GROUPING SETS are suitable for scenarios requiring customized summaries, providing flexible aggregation control.

	ROLLUP, CUBE, and GROUPING SETS provide powerful multidimensional data summary functions, catering to various data analysis and reporting needs, and making complex aggregation calculations simpler and more efficient. The following sections will detail the usage scenarios, syntax, and examples of these features.

	## ROLLUP

	### Use Case

	ROLLUP is particularly useful for summarizing data along hierarchical dimensions such as time, geography, and category. For instance, queries can specify `ROLLUP(year, month, day)` or `(country, Province, city)`.

	### Syntax and Example

	The syntax for ROLLUP is as follows:

	```sql
	SELECT … GROUP BY ROLLUP(grouping_column_reference_list)
	```

	Here's an example query that analyzes sales sums by year and month:

	```sql
	SELECT
	YEAR(d_date),
	MONTH(d_date),
	SUM(ss_net_paid) AS total_sum
	FROM
	store_sales,
	date_dim d1
	WHERE
	d1.d_date_sk = ss_sold_date_sk
	AND YEAR(d_date) IN (2001, 2002)
	AND MONTH(d_date) IN (1, 2, 3)
	GROUP BY
	ROLLUP(YEAR(d_date), MONTH(d_date))
	ORDER BY
	YEAR(d_date), MONTH(d_date);
	```

	This query summarizes data by time, calculating subtotals for sales by year, sales by month within each year, and the grand total of sales. The query result is as follows:

	```sql
	+--------------+---------------+-------------+
	\| YEAR(d_date) \| MONTH(d_date) \| total_sum \|
	+--------------+---------------+-------------+
	\| NULL \| NULL \| 54262669.17 \|
	\| 2001 \| NULL \| 26640320.46 \|
	\| 2001 \| 1 \| 9982165.83 \|
	\| 2001 \| 2 \| 8454915.34 \|
	\| 2001 \| 3 \| 8203239.29 \|
	\| 2002 \| NULL \| 27622348.71 \|
	\| 2002 \| 1 \| 11260654.35 \|
	\| 2002 \| 2 \| 7722750.61 \|
	\| 2002 \| 3 \| 8638943.75 \|
	+--------------+---------------+-------------+
	9 rows in set (0.08 sec)
	```

	## CUBE

	### Use Case

	CUBE is best suited for queries involving columns from multiple independent dimensions, rather than columns representing different levels of a single dimension. For example, a common usage scenario is summarizing all combinations of month, region, and product. These are three independent dimensions, and it is common to analyze all possible subtotal combinations. By contrast, cross-tabulating all possible combinations of year, month, and day would include several unnecessary values due to the natural hierarchy in the time dimension. In most analyses, subtotals like profits calculated by month and day are unnecessary. Relatively few users need to ask, "What is the total sales for the 16th of each month throughout the year?"

	### Syntax and Example

	The syntax for CUBE is as follows:

	```sql
	SELECT … GROUP BY CUBE(grouping_column_reference_list)
	```

	Example usage:

	```sql
	SELECT
	YEAR(d_date),
	i_category,
	ca_state,
	SUM(ss_net_paid) AS total_sum
	FROM
	store_sales,
	date_dim d1,
	item,
	customer_address ca
	WHERE
	d1.d_date_sk = ss_sold_date_sk
	AND i_item_sk = ss_item_sk
	AND ss_addr_sk = ca_address_sk
	AND i_category IN ("Books", "Electronics")
	AND YEAR(d_date) IN (1998, 1999)
	AND ca_state IN ("LA", "AK")
	GROUP BY CUBE(YEAR(d_date), i_category, ca_state)
	ORDER BY YEAR(d_date), i_category, ca_state;
	```

	The query result is as follows, calculating:

	- The total sales sum;

	- Subtotals for sales by year, by product category, and by state;

	- Subtotals for sales by product category within each year, by state for each product, by state within each year, and by product category within each state and year.

	```sql
	+--------------+-------------+----------+------------+
	\| YEAR(d_date) \| i_category \| ca_state \| total_sum \|
	+--------------+-------------+----------+------------+
	\| NULL \| NULL \| NULL \| 8690374.60 \|
	\| NULL \| NULL \| AK \| 2675198.33 \|
	\| NULL \| NULL \| LA \| 6015176.27 \|
	\| NULL \| Books \| NULL \| 4238177.69 \|
	\| NULL \| Books \| AK \| 1310791.36 \|
	\| NULL \| Books \| LA \| 2927386.33 \|
	\| NULL \| Electronics \| NULL \| 4452196.91 \|
	\| NULL \| Electronics \| AK \| 1364406.97 \|
	\| NULL \| Electronics \| LA \| 3087789.94 \|
	\| 1998 \| NULL \| NULL \| 4369656.14 \|
	\| 1998 \| NULL \| AK \| 1402539.19 \|
	\| 1998 \| NULL \| LA \| 2967116.95 \|
	\| 1998 \| Books \| NULL \| 2213703.82 \|
	\| 1998 \| Books \| AK \| 719911.29 \|
	\| 1998 \| Books \| LA \| 1493792.53 \|
	\| 1998 \| Electronics \| NULL \| 2155952.32 \|
	\| 1998 \| Electronics \| AK \| 682627.90 \|
	\| 1998 \| Electronics \| LA \| 1473324.42 \|
	\| 1999 \| NULL \| NULL \| 4320718.46 \|
	\| 1999 \| NULL \| AK \| 1272659.14 \|
	\| 1999 \| NULL \| LA \| 3048059.32 \|
	\| 1999 \| Books \| NULL \| 2024473.87 \|
	\| 1999 \| Books \| AK \| 590880.07 \|
	\| 1999 \| Books \| LA \| 1433593.80 \|
	\| 1999 \| Electronics \| NULL \| 2296244.59 \|
	\| 1999 \| Electronics \| AK \| 681779.07 \|
	\| 1999 \| Electronics \| LA \| 1614465.52 \|
	+--------------+-------------+----------+------------+
	27 rows in set (0.21 sec)
	```

	## GROUPING FUNCTION

	This section introduces how to address two challenges when using ROLLUP and CUBE:

	1. How to programmatically identify which rows in the result set represent subtotals and accurately determine the aggregation level corresponding to a given subtotal. Since subtotals are often needed for calculations such as percentage of totals, we require a convenient method to identify these subtotal rows.

	2. When the query results contain both actual stored NULL values and "NULL" values generated by ROLLUP or CUBE operations, another problem arises: how to distinguish between these two types of NULL values?

	GROUPING, GROUPING_ID, and GROUPING SETS can effectively solve the aforementioned challenges.

	### GROUPING

	1. Principles

	GROUPING uses a single column as a parameter and returns 1 when encountering a NULL value created by ROLLUP or CUBE operations, indicating that the row is a subtotal. Any other type of value (including NULLs inherently present in the table data) returns 0.

	Example:

	```sql
	select
	year(d_date),
	month(d_date),
	sum(ss_net_paid) as total_sum,
	grouping(year(d_date)),
	grouping(month(d_date))
	from
	store_sales,
	date_dim d1
	where
	d1.d_date_sk = ss_sold_date_sk
	and year(d_date) in (2001, 2002)
	and month(d_date) in (1, 2, 3)
	group by
	rollup(year(d_date), month(d_date))
	order by
	year(d_date), month(d_date);
	```

	- The GROUPING function result for the (YEAR(d_date), MONTH(d_date)) group is (0,0) for aggregation by year and month.

	- The GROUPING function result for the (YEAR(d_date)) group is (0,1) for aggregation by year.

	- The GROUPING function result for the () group is (1,1) for the total aggregation.

	Query result:

	```Plain
	+--------------+---------------+-------------+------------------------+-------------------------+
	\| year(d_date) \| month(d_date) \| total_sum \| Grouping(year(d_date)) \| Grouping(month(d_date)) \|
	+--------------+---------------+-------------+------------------------+-------------------------+
	\| NULL \| NULL \| 54262669.17 \| 1 \| 1 \|
	\| 2001 \| NULL \| 26640320.46 \| 0 \| 1 \|
	\| 2001 \| 1 \| 9982165.83 \| 0 \| 0 \|
	\| 2001 \| 2 \| 8454915.34 \| 0 \| 0 \|
	\| 2001 \| 3 \| 8203239.29 \| 0 \| 0 \|
	\| 2002 \| NULL \| 27622348.71 \| 0 \| 1 \|
	\| 2002 \| 1 \| 11260654.35 \| 0 \| 0 \|
	\| 2002 \| 2 \| 7722750.61 \| 0 \| 0 \|
	\| 2002 \| 3 \| 8638943.75 \| 0 \| 0 \|
	+--------------+---------------+-------------+------------------------+-------------------------+
	9 rows in set (0.06 sec)
	```

	2. Usage Scenarios, Syntax, and Examples

	The GROUPING function can be used to filter results. Example:

	```sql
	select
	year(d_date),
	i_category,
	ca_state,
	sum(ss_net_paid) as total_sum
	from
	store_sales,
	date_dim d1,
	item,
	customer_address ca
	where
	d1.d_date_sk = ss_sold_date_sk
	and i_item_sk = ss_item_sk
	and ss_addr_sk=ca_address_sk
	and i_category in ("Books", "Electronics")
	and year(d_date) in(1998, 1999)
	and ca_state in ("LA", "AK")
	group by cube(year(d_date), i_category, ca_state)
	having grouping(year(d_date))=1 and grouping(i_category)=1 and grouping(ca_state)=1
	or grouping(year(d_date))=0 and grouping(i_category)=1 and grouping(ca_state)=1
	or grouping(year(d_date))=1 and grouping(i_category)=1 and grouping(ca_state)=0
	order by year(d_date), i_category, ca_state;
	```

	Using the GROUPING function in the HAVING clause retains only the total sales, sales summarized by year, and sales summarized by region. Query result:

	```Plain
	+---------------------+------------+----------+------------+
	\| year(`d1`.`d_date`) \| i_category \| ca_state \| total_sum \|
	+---------------------+------------+----------+------------+
	\| NULL \| NULL \| NULL \| 8690374.60 \|
	\| NULL \| NULL \| AK \| 2675198.33 \|
	\| NULL \| NULL \| LA \| 6015176.27 \|
	\| 1998 \| NULL \| NULL \| 4369656.14 \|
	\| 1999 \| NULL \| NULL \| 4320718.46 \|
	+---------------------+------------+----------+------------+
	5 rows in set (0.13 sec)
	```

	You can also use the GROUPING function with the IF function to enhance query readability. Example:

	```sql
	select
	if(grouping(year(d_date)) = 1, "Multi-year sum", year(d_date)) as year,
	if(grouping(i_category) = 1, "Multi-category sum", i_category) as category,
	sum(ss_net_paid) as total_sum
	from
	store_sales,
	date_dim d1,
	item,
	customer_address ca
	where
	d1.d_date_sk = ss_sold_date_sk
	and i_item_sk = ss_item_sk
	and ss_addr_sk = ca_address_sk
	and i_category in ("Books", "Electronics")
	and year(d_date) in (1998, 1999)
	and ca_state in ("LA", "AK")
	group by cube(year(d_date), i_category)
	```

	Query result:

	```sql
	+----------------+--------------------+------------+
	\| year \| category \| total_sum \|
	+----------------+--------------------+------------+
	\| 1998 \| Books \| 2213703.82 \|
	\| 1998 \| Electronics \| 2155952.32 \|
	\| 1999 \| Electronics \| 2296244.59 \|
	\| 1999 \| Books \| 2024473.87 \|
	\| 1998 \| Multi-category sum \| 4369656.14 \|
	\| 1999 \| Multi-category sum \| 4320718.46 \|
	\| Multi-year sum \| Books \| 4238177.69 \|
	\| Multi-year sum \| Electronics \| 4452196.91 \|
	\| Multi-year sum \| Multi-category sum \| 8690374.60 \|
	+----------------+--------------------+------------+
	9 rows in set (0.09 sec)
	```

	### GROUPING_ID

	1. Usage Scenarios

	In a database, both GROUPING_ID and GROUPING functions serve as auxiliary functions for handling multidimensional data aggregation queries, such as ROLLUP and CUBE, aiding users in distinguishing between different levels of aggregation results. If you wish to determine the aggregation level of a particular row, you need to use the GROUPING function to compute all GROUP BY columns, as the computation result of a single column alone is insufficient.

	The GROUPING_ID function is more powerful than GROUPING because it can detect multiple columns simultaneously. The GROUPING_ID function accepts multiple columns as parameters and returns an integer that represents the aggregation status of these columns through binary bits. When using tables or materialized views to store computation results, using GROUPING to represent different levels of aggregation can consume considerable storage space. In such scenarios, GROUPING_ID is more appropriate.

	Taking CUBE(a, b) as an example, its GROUPING_ID can be represented as follows:

	\| Aggregation Level \| Bit Vector \| GROUPING_ID \| GROUPING(a) \| GROUPING(b) \|
	\| ----------------- \| ---------- \| ----------- \| ----------- \| ----------- \|
	\| a,b \| 0 0 \| 0 \| 0 \| 0 \|
	\| a \| 0 1 \| 1 \| 0 \| 1 \|
	\| b \| 1 0 \| 2 \| 1 \| 0 \|
	\| Grand Total \| 1 1 \| 3 \| 1 \| 1 \|

	2. Syntax and Example

	Here is an example SQL query:

	```sql
	SELECT
	year(d_date),
	i_category,
	SUM(ss_net_paid) AS total_sum,
	GROUPING(year(d_date)),
	GROUPING(i_category),
	GROUPING_ID(year(d_date), i_category)
	FROM
	store_sales,
	date_dim d1,
	item,
	customer_address ca
	WHERE
	d1.d_date_sk = ss_sold_date_sk
	AND i_item_sk = ss_item_sk
	AND ss_addr_sk = ca_address_sk
	AND i_category IN ('Books', 'Electronics')
	AND year(d_date) IN (1998, 1999)
	AND ca_state IN ('LA', 'AK')
	GROUP BY CUBE(year(d_date), i_category);
	```

	The query results are as follows:

	```sql
	+--------------+-------------+------------+------------------------+----------------------+---------------------------------------+
	\| year(d_date) \| i_category \| total_sum \| GROUPING(year(d_date)) \| GROUPING(i_category) \| GROUPING_ID(year(d_date), i_category) \|
	+--------------+-------------+------------+------------------------+----------------------+---------------------------------------+
	\| 1998 \| Electronics \| 2155952.32 \| 0 \| 0 \| 0 \|
	\| 1998 \| Books \| 2213703.82 \| 0 \| 0 \| 0 \|
	\| 1999 \| Electronics \| 2296244.59 \| 0 \| 0 \| 0 \|
	\| 1999 \| Books \| 2024473.87 \| 0 \| 0 \| 0 \|
	\| 1998 \| NULL \| 4369656.14 \| 0 \| 1 \| 1 \|
	\| 1999 \| NULL \| 4320718.46 \| 0 \| 1 \| 1 \|
	\| NULL \| Electronics \| 4452196.91 \| 1 \| 0 \| 2 \|
	\| NULL \| Books \| 4238177.69 \| 1 \| 0 \| 2 \|
	\| NULL \| NULL \| 8690374.60 \| 1 \| 1 \| 3 \|
	+--------------+-------------+------------+------------------------+----------------------+---------------------------------------+
	9 rows in set (0.12 sec)
	```

	### GROUPING SETS

	1. Usage Scenarios

	When there is a need to selectively specify the group sets to create, the `GROUPING SETS` expression can be used in the `GROUP BY` clause. This method allows users to precisely specify across multiple dimensions without computing the entire CUBE.

	Since CUBE queries typically consume significant resources, using `GROUPING SETS` can enhance query execution efficiency when only a few dimensions are of interest.

	2. Syntax and Examples

	The syntax for `GROUPING SETS` is as follows:

	```sql
	SELECT … GROUP BY GROUPING SETS(grouping_column_reference_list)
	```

	If you need:

	- Subtotals of sales for each product category per year

	- Subtotals of sales for each state per year

	- Subtotals of sales for each product in each state per year

	You can use `GROUPING SETS` to specify these dimensions and perform the aggregation. Here is an example:

	```sql
	SELECT
	YEAR(d_date),
	i_category,
	ca_state,
	SUM(ss_net_paid) AS total_sum
	FROM
	store_sales,
	date_dim d1,
	item,
	customer_address ca
	WHERE
	d1.d_date_sk = ss_sold_date_sk
	AND i_item_sk = ss_item_sk
	AND ss_addr_sk = ca_address_sk
	AND i_category IN ('Books', 'Electronics')
	AND YEAR(d_date) IN (1998, 1999)
	AND ca_state IN ('LA', 'AK')
	GROUP BY GROUPING SETS(
	(YEAR(d_date), i_category),
	(YEAR(d_date), ca_state),
	(YEAR(d_date), ca_state, i_category)
	)
	ORDER BY YEAR(d_date), i_category, ca_state;
	```

	Query Result:

	```sql
	+--------------+-------------+----------+------------+
	\| YEAR(d_date) \| i_category \| ca_state \| total_sum \|
	+--------------+-------------+----------+------------+
	\| 1998 \| NULL \| AK \| 1402539.19 \|
	\| 1998 \| NULL \| LA \| 2967116.95 \|
	\| 1998 \| Books \| NULL \| 2213703.82 \|
	\| 1998 \| Books \| AK \| 719911.29 \|
	\| 1998 \| Books \| LA \| 1493792.53 \|
	\| 1998 \| Electronics \| NULL \| 2155952.32 \|
	\| 1998 \| Electronics \| AK \| 682627.90 \|
	\| 1998 \| Electronics \| LA \| 1473324.42 \|
	\| 1999 \| NULL \| AK \| 1272659.14 \|
	\| 1999 \| NULL \| LA \| 3048059.32 \|
	\| 1999 \| Books \| NULL \| 2024473.87 \|
	\| 1999 \| Books \| AK \| 590880.07 \|
	\| 1999 \| Books \| LA \| 1433593.80 \|
	\| 1999 \| Electronics \| NULL \| 2296244.59 \|
	\| 1999 \| Electronics \| AK \| 681779.07 \|
	\| 1999 \| Electronics \| LA \| 1614465.52 \|
	+--------------+-------------+----------+------------+
	16 rows in set (0.11 sec)
	```

	The above approach is equivalent to using CUBE but specifies concrete `grouping_id`, thereby reducing unnecessary calculations:

	```sql
	SELECT
	SUM(ss_net_paid) AS total_sum,
	YEAR(d_date),
	i_category,
	ca_state
	FROM
	store_sales,
	date_dim d1,
	item,
	customer_address ca
	WHERE
	d1.d_date_sk = ss_sold_date_sk
	AND i_item_sk = ss_item_sk
	AND ss_addr_sk = ca_address_sk
	AND i_category IN ('Books', 'Electronics')
	AND YEAR(d_date) IN (1998, 1999)
	AND ca_state IN ('LA', 'AK')
	GROUP BY CUBE(YEAR(d_date), ca_state, i_category)
	HAVING grouping_id(YEAR(d_date), ca_state, i_category) = 0
	OR grouping_id(YEAR(d_date), ca_state, i_category) = 2
	OR grouping_id(YEAR(d_date), ca_state, i_category) = 1;
	```

	:::info Note

	Using `CUBE` computes all possible aggregation levels (eight in this case), but in practice, you may only be interested in a few of them.

	:::

	3. Semantic Equivalents

	- GROUPING SETS vs. GROUP BY UNION ALL

	The `GROUPING SETS` statement:

	```sql
	SELECT k1, k2, SUM(k3) FROM t GROUP BY GROUPING SETS ((k1, k2), (k1), (k2), ());
	```

	Is equivalent in query results to multiple `GROUP BY` queries connected with `UNION ALL`:

	```sql
	SELECT k1, k2, SUM(k3) FROM t GROUP BY k1, k2
	UNION ALL
	SELECT k1, NULL, SUM(k3) FROM t GROUP BY k1
	UNION ALL
	SELECT NULL, k2, SUM(k3) FROM t GROUP BY k2
	UNION ALL
	SELECT NULL, NULL, SUM(k3) FROM t;
	```

	Using `UNION ALL` results in a longer query and requires multiple scans of the base table, making it less efficient in both writing and execution.

	- GROUPING SETS vs. ROLLUP

	`ROLLUP` is an extension of `GROUPING SETS`. For example:

	```sql
	SELECT a, b, c, SUM(d) FROM tab1 GROUP BY ROLLUP(a, b, c);
	```

	This `ROLLUP` is equivalent to the following `GROUPING SETS`:

	```sql
	GROUPING SETS (
	(a, b, c),
	(a, b),
	(a),
	()
	);
	```

	- GROUPING SETS vs. CUBE

	`CUBE(a, b, c)` is equivalent to the following `GROUPING SETS`:

	```sql
	GROUPING SETS (
	(a, b, c),
	(a, b),
	(a, c),
	(a),
	(b, c),
	(b),
	(c),
	()
	);
	```

	## Design Document

	For detailed information, please refer to [SQL for Aggregation in Data Warehouses](https://www.oracle.com/path-to-reference-documentation) on the Oracle official website.