TOPN_ARRAY returns an array of the N most frequent values in the specified column. It is an approximate calculation function that returns results ordered by count in descending order.
TOPN_ARRAY(<expr>, <top_num> [, <space_expand_rate>])
| Parameter | Description |
|---|---|
<expr> | The column or expression to be counted. |
<top_num> | The number of the most frequent values to return. It must be a positive integer. |
<space_expand_rate> | Optional parameter, which is used to set the number of counters used in the Space-Saving algorithm. counter_numbers = top_num * space_expand_rate , the larger the value of space_expand_rate, the more accurate the result, and the default value is 50. |
Return an array containing the N most frequent values.
-- Create sample table CREATE TABLE page_visits ( page_id INT, user_id INT, visit_date DATE ) DISTRIBUTED BY HASH(page_id) PROPERTIES ( "replication_num" = "1" ); -- Insert test data INSERT INTO page_visits VALUES (1, 101, '2024-01-01'), (2, 102, '2024-01-01'), (1, 103, '2024-01-01'), (3, 101, '2024-01-01'), (1, 104, '2024-01-01'), (2, 105, '2024-01-01'), (1, 106, '2024-01-01'), (4, 107, '2024-01-01'); -- Find top 3 most visited pages SELECT TOPN_ARRAY(page_id, 3) as top_pages FROM page_visits;
+-----------+ | top_pages | +-----------+ | [1, 2, 4] | +-----------+