blob: 2fcd3a0eb40bcf9b527afd89fbb6ad3ed1ff97c8 [file] [log] [blame] [view]
---
{
"title": "TOPN_ARRAY",
"language": "en",
"description": "TOPNARRAY returns an array of the N most frequent values in the specified column."
}
---
## Description
TOPN_ARRAY returns an array of the N most frequent values in the specified column. It is an approximate calculation function that returns results ordered by count in descending order.
## Syntax
```sql
TOPN_ARRAY(<expr>, <top_num> [, <space_expand_rate>])
```
## Parameters
| Parameter | Description |
| -- | -- |
| `<expr>` | The column or expression to be counted. Supported types: TinyInt, SmallInt, Integer, BigInt, LargeInt, Float, Double, Decimal, Date, Datetime, IPV4, IPV6, String. |
| `<top_num>` | The number of most frequent values to return. Must be a positive integer. Supported type: Integer. |
| `<space_expand_rate>` | Optional. Sets the number of counters used in the Space-Saving algorithm: `counter_numbers = top_num * space_expand_rate`. The larger the value, the more accurate the result. Default is 50. Supported type: Integer. |
## Return Value
Returns an array containing the N most frequent values.
If there is no valid data in the group, returns NULL.
## Example
```sql
-- setup
CREATE TABLE page_visits (
page_id INT,
user_id INT,
visit_date DATE
) DISTRIBUTED BY HASH(page_id)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO page_visits VALUES
(1, 101, '2024-01-01'),
(2, 102, '2024-01-01'),
(1, 103, '2024-01-01'),
(3, 101, '2024-01-01'),
(1, 104, '2024-01-01'),
(2, 105, '2024-01-01'),
(1, 106, '2024-01-01'),
(4, 107, '2024-01-01');
```
```sql
SELECT TOPN_ARRAY(page_id, 3) as top_pages
FROM page_visits;
```
Find the top 3 most visited pages.
```text
+-----------+
| top_pages |
+-----------+
| [1, 2, 4] |
+-----------+
```
```sql
SELECT TOPN_ARRAY(page_id, 3) as top_pages FROM page_visits where page_id is null;
```
```text
+-----------+
| top_pages |
+-----------+
| NULL |
+-----------+
```