blob: 3b4cf33d5952d80258e73bb0ec5f2e5a5ca2a1d7 [file] [log] [blame] [view]
---
{
"title": "TOPN_WEIGHTED",
"language": "en"
}
---
## Description
The TOPN_WEIGHTED function returns the N most frequent values in the specified column with weighted counting. Unlike the regular TOPN function, TOPN_WEIGHTED allows adjusting the importance of values through weights.
## Syntax
```sql
TOPN_WEIGHTED(<expr>, <weight>, <top_num> [, <space_expand_rate>])
```
## Parameters
| Parameter | Description |
| -- | -- |
| `<expr>` | The column or expression to be counted |
| `<weight>` | The column or expression to adjust the weight |
| `<top_num>` | The number of the most frequent values to return. It must be a positive integer. |
| `<space_expand_rate>` | Optional, the value to set the counter_numbers used in the Space-Saving algorithm. `counter_numbers = top_num * space_expand_rate`. The value of space_expand_rate should be greater than 1, and the default value is 50. |
## Return Value
Return an array containing values and weighted counts.
## Examples
```sql
-- create example table
CREATE TABLE product_sales (
product_id INT,
sale_amount DECIMAL(10,2),
sale_date DATE
) DISTRIBUTED BY HASH(product_id)
PROPERTIES (
"replication_num" = "1"
);
-- insert test data
INSERT INTO product_sales VALUES
(1, 100.00, '2024-01-01'),
(2, 50.00, '2024-01-01'),
(1, 150.00, '2024-01-01'),
(3, 75.00, '2024-01-01'),
(1, 200.00, '2024-01-01'),
(2, 80.00, '2024-01-01'),
(1, 120.00, '2024-01-01'),
(4, 90.00, '2024-01-01');
-- find the top 3 products with highest sales amount
SELECT TOPN_WEIGHTED(product_id, sale_amount, 3) as top_products
FROM product_sales;
```
```text
+--------------+
| top_products |
+--------------+
| [1, 2, 4] |
+--------------+
```