The TOPN_WEIGHTED function returns the N most frequent values in the specified column with weighted counting. Unlike the regular TOPN function, TOPN_WEIGHTED allows adjusting the importance of values through weights.
TOPN_WEIGHTED(<expr>, <weight>, <top_num> [, <space_expand_rate>])
| Parameter | Description |
|---|---|
<expr> | The column or expression to be counted |
<weight> | The column or expression to adjust the weight |
<top_num> | The number of the most frequent values to return. It must be a positive integer. |
<space_expand_rate> | Optional, the value to set the counter_numbers used in the Space-Saving algorithm. counter_numbers = top_num * space_expand_rate. The value of space_expand_rate should be greater than 1, and the default value is 50. |
Return an array containing values and weighted counts.
-- create example table CREATE TABLE product_sales ( product_id INT, sale_amount DECIMAL(10,2), sale_date DATE ) DISTRIBUTED BY HASH(product_id) PROPERTIES ( "replication_num" = "1" ); -- insert test data INSERT INTO product_sales VALUES (1, 100.00, '2024-01-01'), (2, 50.00, '2024-01-01'), (1, 150.00, '2024-01-01'), (3, 75.00, '2024-01-01'), (1, 200.00, '2024-01-01'), (2, 80.00, '2024-01-01'), (1, 120.00, '2024-01-01'), (4, 90.00, '2024-01-01'); -- find the top 3 products with highest sales amount SELECT TOPN_WEIGHTED(product_id, sale_amount, 3) as top_products FROM product_sales;
+--------------+ | top_products | +--------------+ | [1, 2, 4] | +--------------+