The PERCENTILE_ARRAY function calculates exact percentile arrays, allowing multiple percentile values to be computed at once. This function is primarily suitable for small datasets.
Key features:
PERCENTILE_ARRAY(<col>, <array_p>)
| Parameter | Description |
|---|---|
<col> | The column to calculate the exact percentiles for. Supported types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. |
<array_p> | Percentile array, each element must be a constant of type Array, with values in the range [0.0, 1.0], e.g., [0.5, 0.95, 0.99]. |
Returns a DOUBLE type array containing the calculated percentile values. If there is no valid data in the group, returns an empty array.
-- setup CREATE TABLE sales_data ( id INT, amount DECIMAL(10, 2) ) DUPLICATE KEY(`id`) DISTRIBUTED BY HASH(`id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); INSERT INTO sales_data VALUES (1, 10.5), (2, 15.2), (3, 20.1), (4, 25.8), (5, 30.3), (6, 35.7), (7, 40.2), (8, 45.9), (9, 50.4), (10, 100.6);
SELECT percentile_array(amount, [0.25, 0.5, 0.75, 0.9]) as percentiles FROM sales_data;
Calculate multiple percentiles.
+-----------------------------------------------------+ | percentiles | +-----------------------------------------------------+ | [21.525000000000002, 33, 44.475, 55.41999999999998] | +-----------------------------------------------------+
SELECT percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) FROM sales_data;
Only non-NULL data is calculated.
+------------------------------------------------------------+ | percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) | +------------------------------------------------------------+ | [100.6, 100.6] | +------------------------------------------------------------+
SELECT percentile_array(NULL, [0.5, 0.99]) FROM sales_data;
Returns an empty array when all input data is NULL.
+-------------------------------------+ | percentile_array(NULL, [0.5, 0.99]) | +-------------------------------------+ | [] | +-------------------------------------+