Calculates the exact percentile, suitable for small datasets. First sorts the specified column in descending order, then takes the exact p-th percentile. The value of p is between 0 and 1. If p does not point to an exact position, it returns the linear interpolation of the adjacent values at position p. Note that this is not the average of the two numbers. Special cases:
PERCENTILE(<col>, <p>)
| Parameter | Description |
|---|---|
<col> | The column to calculate the exact percentile for. Supported types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. |
<p> | The exact percentile to be calculated, a constant value. Supported type: Double. Range: [0.0, 1.0]. The second parameter must be a constant. |
Return the exact percentile of the specified column, with a return type of Double. If there is no valid data in the group, returns NULL.
-- Setup CREATE TABLE sales_data ( product_id INT, sale_price DECIMAL(10, 2) ) DUPLICATE KEY(`product_id`) DISTRIBUTED BY HASH(`product_id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); INSERT INTO sales_data VALUES (1, 10.00), (1, 15.00), (1, 20.00), (1, 25.00), (1, 30.00), (1, 35.00), (1, 40.00), (1, 45.00), (1, 50.00), (1, 100.00);
SELECT percentile(sale_price, 0.5) as median_price, -- Median percentile(sale_price, 0.75) as p75_price, -- 75th percentile percentile(sale_price, 0.90) as p90_price, -- 90th percentile percentile(sale_price, 0.95) as p95_price, -- 95th percentile percentile(null, 0.99) as p99_null -- Null value at 99th percentile FROM sales_data;
Calculates sale prices at different percentiles.
+--------------+-----------+-------------------+-------------------+----------+ | median_price | p75_price | p90_price | p95_price | p99_null | +--------------+-----------+-------------------+-------------------+----------+ | 32.5 | 43.75 | 54.99999999999998 | 77.49999999999994 | NULL | +--------------+-----------+-------------------+-------------------+----------+
select percentile(if(sale_price>90,sale_price,NULL), 0.5) from sales_data;
Only non-NULL input values are considered in the calculation.
+----------------------------------------------------+ | percentile(if(sale_price>90,sale_price,NULL), 0.5) | +----------------------------------------------------+ | 100 | +----------------------------------------------------+
select percentile(sale_price, NULL) from sales_data;
If all input values are NULL, returns NULL.
+------------------------------+ | percentile(sale_price, NULL) | +------------------------------+ | NULL | +------------------------------+