blob: 4e025e788b5c27d419ae6451ff48b2bc83bebbc2 [file] [log] [blame] [view]
---
{
"title": "PERCENTILE_ARRAY",
"language": "en",
"description": "The PERCENTILEARRAY function calculates exact percentile arrays, allowing multiple percentile values to be computed at once."
}
---
## Description
The `PERCENTILE_ARRAY` function calculates exact percentile arrays, allowing multiple percentile values to be computed at once. This function is primarily suitable for small datasets.
Key features:
1. Exact Calculation: Provides exact percentile results rather than approximations
2. Batch Processing: Can calculate multiple percentiles in a single operation
3. Scope: Best suited for handling small-scale datasets
## Syntax
```sql
PERCENTILE_ARRAY(<col>, <array_p>)
```
## Parameters
| Parameter | Description |
| -- | -- |
| `<col>` | The column to calculate the exact percentiles for. Supported types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. |
| `<array_p>` | Percentile array, each element must be a constant of type Array<Double>, with values in the range `[0.0, 1.0]`, e.g., `[0.5, 0.95, 0.99]`. |
## Return Value
Returns a DOUBLE type array containing the calculated percentile values.
If there is no valid data in the group, returns an empty array.
## Examples
```sql
-- setup
CREATE TABLE sales_data (
id INT,
amount DECIMAL(10, 2)
) DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
INSERT INTO sales_data VALUES
(1, 10.5),
(2, 15.2),
(3, 20.1),
(4, 25.8),
(5, 30.3),
(6, 35.7),
(7, 40.2),
(8, 45.9),
(9, 50.4),
(10, 100.6);
```
```sql
SELECT percentile_array(amount, [0.25, 0.5, 0.75, 0.9]) as percentiles
FROM sales_data;
```
Calculate multiple percentiles.
```text
+-----------------------------------------------------+
| percentiles |
+-----------------------------------------------------+
| [21.525000000000002, 33, 44.475, 55.41999999999998] |
+-----------------------------------------------------+
```
```sql
SELECT percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) FROM sales_data;
```
Only non-NULL data is calculated.
```text
+------------------------------------------------------------+
| percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) |
+------------------------------------------------------------+
| [100.6, 100.6] |
+------------------------------------------------------------+
```
```sql
SELECT percentile_array(NULL, [0.5, 0.99]) FROM sales_data;
```
Returns an empty array when all input data is NULL.
```text
+-------------------------------------+
| percentile_array(NULL, [0.5, 0.99]) |
+-------------------------------------+
| [] |
+-------------------------------------+
```