| --- |
| layout: global |
| title: Window Functions |
| displayTitle: Window Functions |
| license: | |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --- |
| |
| ### Description |
| |
| Window functions operate on a group of rows, referred to as a window, and calculate a return value for each row based on the group of rows. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row. |
| |
| ### Syntax |
| |
| ```sql |
| window_function [ nulls_option ] OVER |
| ( [ { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] ) ] |
| { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] |
| [ window_frame ] ) |
| ``` |
| |
| ### Parameters |
| |
| * **window_function** |
| |
| * Ranking Functions |
| |
| **Syntax:** `RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER` |
| |
| * Analytic Functions |
| |
| **Syntax:** `CUME_DIST | LAG | LEAD | NTH_VALUE | FIRST_VALUE | LAST_VALUE` |
| |
| * Aggregate Functions |
| |
| **Syntax:** `MAX | MIN | COUNT | SUM | AVG | ...` |
| |
| Please refer to the [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) document for a complete list of Spark aggregate functions. |
| |
| * **nulls_option** |
| |
| Specifies whether or not to skip null values when evaluating the window function. `RESPECT NULLS` means not skipping null values, while `IGNORE NULLS` means skipping. If not specified, the default is `RESPECT NULLS`. |
| |
| **Syntax:** |
| |
| `{ IGNORE | RESPECT } NULLS` |
| |
| **Note:** Only `LAG | LEAD | NTH_VALUE | FIRST_VALUE | LAST_VALUE` can be used with `IGNORE NULLS`. |
| |
| * **window_frame** |
| |
| Specifies which row to start the window on and where to end it. |
| |
| **Syntax:** |
| |
| `{ RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end }` |
| |
| * `frame_start` and `frame_end` have the following syntax: |
| |
| **Syntax:** |
| |
| `UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset FOLLOWING | UNBOUNDED FOLLOWING` |
| |
| `offset:` specifies the `offset` from the position of the current row. |
| |
| **Note:** If `frame_end` is omitted it defaults to `CURRENT ROW`. |
| |
| ### Examples |
| |
| ```sql |
| CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT); |
| |
| INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35); |
| INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38); |
| INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28); |
| INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33); |
| INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33); |
| INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28); |
| INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38); |
| INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23); |
| INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25); |
| |
| SELECT * FROM employees; |
| +-----+-----------+------+-----+ |
| | name| dept|salary| age| |
| +-----+-----------+------+-----+ |
| |Chloe|Engineering| 23000| 25| |
| | Fred|Engineering| 21000| 28| |
| | Paul|Engineering| 29000| 23| |
| |Helen| Marketing| 29000| 40| |
| | Tom|Engineering| 23000| 33| |
| | Jane| Marketing| 29000| 28| |
| | Jeff| Marketing| 35000| 38| |
| | Evan| Sales| 32000| 38| |
| | Lisa| Sales| 10000| 35| |
| | Alex| Sales| 30000| 33| |
| +-----+-----------+------+-----+ |
| |
| SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank FROM employees; |
| +-----+-----------+------+----+ |
| | name| dept|salary|rank| |
| +-----+-----------+------+----+ |
| | Lisa| Sales| 10000| 1| |
| | Alex| Sales| 30000| 2| |
| | Evan| Sales| 32000| 3| |
| | Fred|Engineering| 21000| 1| |
| | Tom|Engineering| 23000| 2| |
| |Chloe|Engineering| 23000| 2| |
| | Paul|Engineering| 29000| 4| |
| |Helen| Marketing| 29000| 1| |
| | Jane| Marketing| 29000| 1| |
| | Jeff| Marketing| 35000| 3| |
| +-----+-----------+------+----+ |
| |
| SELECT name, dept, salary, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary ROWS BETWEEN |
| UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank FROM employees; |
| +-----+-----------+------+----------+ |
| | name| dept|salary|dense_rank| |
| +-----+-----------+------+----------+ |
| | Lisa| Sales| 10000| 1| |
| | Alex| Sales| 30000| 2| |
| | Evan| Sales| 32000| 3| |
| | Fred|Engineering| 21000| 1| |
| | Tom|Engineering| 23000| 2| |
| |Chloe|Engineering| 23000| 2| |
| | Paul|Engineering| 29000| 3| |
| |Helen| Marketing| 29000| 1| |
| | Jane| Marketing| 29000| 1| |
| | Jeff| Marketing| 35000| 2| |
| +-----+-----------+------+----------+ |
| |
| SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age |
| RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cume_dist FROM employees; |
| +-----+-----------+------+------------------+ |
| | name| dept|age | cume_dist| |
| +-----+-----------+------+------------------+ |
| | Alex| Sales| 33|0.3333333333333333| |
| | Lisa| Sales| 35|0.6666666666666666| |
| | Evan| Sales| 38| 1.0| |
| | Paul|Engineering| 23| 0.25| |
| |Chloe|Engineering| 25| 0.75| |
| | Fred|Engineering| 28| 0.25| |
| | Tom|Engineering| 33| 1.0| |
| | Jane| Marketing| 28|0.3333333333333333| |
| | Jeff| Marketing| 38|0.6666666666666666| |
| |Helen| Marketing| 40| 1.0| |
| +-----+-----------+------+------------------+ |
| |
| SELECT name, dept, salary, MIN(salary) OVER (PARTITION BY dept ORDER BY salary) AS min |
| FROM employees; |
| +-----+-----------+------+-----+ |
| | name| dept|salary| min| |
| +-----+-----------+------+-----+ |
| | Lisa| Sales| 10000|10000| |
| | Alex| Sales| 30000|10000| |
| | Evan| Sales| 32000|10000| |
| |Helen| Marketing| 29000|29000| |
| | Jane| Marketing| 29000|29000| |
| | Jeff| Marketing| 35000|29000| |
| | Fred|Engineering| 21000|21000| |
| | Tom|Engineering| 23000|21000| |
| |Chloe|Engineering| 23000|21000| |
| | Paul|Engineering| 29000|21000| |
| +-----+-----------+------+-----+ |
| |
| SELECT name, salary, |
| LAG(salary) OVER (PARTITION BY dept ORDER BY salary) AS lag, |
| LEAD(salary, 1, 0) OVER (PARTITION BY dept ORDER BY salary) AS lead |
| FROM employees; |
| +-----+-----------+------+-----+-----+ |
| | name| dept|salary| lag| lead| |
| +-----+-----------+------+-----+-----+ |
| | Lisa| Sales| 10000|NULL |30000| |
| | Alex| Sales| 30000|10000|32000| |
| | Evan| Sales| 32000|30000| 0| |
| | Fred|Engineering| 21000| NULL|23000| |
| |Chloe|Engineering| 23000|21000|23000| |
| | Tom|Engineering| 23000|23000|29000| |
| | Paul|Engineering| 29000|23000| 0| |
| |Helen| Marketing| 29000| NULL|29000| |
| | Jane| Marketing| 29000|29000|35000| |
| | Jeff| Marketing| 35000|29000| 0| |
| +-----+-----------+------+-----+-----+ |
| |
| SELECT id, v, |
| LEAD(v, 0) IGNORE NULLS OVER w lead, |
| LAG(v, 0) IGNORE NULLS OVER w lag, |
| NTH_VALUE(v, 2) IGNORE NULLS OVER w nth_value, |
| FIRST_VALUE(v) IGNORE NULLS OVER w first_value, |
| LAST_VALUE(v) IGNORE NULLS OVER w last_value |
| FROM test_ignore_null |
| WINDOW w AS (ORDER BY id) |
| ORDER BY id; |
| +--+----+----+----+---------+-----------+----------+ |
| |id| v|lead| lag|nth_value|first_value|last_value| |
| +--+----+----+----+---------+-----------+----------+ |
| | 0|NULL|NULL|NULL| NULL| NULL| NULL| |
| | 1| x| x| x| NULL| x| x| |
| | 2|NULL|NULL|NULL| NULL| x| x| |
| | 3|NULL|NULL|NULL| NULL| x| x| |
| | 4| y| y| y| y| x| y| |
| | 5|NULL|NULL|NULL| y| x| y| |
| | 6| z| z| z| y| x| z| |
| | 7| v| v| v| y| x| v| |
| | 8|NULL|NULL|NULL| y| x| v| |
| +--+----+----+----+---------+-----------+----------+ |
| ``` |
| |
| ### Related Statements |
| |
| * [SELECT](sql-ref-syntax-qry-select.html) |