For time-series data feature analysis scenarios, IoTDB provides the capability of window functions, which deliver a flexible and efficient solution for in-depth mining and complex computation of time-series data. The following sections will elaborate on the feature in detail.
Window Functions perform calculations on each row based on a specific set of rows related to the current row (called a “window”). It combines grouping operations (PARTITION BY), sorting (ORDER BY), and definable calculation ranges (window frame FRAME), enabling complex cross-row calculations without collapsing the original data rows. It is commonly used in data analysis scenarios such as ranking, cumulative sums, moving averages, etc.
Note: This feature is available starting from version V 2.0.5.
For example, in a scenario where you need to query the cumulative power consumption values of different devices, you can achieve this using window functions.
-- Original data +-----------------------------+------+-----+ | time|device| flow| +-----------------------------+------+-----+ |1970-01-01T08:00:00.000+08:00| d0| 3| |1970-01-01T08:00:00.001+08:00| d0| 5| |1970-01-01T08:00:00.002+08:00| d0| 3| |1970-01-01T08:00:00.003+08:00| d0| 1| |1970-01-01T08:00:00.004+08:00| d1| 2| |1970-01-01T08:00:00.005+08:00| d1| 4| +-----------------------------+------+-----+ -- Create table and insert data CREATE TABLE device_flow(device String tag, flow INT32 FIELD); insert into device_flow(time, device ,flow ) values ('1970-01-01T08:00:00.000+08:00','d0',3),('1970-01-01T08:00:01.000+08:00','d0',5),('1970-01-01T08:00:02.000+08:00','d0',3),('1970-01-01T08:00:03.000+08:00','d0',1),('1970-01-01T08:00:04.000+08:00','d1',2),('1970-01-01T08:00:05.000+08:00','d1',4); -- Execute window function query SELECT *, sum(flow) OVER(PARTITION BY device ORDER BY flow) as sum FROM device_flow;
After grouping, sorting, and calculation (steps are disassembled as shown in the figure below),
the expected results can be obtained:
+-----------------------------+------+----+----+ | time|device|flow| sum| +-----------------------------+------+----+----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 2.0| |1970-01-01T08:00:05.000+08:00| d1| 4| 6.0| |1970-01-01T08:00:03.000+08:00| d0| 1| 1.0| |1970-01-01T08:00:00.000+08:00| d0| 3| 7.0| |1970-01-01T08:00:02.000+08:00| d0| 3| 7.0| |1970-01-01T08:00:01.000+08:00| d0| 5|12.0| +-----------------------------+------+----+----+
windowDefinition : name=identifier AS '(' windowSpecification ')' ; windowSpecification : (existingWindowName=identifier)? (PARTITION BY partition+=expression (',' partition+=expression)*)? (ORDER BY sortItem (',' sortItem)*)? windowFrame? ; windowFrame : frameExtent ; frameExtent : frameType=RANGE start=frameBound | frameType=ROWS start=frameBound | frameType=GROUPS start=frameBound | frameType=RANGE BETWEEN start=frameBound AND end=frameBound | frameType=ROWS BETWEEN start=frameBound AND end=frameBound | frameType=GROUPS BETWEEN start=frameBound AND end=frameBound ; frameBound : UNBOUNDED boundType=PRECEDING #unboundedFrame | UNBOUNDED boundType=FOLLOWING #unboundedFrame | CURRENT ROW #currentRowBound | expression boundType=(PRECEDING | FOLLOWING) #boundedFrame ;
PARTITION BY is used to divide data into multiple independent, unrelated “groups”. Window functions can only access and operate on data within their respective groups, and cannot access data from other groups. This clause is optional; if not explicitly specified, all data is divided into the same group by default. It is worth noting that unlike GROUP BY which aggregates a group of data into a single row, the window function with PARTITION BY does not affect the number of rows within the group.
Query statement:
IoTDB> SELECT *, count(flow) OVER (PARTITION BY device) as count FROM device_flow;
Disassembly steps:
Query result:
+-----------------------------+------+----+-----+ | time|device|flow|count| +-----------------------------+------+----+-----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 2| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:00.000+08:00| d0| 3| 4| |1970-01-01T08:00:01.000+08:00| d0| 5| 4| |1970-01-01T08:00:02.000+08:00| d0| 3| 4| |1970-01-01T08:00:03.000+08:00| d0| 1| 4| +-----------------------------+------+----+-----+
ORDER BY is used to sort data within a partition. After sorting, rows with equal values are called peers. Peers affect the behavior of window functions; for example, different rank functions handle peers differently, and different frame division methods also handle peers differently. This clause is optional.
Query statement:
IoTDB> SELECT *, rank() OVER (PARTITION BY device ORDER BY flow) as rank FROM device_flow;
Disassembly steps:
Query result:
+-----------------------------+------+----+----+ | time|device|flow|rank| +-----------------------------+------+----+----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 2| |1970-01-01T08:00:02.000+08:00| d0| 3| 2| |1970-01-01T08:00:01.000+08:00| d0| 5| 4| +-----------------------------+------+----+----+
For each row in a partition, the window function evaluates on a corresponding set of rows called a Frame (i.e., the input domain of the Window Function on each row). The Frame can be specified manually, involving two attributes when specified, as detailed below.
Among them, the meanings of CURRENT ROW, PRECEDING N, and FOLLOWING N vary with the type of frame, as shown in the following table:
ROWS | GROUPS | RANGE | |
|---|---|---|---|
CURRENT ROW | Current row | Since a peer group contains multiple rows, this option differs depending on whether it acts on frame_start and frame_end: * frame_start: the first row of the peer group; * frame_end: the last row of the peer group. | Same as GROUPS, differing depending on whether it acts on frame_start and frame_end: * frame_start: the first row of the peer group; * frame_end: the last row of the peer group. |
offset PRECEDING | The previous offset rows | The previous offset peer groups; | Rows whose value difference from the current row in the preceding direction is less than or equal to offset are grouped into one frame |
offset FOLLOWING | The following offset rows | The following offset peer groups. | Rows whose value difference from the current row in the following direction is less than or equal to offset are grouped into one frame |
The syntax format is as follows:
-- Specify both frame_start and frame_end { RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end -- Specify only frame_start, frame_end is CURRENT ROW { RANGE | ROWS | GROUPS } frame_start
If the Frame is not specified manually, the default Frame division rules are as follows:
It should be noted that when the Frame type is GROUPS or RANGE, ORDER BY must be specified. The difference is that ORDER BY in GROUPS can involve multiple fields, while RANGE requires calculation and thus can only specify one field.
Query statement:
IoTDB> SELECT *, count(flow) OVER(PARTITION BY device ROWS 1 PRECEDING) as count FROM device_flow;
Disassembly steps:
Query result:
+-----------------------------+------+----+-----+ | time|device|flow|count| +-----------------------------+------+----+-----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:00.000+08:00| d0| 3| 1| |1970-01-01T08:00:01.000+08:00| d0| 5| 2| |1970-01-01T08:00:02.000+08:00| d0| 3| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 2| +-----------------------------+------+----+-----+
Query statement:
IoTDB> SELECT *, count(flow) OVER(PARTITION BY device ORDER BY flow GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW) as count FROM device_flow;
Disassembly steps:
Query result:
+-----------------------------+------+----+-----+ | time|device|flow|count| +-----------------------------+------+----+-----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 3| |1970-01-01T08:00:02.000+08:00| d0| 3| 3| |1970-01-01T08:00:01.000+08:00| d0| 5| 3| +-----------------------------+------+----+-----+
Query statement:
IoTDB> SELECT *,count(flow) OVER(PARTITION BY device ORDER BY flow RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) as count FROM device_flow;
Disassembly steps:
Query result:
+-----------------------------+------+----+-----+ | time|device|flow|count| +-----------------------------+------+----+-----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 3| |1970-01-01T08:00:02.000+08:00| d0| 3| 3| |1970-01-01T08:00:01.000+08:00| d0| 5| 3| +-----------------------------+------+----+-----+
All built-in aggregate functions such as sum(), avg(), min(), max() can be used as Window Functions.
Note: Unlike GROUP BY, each row has a corresponding output in the Window Function
Example:
IoTDB> SELECT *, sum(flow) OVER (PARTITION BY device ORDER BY flow) as sum FROM device_flow; +-----------------------------+------+----+----+ | time|device|flow| sum| +-----------------------------+------+----+----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 2.0| |1970-01-01T08:00:05.000+08:00| d1| 4| 6.0| |1970-01-01T08:00:03.000+08:00| d0| 1| 1.0| |1970-01-01T08:00:00.000+08:00| d0| 3| 7.0| |1970-01-01T08:00:02.000+08:00| d0| 3| 7.0| |1970-01-01T08:00:01.000+08:00| d0| 5|12.0| +-----------------------------+------+----+----+
first_valuefirst_value(value) [IGNORE NULLS]IoTDB> SELECT *, first_value(flow) OVER w as first_value FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING); +-----------------------------+------+----+-----------+ | time|device|flow|first_value| +-----------------------------+------+----+-----------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 2| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 1| |1970-01-01T08:00:02.000+08:00| d0| 3| 3| |1970-01-01T08:00:01.000+08:00| d0| 5| 3| +-----------------------------+------+----+-----------+
last_valuelast_value(value) [IGNORE NULLS]IoTDB> SELECT *, last_value(flow) OVER w as last_value FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING); +-----------------------------+------+----+----------+ | time|device|flow|last_value| +-----------------------------+------+----+----------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 4| |1970-01-01T08:00:05.000+08:00| d1| 4| 4| |1970-01-01T08:00:03.000+08:00| d0| 1| 3| |1970-01-01T08:00:00.000+08:00| d0| 3| 3| |1970-01-01T08:00:02.000+08:00| d0| 3| 5| |1970-01-01T08:00:01.000+08:00| d0| 5| 5| +-----------------------------+------+----+----------+
nth_valuenth_value(value, n) [IGNORE NULLS]IoTDB> SELECT *, nth_value(flow, 2) OVER w as nth_values FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING); +-----------------------------+------+----+----------+ | time|device|flow|nth_values| +-----------------------------+------+----+----------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 4| |1970-01-01T08:00:05.000+08:00| d1| 4| 4| |1970-01-01T08:00:03.000+08:00| d0| 1| 3| |1970-01-01T08:00:00.000+08:00| d0| 3| 3| |1970-01-01T08:00:02.000+08:00| d0| 3| 3| |1970-01-01T08:00:01.000+08:00| d0| 5| 5| +-----------------------------+------+----+----------+
lead(value[, offset[, default]]) [IGNORE NULLS]IoTDB> SELECT *, lead(flow) OVER w as lead FROM device_flow WINDOW w AS(PARTITION BY device ORDER BY time); +-----------------------------+------+----+----+ | time|device|flow|lead| +-----------------------------+------+----+----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 4| |1970-01-01T08:00:05.000+08:00| d1| 4|null| |1970-01-01T08:00:00.000+08:00| d0| 3| 5| |1970-01-01T08:00:01.000+08:00| d0| 5| 3| |1970-01-01T08:00:02.000+08:00| d0| 3| 1| |1970-01-01T08:00:03.000+08:00| d0| 1|null| +-----------------------------+------+----+----+
lag(value[, offset[, default]]) [IGNORE NULLS]IoTDB> SELECT *, lag(flow) OVER w as lag FROM device_flow WINDOW w AS(PARTITION BY device ORDER BY device); +-----------------------------+------+----+----+ | time|device|flow| lag| +-----------------------------+------+----+----+ |1970-01-01T08:00:04.000+08:00| d1| 2|null| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:00.000+08:00| d0| 3|null| |1970-01-01T08:00:01.000+08:00| d0| 5| 3| |1970-01-01T08:00:02.000+08:00| d0| 3| 5| |1970-01-01T08:00:03.000+08:00| d0| 1| 3| +-----------------------------+------+----+----+
rank()IoTDB> SELECT *, rank() OVER w as rank FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow); +-----------------------------+------+----+----+ | time|device|flow|rank| +-----------------------------+------+----+----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 2| |1970-01-01T08:00:02.000+08:00| d0| 3| 2| |1970-01-01T08:00:01.000+08:00| d0| 5| 4| +-----------------------------+------+----+----+
dense_rank()IoTDB> SELECT *, dense_rank() OVER w as dense_rank FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow); +-----------------------------+------+----+----------+ | time|device|flow|dense_rank| +-----------------------------+------+----+----------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 2| |1970-01-01T08:00:02.000+08:00| d0| 3| 2| |1970-01-01T08:00:01.000+08:00| d0| 5| 3| +-----------------------------+------+----+----------+
row_number()IoTDB> SELECT *, row_number() OVER w as row_number FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow); +-----------------------------+------+----+----------+ | time|device|flow|row_number| +-----------------------------+------+----+----------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 2| |1970-01-01T08:00:02.000+08:00| d0| 3| 3| |1970-01-01T08:00:01.000+08:00| d0| 5| 4| +-----------------------------+------+----+----------+
percent_rank()IoTDB> SELECT *, percent_rank() OVER w as percent_rank FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow); +-----------------------------+------+----+------------------+ | time|device|flow| percent_rank| +-----------------------------+------+----+------------------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 0.0| |1970-01-01T08:00:05.000+08:00| d1| 4| 1.0| |1970-01-01T08:00:03.000+08:00| d0| 1| 0.0| |1970-01-01T08:00:00.000+08:00| d0| 3|0.3333333333333333| |1970-01-01T08:00:02.000+08:00| d0| 3|0.3333333333333333| |1970-01-01T08:00:01.000+08:00| d0| 5| 1.0| +-----------------------------+------+----+------------------+
cume_distIoTDB> SELECT *, cume_dist() OVER w as cume_dist FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow); +-----------------------------+------+----+---------+ | time|device|flow|cume_dist| +-----------------------------+------+----+---------+ |1970-01-01T08:00:04.000+08:00| d1| 2| 0.5| |1970-01-01T08:00:05.000+08:00| d1| 4| 1.0| |1970-01-01T08:00:03.000+08:00| d0| 1| 0.25| |1970-01-01T08:00:00.000+08:00| d0| 3| 0.75| |1970-01-01T08:00:02.000+08:00| d0| 3| 0.75| |1970-01-01T08:00:01.000+08:00| d0| 5| 1.0| +-----------------------------+------+----+---------+
ntileIoTDB> SELECT *, ntile(2) OVER w as ntile FROM device_flow WINDOW w AS (PARTITION BY device ORDER BY flow); +-----------------------------+------+----+-----+ | time|device|flow|ntile| +-----------------------------+------+----+-----+ |1970-01-01T08:00:04.000+08:00| d1| 2| 1| |1970-01-01T08:00:05.000+08:00| d1| 4| 2| |1970-01-01T08:00:03.000+08:00| d0| 1| 1| |1970-01-01T08:00:00.000+08:00| d0| 3| 1| |1970-01-01T08:00:02.000+08:00| d0| 3| 2| |1970-01-01T08:00:01.000+08:00| d0| 5| 2| +-----------------------------+------+----+-----+
For each row of each device, calculate the difference from the previous row:
SELECT *, measurement - lag(measurement) OVER (PARTITION BY device ORDER BY time) FROM data WHERE timeCondition;
For each row of each device, calculate the difference from the next row:
SELECT *, measurement - lead(measurement) OVER (PARTITION BY device ORDER BY time) FROM data WHERE timeCondition;
For each row of a single device, calculate the difference from the previous row (same for the next row):
SELECT *, measurement - lag(measurement) OVER (ORDER BY time) FROM data where device='d1' WHERE timeCondition;
Use rank to get the sequence number, then retain the desired order in the outer query.
(Note: The execution order of window functions is after the HAVING clause, so a subquery is needed here)
SELECT * FROM( SELECT *, rank() OVER (PARTITION BY device ORDER BY time DESC) FROM data WHERE timeCondition ) WHERE rank <= 3;
In addition to sorting by time, you can also sort by the value of the measurement point:
SELECT * FROM( SELECT *, rank() OVER (PARTITION BY device ORDER BY measurement DESC) FROM data WHERE timeCondition ) WHERE rank <= 3;
This SQL is used to remove consecutive identical values in the input sequence, which can be achieved with lead + subquery:
SELECT time, device, measurement FROM( SELECT time, device, measurement, LEAD(measurement) OVER (PARTITION BY device ORDER BY time) AS next FROM data WHERE timeCondition ) WHERE measurement != next OR next IS NULL;