During data queries, you may encounter scenarios where certain columns have missing data in some rows, resulting in NULL values in the result set. These NULL values can hinder data visualization and analysis. To address this, IoTDB provides the FILL clause to populate these NULL values.
ORDER BY clause, the FILL clause is executed before ORDER BY.GAPFILL (e.g., date_bin_gapfill function) operation exists, the FILL clause is executed after GAPFILL.fillClause : FILL METHOD fillMethod ; fillMethod : LINEAR timeColumnClause? fillGroupClause? #linearFill | PREVIOUS timeBoundClause? timeColumnClause? fillGroupClause? #previousFill | CONSTANT literalExpression #valueFill ; timeColumnClause : TIME_COLUMN INTEGER_VALUE ; fillGroupClause : FILL_GROUP INTEGER_VALUE (',' INTEGER_VALUE)* ; timeBoundClause : TIME_BOUND duration=timeDuration ; timeDuration : (INTEGER_VALUE intervalField)+ ; intervalField : YEAR | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | MILLISECOND | MICROSECOND | NANOSECOND ;
IoTDB supports the following three methods to fill NULL values:
Only one filling method can be specified, and it applies to all columns in the result set.
| Data Type | Previous | Linear | Constant |
|---|---|---|---|
| boolean | √ | - | √ |
| int32 | √ | √ | √ |
| int64 | √ | √ | √ |
| float | √ | √ | √ |
| double | √ | √ | √ |
| text | √ | - | √ |
| string | √ | - | √ |
| blob | √ | - | √ |
| timestamp | √ | √ | √ |
| date | √ | √ | √ |
Note: Columns with data types not supporting the specified filling method will remain unchanged without errors.
The Example Data pagepage provides SQL statements to construct table schemas and insert data. By downloading and executing these statements in the IoTDB CLI, you can import the data into IoTDB. This data can be used to test and run the example SQL queries included in this documentation, allowing you to reproduce the described results.
PREVIOUS FILL fills NULL values with the most recent non-NULL value in the same column.
TIMESTAMP column in the query result to determine the threshold.1d1h (1 day and 1 hour).TIMESTAMP column used to determine the time threshold. The column is specified using its positional index (starting from 1) in the original table.SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101';
Results:
+-----------------------------+-----------+------+ | time|temperature|status| +-----------------------------+-----------+------+ |2024-11-27T16:38:00.000+08:00| null| true| |2024-11-27T16:39:00.000+08:00| 85.0| null| |2024-11-27T16:40:00.000+08:00| 85.0| null| |2024-11-27T16:41:00.000+08:00| 85.0| null| |2024-11-27T16:42:00.000+08:00| null| false| |2024-11-27T16:43:00.000+08:00| null| false| |2024-11-27T16:44:00.000+08:00| null| false| +-----------------------------+-----------+------+ Total line number = 7 It costs 0.088s
PREVIOUS Fill:SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101' FILL METHOD PREVIOUS;
Results:
+-----------------------------+-----------+------+ | time|temperature|status| +-----------------------------+-----------+------+ |2024-11-27T16:38:00.000+08:00| null| true| |2024-11-27T16:39:00.000+08:00| 85.0| true| |2024-11-27T16:40:00.000+08:00| 85.0| true| |2024-11-27T16:41:00.000+08:00| 85.0| true| |2024-11-27T16:42:00.000+08:00| 85.0| false| |2024-11-27T16:43:00.000+08:00| 85.0| false| |2024-11-27T16:44:00.000+08:00| 85.0| false| +-----------------------------+-----------+------+ Total line number = 7 It costs 0.091s
PREVIOUS Fill with a Specified Time Threshold:# Do not specify a time column SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101' FILL METHOD PREVIOUS TIME_BOUND 1m; # Manually specify the time column SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101' FILL METHOD PREVIOUS 1m TIME_COLUMN 1;
Results:
+-----------------------------+-----------+------+ | time|temperature|status| +-----------------------------+-----------+------+ |2024-11-27T16:38:00.000+08:00| null| true| |2024-11-27T16:39:00.000+08:00| 85.0| true| |2024-11-27T16:40:00.000+08:00| 85.0| null| |2024-11-27T16:41:00.000+08:00| 85.0| null| |2024-11-27T16:42:00.000+08:00| 85.0| false| |2024-11-27T16:43:00.000+08:00| null| false| |2024-11-27T16:44:00.000+08:00| null| false| +-----------------------------+-----------+------+ Total line number = 7 It costs 0.075s
LINEAR Fill fills NULL values using linear interpolation based on the nearest previous and next non-NULL values in the same column.
boolean, string, blob, or text are not filled, and no error is returned.TIMESTAMP-type column in the SELECT clause is used by default for interpolation. If no TIMESTAMP column exists, an error will be returned.TIMESTAMP column to be used as an auxiliary column for linear interpolation. The column is identified by its positional index (starting from 1) in the original table.Note: The auxiliary column used for linear interpolation is not required to be the time column. However, the auxiliary column must be sorted in ascending or descending order for meaningful interpolation. If another column is specified, the user must ensure the result set is ordered correctly.
SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101' FILL METHOD LINEAR;
Result:
+-----------------------------+-----------+------+ | time|temperature|status| +-----------------------------+-----------+------+ |2024-11-27T16:38:00.000+08:00| null| true| |2024-11-27T16:39:00.000+08:00| 85.0| null| |2024-11-27T16:40:00.000+08:00| 85.0| null| |2024-11-27T16:41:00.000+08:00| 85.0| null| |2024-11-27T16:42:00.000+08:00| null| false| |2024-11-27T16:43:00.000+08:00| null| false| |2024-11-27T16:44:00.000+08:00| null| false| +-----------------------------+-----------+------+ Total line number = 7 It costs 0.053s
CONSTANT Fill fills NULL values with a specified constant value.
FLOAT constant:SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101' FILL METHOD CONSTANT 80.0;
Results:
+-----------------------------+-----------+------+ | time|temperature|status| +-----------------------------+-----------+------+ |2024-11-27T16:38:00.000+08:00| 80.0| true| |2024-11-27T16:39:00.000+08:00| 85.0| true| |2024-11-27T16:40:00.000+08:00| 85.0| true| |2024-11-27T16:41:00.000+08:00| 85.0| true| |2024-11-27T16:42:00.000+08:00| 80.0| false| |2024-11-27T16:43:00.000+08:00| 80.0| false| |2024-11-27T16:44:00.000+08:00| 80.0| false| +-----------------------------+-----------+------+ Total line number = 7 It costs 0.242s
W- Using a BOOLEAN constant:
SELECT time, temperature, status FROM table1 WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00 AND plant_id='1001' and device_id='101' FILL METHOD CONSTANT true;
Results:
+-----------------------------+-----------+------+ | time|temperature|status| +-----------------------------+-----------+------+ |2024-11-27T16:38:00.000+08:00| 1.0| true| |2024-11-27T16:39:00.000+08:00| 85.0| true| |2024-11-27T16:40:00.000+08:00| 85.0| true| |2024-11-27T16:41:00.000+08:00| 85.0| true| |2024-11-27T16:42:00.000+08:00| 1.0| false| |2024-11-27T16:43:00.000+08:00| 1.0| false| |2024-11-27T16:44:00.000+08:00| 1.0| false| +-----------------------------+-----------+------+ Total line number = 7 It costs 0.073s
When using the PREVIOUS or LINEAR FILL methods, the FILL_GROUP parameter allows filling within specific groups without being influenced by other groups.
device_idThe following query demonstrates how to fill missing values for each device_id group independently, without using values from other devices:
SELECT date_bin(1h, time) AS hour_time, plant_id, device_id, avg(temperature) AS avg_temp FROM table1 WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00 group by 1, plant_id, device_id;
Results:
+-----------------------------+--------+---------+--------+ | hour_time|plant_id|device_id|avg_temp| +-----------------------------+--------+---------+--------+ |2024-11-28T08:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T09:00:00.000+08:00| 3001| 100| null| |2024-11-28T10:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T11:00:00.000+08:00| 3001| 100| 88.0| |2024-11-29T10:00:00.000+08:00| 3001| 101| 85.0| |2024-11-29T11:00:00.000+08:00| 3002| 100| null| |2024-11-29T18:00:00.000+08:00| 3002| 100| 90.0| |2024-11-30T09:00:00.000+08:00| 3002| 101| 90.0| +-----------------------------+--------+---------+--------+ Total line number = 8 It costs 0.110s
FILL_GROUPIf the FILL_GROUP parameter is not specified, missing values in device_id = 100 will be filled using values from device_id = 101:
SELECT date_bin(1h, time) AS hour_time, plant_id, device_id, avg(temperature) AS avg_temp FROM table1 WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00 group by 1, plant_id, device_id FILL METHOD PREVIOUS;
Results:
+-----------------------------+--------+---------+--------+ | hour_time|plant_id|device_id|avg_temp| +-----------------------------+--------+---------+--------+ |2024-11-28T08:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T09:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T10:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T11:00:00.000+08:00| 3001| 100| 88.0| |2024-11-29T10:00:00.000+08:00| 3001| 101| 85.0| |2024-11-29T11:00:00.000+08:00| 3002| 100| 85.0| |2024-11-29T18:00:00.000+08:00| 3002| 100| 90.0| |2024-11-30T09:00:00.000+08:00| 3002| 101| 90.0| +-----------------------------+--------+---------+--------+ Total line number = 8 It costs 0.066s
FILL_GROUP for Grouped FillingBy specifying FILL_GROUP 2, the filling is restricted to groups based on the second column (device_id). As a result, missing values in device_id = 100 will not be filled using values from device_id = 101:
SELECT date_bin(1h, time) AS hour_time, plant_id, device_id, avg(temperature) AS avg_temp FROM table1 WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00 group by 1, plant_id, device_id FILL METHOD PREVIOUS FILL_GROUP 2;
Results:
+-----------------------------+--------+---------+--------+ | hour_time|plant_id|device_id|avg_temp| +-----------------------------+--------+---------+--------+ |2024-11-28T08:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T09:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T10:00:00.000+08:00| 3001| 100| 85.0| |2024-11-28T11:00:00.000+08:00| 3001| 100| 88.0| |2024-11-29T10:00:00.000+08:00| 3001| 101| 85.0| |2024-11-29T11:00:00.000+08:00| 3002| 100| null| |2024-11-29T18:00:00.000+08:00| 3002| 100| 90.0| |2024-11-30T09:00:00.000+08:00| 3002| 101| 90.0| +-----------------------------+--------+---------+--------+ Total line number = 8 It costs 0.089s
When using LINEAR or PREVIOUS FILL methods, if the auxiliary time column (used to determine filling logic) contains NULL values, IoTDB follows these rules:
Example of PREVIOUS Fill
SELECT time, plant_id, device_id, humidity, arrival_time FROM table1 WHERE time >= 2024-11-26 16:37:00 and time <= 2024-11-28 08:00:00 AND plant_id='1001' and device_id='101';
Results:
+-----------------------------+--------+---------+--------+-----------------------------+ | time|plant_id|device_id|humidity| arrival_time| +-----------------------------+--------+---------+--------+-----------------------------+ |2024-11-27T16:38:00.000+08:00| 1001| 101| 35.1|2024-11-27T16:37:01.000+08:00| |2024-11-27T16:39:00.000+08:00| 1001| 101| 35.3| null| |2024-11-27T16:40:00.000+08:00| 1001| 101| null|2024-11-27T16:37:03.000+08:00| |2024-11-27T16:41:00.000+08:00| 1001| 101| null|2024-11-27T16:37:04.000+08:00| |2024-11-27T16:42:00.000+08:00| 1001| 101| 35.2| null| |2024-11-27T16:43:00.000+08:00| 1001| 101| null| null| |2024-11-27T16:44:00.000+08:00| 1001| 101| null|2024-11-27T16:37:08.000+08:00| +-----------------------------+--------+---------+--------+-----------------------------+ Total line number = 7 It costs 0.119s
arrival_time as the auxiliary column with a time interval (TIME_BOUND) of 2 secondsSELECT time, plant_id, device_id, humidity, arrival_time FROM table1 WHERE time >= 2024-11-26 16:37:00 and time <= 2024-11-28 08:00:00 AND plant_id='1001' and device_id='101' FILL METHOD PREVIOUS TIME_BOUND 2s TIME_COLUMN 5;
Results:
+-----------------------------+--------+---------+--------+-----------------------------+ | time|plant_id|device_id|humidity| arrival_time| +-----------------------------+--------+---------+--------+-----------------------------+ |2024-11-27T16:38:00.000+08:00| 1001| 101| 35.1|2024-11-27T16:37:01.000+08:00| |2024-11-27T16:39:00.000+08:00| 1001| 101| 35.3| null| |2024-11-27T16:40:00.000+08:00| 1001| 101| 35.1|2024-11-27T16:37:03.000+08:00| |2024-11-27T16:41:00.000+08:00| 1001| 101| null|2024-11-27T16:37:04.000+08:00| |2024-11-27T16:42:00.000+08:00| 1001| 101| 35.2| null| |2024-11-27T16:43:00.000+08:00| 1001| 101| null| null| |2024-11-27T16:44:00.000+08:00| 1001| 101| null|2024-11-27T16:37:08.000+08:00| +-----------------------------+--------+---------+--------+-----------------------------+ Total line number = 7 It costs 0.049s
Filling Details
humidity at 16:39, 16:42, and 16:43:arrival_time is NULL, no filling is performed.humidity at 16:40:arrival_time is not NULL and has a value of 1970-01-01T08:00:00.003+08:00.1970-01-01T08:00:00.001+08:00) is less than 2 seconds (TIME_BOUND)35.1 from the first row is used for filling.humidity at 16:41:arrival_time is not NULL, the time difference from the previous non-NULL value exceeds 2 seconds, so no filling is performed.humidity at 16:44: