FILL Clause

1. Function Introduction

During data queries, you may encounter scenarios where certain columns have missing data in some rows, resulting in NULL values in the result set. These NULL values can hinder data visualization and analysis. To address this, IoTDB provides the FILL clause to populate these NULL values.

  • If the query includes an ORDER BY clause, the FILL clause is executed before ORDER BY.
  • If a GAPFILL (e.g., date_bin_gapfill function) operation exists, the FILL clause is executed after GAPFILL.

2. Syntax Overview

fillClause
    : FILL METHOD fillMethod
    ;

fillMethod
    : LINEAR timeColumnClause? fillGroupClause?                           #linearFill
    | PREVIOUS timeBoundClause? timeColumnClause? fillGroupClause?        #previousFill
    | CONSTANT literalExpression                                          #valueFill
    ;

timeColumnClause
    : TIME_COLUMN INTEGER_VALUE
    ;

fillGroupClause
    : FILL_GROUP INTEGER_VALUE (',' INTEGER_VALUE)*
    ;

timeBoundClause
    : TIME_BOUND duration=timeDuration
    ;

timeDuration
    : (INTEGER_VALUE intervalField)+
    ;
intervalField
    : YEAR | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | MILLISECOND | MICROSECOND | NANOSECOND
    ;

2.1 ### Filling Methods

IoTDB supports the following three methods to fill NULL values:

  1. PREVIOUS Fill: Uses the most recent non-NULL value from the same column to fill NULL values.
  2. LINEAR Fill: Applies linear interpolation using the nearest previous and next non-NULL values in the same column.
  3. CONSTANT Fill: Fills NULL values with a specified constant.

Only one filling method can be specified, and it applies to all columns in the result set.

2.2 Supported Data Types for Filling Methods

Data TypePreviousLinearConstant
boolean-
int32
int64
float
double
text-
string-
blob-
timestamp
date

Note: Columns with data types not supporting the specified filling method will remain unchanged without errors.

3. Sample Dataset

The Example Data pagepage provides SQL statements to construct table schemas and insert data. By downloading and executing these statements in the IoTDB CLI, you can import the data into IoTDB. This data can be used to test and run the example SQL queries included in this documentation, allowing you to reproduce the described results.

3.1 PREVIOUS Fill

PREVIOUS FILL fills NULL values with the most recent non-NULL value in the same column.

3.1.1 Parameters

  • TIME_BOUND (optional): Defines a forward-looking time threshold. If the time difference between the current NULL value and the previous non-NULL value exceeds this threshold, the value will not be filled. By default, the system uses the first TIMESTAMP column in the query result to determine the threshold.
    • Format: A time interval specified with integer values and units, e.g., 1d1h (1 day and 1 hour).
  • TIME_COLUMN (optional): Allows specifying the TIMESTAMP column used to determine the time threshold. The column is specified using its positional index (starting from 1) in the original table.

3.1.2 Examples

  • Without FILL Clause:
SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
    AND plant_id='1001' and device_id='101';

Results:

+-----------------------------+-----------+------+
|                         time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00|       null|  true|
|2024-11-27T16:39:00.000+08:00|       85.0|  null|
|2024-11-27T16:40:00.000+08:00|       85.0|  null|
|2024-11-27T16:41:00.000+08:00|       85.0|  null|
|2024-11-27T16:42:00.000+08:00|       null| false|
|2024-11-27T16:43:00.000+08:00|       null| false|
|2024-11-27T16:44:00.000+08:00|       null| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.088s
  • Using PREVIOUS Fill:
SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
    AND plant_id='1001' and device_id='101'
  FILL METHOD PREVIOUS;

Results:

+-----------------------------+-----------+------+
|                         time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00|       null|  true|
|2024-11-27T16:39:00.000+08:00|       85.0|  true|
|2024-11-27T16:40:00.000+08:00|       85.0|  true|
|2024-11-27T16:41:00.000+08:00|       85.0|  true|
|2024-11-27T16:42:00.000+08:00|       85.0| false|
|2024-11-27T16:43:00.000+08:00|       85.0| false|
|2024-11-27T16:44:00.000+08:00|       85.0| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.091s
  • Using PREVIOUS Fill with a Specified Time Threshold:
# Do not specify a time column
SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
      AND plant_id='1001' and device_id='101'
  FILL METHOD PREVIOUS TIME_BOUND 1m;

# Manually specify the time column
SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
      AND plant_id='1001' and device_id='101'
  FILL METHOD PREVIOUS        1m TIME_COLUMN 1;

Results:

+-----------------------------+-----------+------+
|                         time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00|       null|  true|
|2024-11-27T16:39:00.000+08:00|       85.0|  true|
|2024-11-27T16:40:00.000+08:00|       85.0|  null|
|2024-11-27T16:41:00.000+08:00|       85.0|  null|
|2024-11-27T16:42:00.000+08:00|       85.0| false|
|2024-11-27T16:43:00.000+08:00|       null| false|
|2024-11-27T16:44:00.000+08:00|       null| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.075s

3.2 LINEAR Fill

LINEAR Fill fills NULL values using linear interpolation based on the nearest previous and next non-NULL values in the same column.

3.2.1 Linear Fill Rules

  1. If all previous or all subsequent values are NULL, no filling is performed.
  2. Columns with data types such as boolean, string, blob, or text are not filled, and no error is returned.
  3. If no auxiliary time column is specified, the first TIMESTAMP-type column in the SELECT clause is used by default for interpolation. If no TIMESTAMP column exists, an error will be returned.

3.2.2 Parameters

  • TIME_COLUMN (optional): Specifies the TIMESTAMP column to be used as an auxiliary column for linear interpolation. The column is identified by its positional index (starting from 1) in the original table.

Note: The auxiliary column used for linear interpolation is not required to be the time column. However, the auxiliary column must be sorted in ascending or descending order for meaningful interpolation. If another column is specified, the user must ensure the result set is ordered correctly.

3.2.3 Examples

SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
      AND plant_id='1001' and device_id='101'
  FILL METHOD LINEAR;

Result:

+-----------------------------+-----------+------+
|                         time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00|       null|  true|
|2024-11-27T16:39:00.000+08:00|       85.0|  null|
|2024-11-27T16:40:00.000+08:00|       85.0|  null|
|2024-11-27T16:41:00.000+08:00|       85.0|  null|
|2024-11-27T16:42:00.000+08:00|       null| false|
|2024-11-27T16:43:00.000+08:00|       null| false|
|2024-11-27T16:44:00.000+08:00|       null| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.053s

3.3 ### CONSTANT Fill

CONSTANT Fill fills NULL values with a specified constant value.

3.3.1 Constant Fill Rules

  1. If the data type of the constant does not match the column's data type, IoTDB does not fill the result set and no error is returned.
  2. If the constant value exceeds the column's allowable range, IoTDB does not fill the result set and no error is returned.

3.3.2 Examples

  • Using a FLOAT constant:
SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
      AND plant_id='1001' and device_id='101'
  FILL METHOD CONSTANT 80.0;

Results:

+-----------------------------+-----------+------+
|                         time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00|       80.0|  true|
|2024-11-27T16:39:00.000+08:00|       85.0|  true|
|2024-11-27T16:40:00.000+08:00|       85.0|  true|
|2024-11-27T16:41:00.000+08:00|       85.0|  true|
|2024-11-27T16:42:00.000+08:00|       80.0| false|
|2024-11-27T16:43:00.000+08:00|       80.0| false|
|2024-11-27T16:44:00.000+08:00|       80.0| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.242s

W- Using a BOOLEAN constant:

SELECT time, temperature, status 
  FROM table1 
  WHERE time >= 2024-11-27 00:00:00  and time <= 2024-11-29 00:00:00
      AND plant_id='1001' and device_id='101'
  FILL METHOD CONSTANT true;

Results:

+-----------------------------+-----------+------+
|                         time|temperature|status|
+-----------------------------+-----------+------+
|2024-11-27T16:38:00.000+08:00|        1.0|  true|
|2024-11-27T16:39:00.000+08:00|       85.0|  true|
|2024-11-27T16:40:00.000+08:00|       85.0|  true|
|2024-11-27T16:41:00.000+08:00|       85.0|  true|
|2024-11-27T16:42:00.000+08:00|        1.0| false|
|2024-11-27T16:43:00.000+08:00|        1.0| false|
|2024-11-27T16:44:00.000+08:00|        1.0| false|
+-----------------------------+-----------+------+
Total line number = 7
It costs 0.073s

4. Advanced Usage

When using the PREVIOUS or LINEAR FILL methods, the FILL_GROUP parameter allows filling within specific groups without being influenced by other groups.

Examples

  • Filling Missing Values Within device_id

The following query demonstrates how to fill missing values for each device_id group independently, without using values from other devices:

SELECT date_bin(1h, time) AS hour_time,  plant_id, device_id, avg(temperature) AS avg_temp
  FROM table1
  WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00
  group by 1, plant_id, device_id;

Results:

+-----------------------------+--------+---------+--------+
|                    hour_time|plant_id|device_id|avg_temp|
+-----------------------------+--------+---------+--------+
|2024-11-28T08:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T09:00:00.000+08:00|    3001|      100|    null|
|2024-11-28T10:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T11:00:00.000+08:00|    3001|      100|    88.0|
|2024-11-29T10:00:00.000+08:00|    3001|      101|    85.0|
|2024-11-29T11:00:00.000+08:00|    3002|      100|    null|
|2024-11-29T18:00:00.000+08:00|    3002|      100|    90.0|
|2024-11-30T09:00:00.000+08:00|    3002|      101|    90.0|
+-----------------------------+--------+---------+--------+
Total line number = 8
It costs 0.110s
  • Without Specifying FILL_GROUP

If the FILL_GROUP parameter is not specified, missing values in device_id = 100 will be filled using values from device_id = 101:

SELECT date_bin(1h, time) AS hour_time,  plant_id, device_id, avg(temperature) AS avg_temp
  FROM table1
  WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00
  group by 1, plant_id, device_id
  FILL METHOD PREVIOUS;

Results:

+-----------------------------+--------+---------+--------+
|                    hour_time|plant_id|device_id|avg_temp|
+-----------------------------+--------+---------+--------+
|2024-11-28T08:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T09:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T10:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T11:00:00.000+08:00|    3001|      100|    88.0|
|2024-11-29T10:00:00.000+08:00|    3001|      101|    85.0|
|2024-11-29T11:00:00.000+08:00|    3002|      100|    85.0|
|2024-11-29T18:00:00.000+08:00|    3002|      100|    90.0|
|2024-11-30T09:00:00.000+08:00|    3002|      101|    90.0|
+-----------------------------+--------+---------+--------+
Total line number = 8
It costs 0.066s
  • Specifying FILL_GROUP for Grouped Filling

By specifying FILL_GROUP 2, the filling is restricted to groups based on the second column (device_id). As a result, missing values in device_id = 100 will not be filled using values from device_id = 101:

SELECT date_bin(1h, time) AS hour_time,  plant_id, device_id, avg(temperature) AS avg_temp
  FROM table1
  WHERE time >= 2024-11-28 08:00:00 AND time < 2024-11-30 14:30:00
  group by 1, plant_id, device_id
  FILL METHOD PREVIOUS FILL_GROUP 2;

Results:

+-----------------------------+--------+---------+--------+
|                    hour_time|plant_id|device_id|avg_temp|
+-----------------------------+--------+---------+--------+
|2024-11-28T08:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T09:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T10:00:00.000+08:00|    3001|      100|    85.0|
|2024-11-28T11:00:00.000+08:00|    3001|      100|    88.0|
|2024-11-29T10:00:00.000+08:00|    3001|      101|    85.0|
|2024-11-29T11:00:00.000+08:00|    3002|      100|    null|
|2024-11-29T18:00:00.000+08:00|    3002|      100|    90.0|
|2024-11-30T09:00:00.000+08:00|    3002|      101|    90.0|
+-----------------------------+--------+---------+--------+
Total line number = 8
It costs 0.089s

5. Special Notes

When using LINEAR or PREVIOUS FILL methods, if the auxiliary time column (used to determine filling logic) contains NULL values, IoTDB follows these rules:

  • Rows with NULL values in the auxiliary column will not be filled.
  • These rows are also excluded from the filling logic calculations.

Example of PREVIOUS Fill

  • Query original data:
SELECT time, plant_id, device_id, humidity, arrival_time
  FROM table1 
  WHERE time >= 2024-11-26 16:37:00 and time <= 2024-11-28 08:00:00
      AND plant_id='1001' and device_id='101';

Results:

+-----------------------------+--------+---------+--------+-----------------------------+
|                         time|plant_id|device_id|humidity|                 arrival_time|
+-----------------------------+--------+---------+--------+-----------------------------+
|2024-11-27T16:38:00.000+08:00|    1001|      101|    35.1|2024-11-27T16:37:01.000+08:00|
|2024-11-27T16:39:00.000+08:00|    1001|      101|    35.3|                         null|
|2024-11-27T16:40:00.000+08:00|    1001|      101|    null|2024-11-27T16:37:03.000+08:00|
|2024-11-27T16:41:00.000+08:00|    1001|      101|    null|2024-11-27T16:37:04.000+08:00|
|2024-11-27T16:42:00.000+08:00|    1001|      101|    35.2|                         null|
|2024-11-27T16:43:00.000+08:00|    1001|      101|    null|                         null|
|2024-11-27T16:44:00.000+08:00|    1001|      101|    null|2024-11-27T16:37:08.000+08:00|
+-----------------------------+--------+---------+--------+-----------------------------+
Total line number = 7
It costs 0.119s
  • Using arrival_time as the auxiliary column with a time interval (TIME_BOUND) of 2 seconds
SELECT time, plant_id, device_id, humidity, arrival_time
  FROM table1
  WHERE time >= 2024-11-26 16:37:00 and time <= 2024-11-28 08:00:00
    AND plant_id='1001' and device_id='101'
  FILL METHOD PREVIOUS TIME_BOUND 2s TIME_COLUMN 5;

Results:

+-----------------------------+--------+---------+--------+-----------------------------+
|                         time|plant_id|device_id|humidity|                 arrival_time|
+-----------------------------+--------+---------+--------+-----------------------------+
|2024-11-27T16:38:00.000+08:00|    1001|      101|    35.1|2024-11-27T16:37:01.000+08:00|
|2024-11-27T16:39:00.000+08:00|    1001|      101|    35.3|                         null|
|2024-11-27T16:40:00.000+08:00|    1001|      101|    35.1|2024-11-27T16:37:03.000+08:00|
|2024-11-27T16:41:00.000+08:00|    1001|      101|    null|2024-11-27T16:37:04.000+08:00|
|2024-11-27T16:42:00.000+08:00|    1001|      101|    35.2|                         null|
|2024-11-27T16:43:00.000+08:00|    1001|      101|    null|                         null|
|2024-11-27T16:44:00.000+08:00|    1001|      101|    null|2024-11-27T16:37:08.000+08:00|
+-----------------------------+--------+---------+--------+-----------------------------+
Total line number = 7
It costs 0.049s

Filling Details

  1. For humidity at 16:39, 16:42, and 16:43:
    1. Since the auxiliary column arrival_time is NULL, no filling is performed.
  2. For humidity at 16:40:
    1. The auxiliary column arrival_time is not NULL and has a value of 1970-01-01T08:00:00.003+08:00.
    2. The time difference from the previous non-NULL value (1970-01-01T08:00:00.001+08:00) is less than 2 seconds (TIME_BOUND)
    3. So the value 35.1 from the first row is used for filling.
  3. For humidity at 16:41:
    1. Although the auxiliary column arrival_time is not NULL, the time difference from the previous non-NULL value exceeds 2 seconds, so no filling is performed.
  4. For humidity at 16:44:
    1. Similarly, the time difference exceeds 2 seconds, so no filling is performed.