Data Import

1. IoTDB Data Import

IoTDB currently supports importing data in CSV, SQL, and TsFile (IoTDB's underlying open-time series file format) into the database. The specific functionalities are as follows:

2. import-data Scripts

  • Supported formats: CSV、SQL

2.1 Command

# Unix/OS X
>tools/import-data.sh -h <ip> -p <port> -u <username> -pw <password> -s <xxx.csv/sql> [-fd <./failedDirectory> -aligned <true/false> -batch <int> -tp <ms/ns/us> -typeInfer <boolean=text,float=double...> -lpf <int>]

# Windows
>tools\import-data.bat -h <ip> -p <port> -u <username> -pw <password> -s <xxx.csv/sql> [-fd <./failedDirectory> -aligned <true/false> -batch <int> -tp <ms/ns/us> -typeInfer <boolean=text,float=double...> -lpf <int>]

2.2 Parameter Introduction

ParameterDefinitionRequiredDefault
-hHostnameNo127.0.0.1
-pPortNo6667
-uUsernameNoroot
-pwPasswordNoroot
-sSpecify the data to be imported, here you can specify files or folders. If a folder is specified, all files with suffixes of csv or sql in the folder will be batch imported (In V1.3.2, the parameter is -f)Yes
-fdSpecify the directory for storing failed SQL files. If this parameter is not specified, failed files will be saved in the source data directory. Note: For unsupported SQL, illegal SQL, and failed SQL, they will be put into the failed directory under the failed file (default is the file name with .failed suffix)NoThe source filename with .failed suffix
-alignedSpecify whether to use the aligned interface, options are true or false. Note: This parameter is only effective when importing csv files.Nofalse
-batchUsed to specify the number of data points per batch (minimum value is 1, maximum value is Integer.MAX_VALUE). If the program reports the error org.apache.thrift.transport.TTransportException: Frame size larger than protect max size, you can appropriately reduce this parameter.No100000
-tpSpecify the time precision, options include ms (milliseconds), ns (nanoseconds), us (microseconds)Noms
-lpfSpecify the number of data lines written per failed file (In V1.3.2, the parameter is -linesPerFailedFile)No10000
-typeInferUsed to specify type inference rules, such as <srcTsDataType1=dstTsDataType1,srcTsDataType2=dstTsDataType2,...>. Note: Used to specify type inference rules. srcTsDataType includes boolean, int, long, float, double, NaN. dstTsDataType includes boolean, int, long, float, double, text. When srcTsDataType is boolean, dstTsDataType can only be boolean or text. When srcTsDataType is NaN, dstTsDataType can only be float, double, or text. When srcTsDataType is a numerical type, the precision of dstTsDataType needs to be higher than srcTsDataType. For example: -typeInfer boolean=text,float=doubleNo

2.3 Running Example

  • Import the dump0_0.sql data in the current data directory to the local IoTDB database.
# Unix/OS X
>tools/import-data.sh -s ./data/dump0_0.sql
# Windows
>tools/import-data.bat -s ./data/dump0_0.sql
  • Import all data in the current data directory in an aligned manner to the local IoTDB database.
# Unix/OS X
>tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true
# Windows
>tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true
  • Import the dump0_0.csv data in the current data directory to the local IoTDB database.
# Unix/OS X
>tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/
# Windows
>tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/
  • Import the dump0_0.csv data in the current data directory in an aligned manner, batch import 100000 lines to the IoTDB database on the host with IP 192.168.100.1, record failures in the current failed directory, with a maximum of 1000 lines per file.
# Unix/OS X
>tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000
# Windows
>tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000

2.4 Import Notes

  1. CSV Import Specifications
  • Special Character Escaping Rules: If a text-type field contains special characters (e.g., commas ,), they must be escaped using a backslash ().
  • Supported Time Formats: yyyy-MM-dd‘T’HH:mm:ss, yyyy-MM-dd HH:mm:ss, or yyyy-MM-dd‘T’HH:mm:ss.SSSZ.
  • Timestamp Column Requirement: The timestamp column must be the first column in the data file.
  1. CSV File Example
  • Time Alignment
-- Headers without data types
Time,root.test.t1.str,root.test.t2.str,root.test.t2.var
1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100
1970-01-01T08:00:00.002+08:00,"123",,

-- Headers with data types (Text-type data supports both quoted and unquoted formats)
Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.var(INT32)
1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100
1970-01-01T08:00:00.002+08:00,123,hello world,123
1970-01-01T08:00:00.003+08:00,"123",,
1970-01-01T08:00:00.004+08:00,123,,12
  • Device Alignment
-- Headers without data types
Time,Device,str,var
1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world",
1970-01-01T08:00:00.002+08:00,root.test.t1,"123",
1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100

-- Headers with data types (Text-type data supports both quoted and unquoted formats)
Time,Device,str(TEXT),var(INT32)
1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world",
1970-01-01T08:00:00.002+08:00,root.test.t1,"123",
1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100
1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123

3. load-tsfile Script

  • Supported formats: TsFile

3.1 Command

# Unix/OS X
>tools/load-tsfile.sh -h <ip> -p <port> -u <username> -pw <password> -s <source> -os <on_success> [-sd <success_dir>] -of <on_fail> [-fd <fail_dir>] [-tn <thread_num>] 

# Windows
>tools\load-tsfile.bat -h <ip> -p <port> -u <username> -pw <password>  -s <source> -os <on_success> [-sd <success_dir>] -of <on_fail> [-fd <fail_dir>] [-tn <thread_num>] 

3.2 Parameter Introduction

ParameterDescriptionRequiredDefault
-hHostnameNoroot
-pPortNoroot
-uUsernameNo127.0.0.1
-pwPasswordNo6667
-sThe local directory path of the script file (folder) to be loadedYes
-osnone: Do not delete
mv: Move successful files to the target folder
cp: Hard link (copy) successful files to the target folder
delete: Delete
Yes
-sdWhen --on_success is mv or cp, the target folder for mv or cp. The file name of the file becomes the folder flattened and then concatenated with the original file name.When --on_success is mv or cp, it is required to fill in Yes${EXEC_DIR}/success
-ofnone: Skip
mv: Move failed files to the target folder
cp: Hard link (copy) failed files to the target folder
delete: Delete
Yes
-fdWhen --on_fail is specified as mv or cp, the target folder for mv or cp. The file name of the file becomes the folder flattened and then concatenated with the original file name.When --on_fail is specified as mv or cp, it is required to fill in${EXEC_DIR}/fail
-tnMaximum number of parallel threadsYes8

3.3 Running Examples

# Unix/OS X
> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8
> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8 

# Windows
> tools/load_data.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8 
> tools/load_data.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8 

4. TsFile Active Listening & Loading Feature

The TsFile Active Listening & Loading Feature can actively monitor TsFile file changes in the specified target path (configured by the user) and automatically synchronize TsFile files from the target path to the specified reception path (configured by the user). Through this feature, IoTDB can automatically detect and load these files without the need for any additional manual loading operations. This automated process not only simplifies the user's operational steps but also reduces potential errors that may occur during the operation, effectively reducing the complexity for users during the usage process.

4.1 Configuration Parameters

You can enable the TsFile Active Listening & Loading Feature by finding the following parameters in the configuration file template iotdb-system.properties.template and adding them to the IoTDB configuration file iotdb-system.properties. The complete configuration is as follows:

Configuration ParameterDescriptionValue RangeRequiredDefault ValueLoading Method
load_active_listening_enableWhether to enable the DataNode's active listening and loading of tsfile functionality (default is enabled).Boolean: true,falseOptionaltrueHot Loading
load_active_listening_dirsThe directories to be listened to (automatically includes subdirectories of the directory), if there are multiple, separate with “,”. The default directory is ext/load/pending (supports hot loading).String: one or more file directoriesOptionalext/load/pendingHot Loading
load_active_listening_fail_dirThe directory to which files are transferred after the execution of loading tsfile files fails, only one directory can be configured.String: one file directoryOptionalext/load/failedHot Loading
load_active_listening_max_thread_numThe maximum number of threads to perform loading tsfile tasks simultaneously. The default value when the parameter is commented out is max(1, CPU core count / 2). When the user sets a value not in the range [1, CPU core count / 2], it will be set to the default value (1, CPU core count / 2).Long: [1, Long.MAX_VALUE]Optionalmax(1, CPU core count / 2)Effective after restart
load_active_listening_check_interval_secondsActive listening polling interval in seconds. The function of actively listening to tsfile is achieved by polling the folder. This configuration specifies the time interval between two checks of load_active_listening_dirs, and the next check will be executed after load_active_listening_check_interval_seconds seconds of each check. When the user sets the polling interval to less than 1, it will be set to the default value of 5 seconds.Long: [1, Long.MAX_VALUE]Optional5Effective after restart

4.2 Precautions

  1. If there is a mods file in the files to be loaded, the mods file should be moved to the listening directory first, and then the tsfile files should be moved, with the mods file and the corresponding tsfile file in the same directory. This prevents the loading of tsfile files without the corresponding mods files.
FUNCTION moveFilesToListeningDirectory(sourceDirectory, listeningDirectory)
    // Move mods files
    modsFiles = searchFiles(sourceDirectory, "*mods*")
    IF modsFiles IS NOT EMPTY
        FOR EACH file IN modsFiles
            MOVE(file, listeningDirectory)
        END FOR
    END IF

    // Move tsfile files
    tsfileFiles = searchFiles(sourceDirectory, "*tsfile*")
    IF tsfileFiles IS NOT EMPTY
        FOR EACH file IN tsfileFiles
            MOVE(file, listeningDirectory)
        END FOR
    END IF
END FUNCTION

FUNCTION searchFiles(directory, pattern)
    matchedFiles = []
    FOR EACH file IN directory.files
        IF file.name MATCHES pattern
            APPEND file TO matchedFiles
        END IF
    END FOR
    RETURN matchedFiles
END FUNCTION

FUNCTION MOVE(sourceFile, targetDirectory)
    // Implement the logic of moving files from sourceFile to targetDirectory
END FUNCTION
  1. Prohibit setting the receiver directory of Pipe, the data directory for storing data, etc., as the listening directory.

  2. Prohibit load_active_listening_fail_dir from having the same directory as load_active_listening_dirs, or each other's nesting.

  3. Ensure that the load_active_listening_dirs directory has sufficient permissions. After the load is successful, the files will be deleted. If there is no delete permission, it will lead to repeated loading.