For different scenarios, IoTDB provides users with a variety of methods for importing data in batches. This chapter introduces the two most commonly used ones: importing data as CSV text and importing data as TsFile files.
The load external tsfile tool allows users to load tsfiles into a running Apache IoTDB instance, delete a tsfile, or move a tsfile to a target directory. Alternatively, you can use a script to load tsfiles into IoTDB; the script usage is described later in this chapter.
To use the tool, the user sends the specified commands to the Apache IoTDB system through the Cli tool or JDBC.
The command to load tsfiles is load <path/dir> [sglevel=int][onSuccess=delete/none].
This command has two usages:

1. Load a single tsfile by specifying a file path (absolute path).

The first parameter indicates the path of the tsfile to be loaded. This command has two options: sglevel and onSuccess.
SGLEVEL option. If the database corresponding to the tsfile does not exist, the user can set the level of the database through this option. By default, it uses the database level set in iotdb-datanode.properties.
ONSUCCESS option. This option indicates how a tsfile is handled after it is successfully loaded. The default value is DELETE, which means the tsfile is deleted after being successfully loaded; NONE means the tsfile remains in its original directory.
If the .resource file corresponding to the tsfile exists, it will be loaded into the data directory and engine of Apache IoTDB together with the tsfile. Otherwise, the corresponding .resource file will be regenerated from the tsfile.
Examples:
load '/Users/Desktop/data/1575028885956-101-0.tsfile'
load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1
load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete
load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1 onSuccess=delete
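Per the command grammar above, the source file can also be kept explicitly after a successful load by using the documented non-default value onSuccess=none, for instance:

load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1 onSuccess=none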
2. Load a batch of tsfiles by specifying a folder path (absolute path).

The first parameter indicates the path of the folder containing the tsfiles to be loaded. The options above also work for this command.
Examples:
load '/Users/Desktop/data'
load '/Users/Desktop/data' sglevel=1
load '/Users/Desktop/data' sglevel=1 onSuccess=delete
NOTICE: When $IOTDB_HOME$/conf/iotdb-datanode.properties has enable_auto_create_schema=true, the metadata in the tsfile will be created automatically during loading; otherwise it will not be created automatically.
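For reference, this behavior is controlled by a single property line in the configuration file; a minimal sketch (the value true here is shown only for illustration) is:

# in $IOTDB_HOME$/conf/iotdb-datanode.properties
enable_auto_create_schema=true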
Run load-tsfile.bat if you are in a Windows environment, or load-tsfile.sh if you are on Linux or Unix.
./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--onSuccess none/delete]

-f           File/Directory to be loaded, required
-h           IoTDB host address, optional field, 127.0.0.1 by default
-p           IoTDB port, optional field, 6667 by default
-u           IoTDB username, optional field, root by default
-pw          IoTDB password, optional field, root by default
--sgLevel    Sg level of the loaded tsfile, optional field, default_storage_group_level in iotdb-common.properties by default
--onSuccess  Delete or remain the origin tsfile after loading, optional field, none by default
Assuming that an IoTDB instance is running on server 192.168.0.101:6667, and you want to load all TsFile files into this IoTDB instance from a path on the server where the IoTDB process is located.
First move to the folder $IOTDB_HOME/tools/, open the command line, and execute:
./load-tsfile.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root
After waiting for the script execution to complete, you can check that the data in the IoTDB instance has been loaded correctly.
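A quick way to verify, for example, is to count the loaded points from the IoTDB CLI; the path pattern below matches all series and is only a placeholder, so adjust it to your own data:

IoTDB> select count(*) from root.**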
The TsFile export tool can help you export a result set in TsFile format to a specified path by executing SQL, command-line SQL, or a SQL file.
# Unix/OS X
> tools/export-tsfile.sh -h <ip> -p <port> -u <username> -pw <password> -td <directory> [-f <export filename> -q <query command> -s <sql file>]

# Windows
> tools\export-tsfile.bat -h <ip> -p <port> -u <username> -pw <password> -td <directory> [-f <export filename> -q <query command> -s <sql file>]
-h <host>: the host address of the IoTDB server.
-p <port>: the port of the IoTDB server.
-u <username>: the username.
-pw <password>: the password.
-td <directory>: the target directory of the exported TsFile files.
-f <tsfile name>: the name of the exported file.
-q <query command>: the query command to execute directly, e.g. select * from root.** limit 100.
-s <sql file>: the path of a SQL file, with one SQL statement per line.
-t <timeout>: the timeout of the session query, in milliseconds.

In addition, if you do not use the -s and -q parameters, after the export script is started you need to enter the query statements as prompted by the program, and different query results will be saved to different TsFile files.
# Unix/OS X
> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./
# or
> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**"
# or
> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt
# or
> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile
# or
> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000

# Windows
> tools\export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./
# or
> tools\export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**"
# or
> tools\export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt
# or
> tools\export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile
# or
> tools\export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000
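For the -s option, the SQL file is a plain-text file with one statement per line. A minimal sql.txt for the examples above might look like the sketch below; both queries are placeholders (root.sg1.d1 is a hypothetical device path):

select * from root.** limit 100;
select * from root.sg1.d1;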
The CSV tool can help you import data in CSV format to IoTDB or export data from IoTDB to a CSV file.
# Unix/OS X
> tools/export-csv.sh -h <ip> -p <port> -u <username> -pw <password> -td <directory> [-tf <time-format> -datatype <true/false> -q <query command> -s <sql file> -linesPerFile <int>]

# Windows
> tools\export-csv.bat -h <ip> -p <port> -u <username> -pw <password> -td <directory> [-tf <time-format> -datatype <true/false> -q <query command> -s <sql file> -linesPerFile <int>]
Description:
-datatype: true (by default) or false, whether to print the data type of the time series in the CSV header. If true, the header looks like Time, root.sg1.d1.s1(INT32), root.sg1.d1.s2(INT64); if false, it looks like Time, root.sg1.d1.s1, root.sg1.d1.s2.
-q <query command>: the query command to execute directly, e.g. select * from root.** limit 100, or select * from root.** limit 100 align by device.
-s <sql file>: the path of a SQL file, with one SQL statement per line.
-td <directory>: the target directory of the exported CSV files.
-tf <time-format>: the time format of the Time column in the exported CSV file; you can use a datetime pattern such as -tf yyyy-MM-dd\ HH:mm:ss or a plain numeric timestamp with -tf timestamp.
-linesPerFile <int>: the maximum number of data lines in each exported CSV file, 10000 is default, e.g. -linesPerFile 1.
-t <timeout>: the timeout of the session query, in milliseconds.

In addition, if you don't use one of -s and -q, you need to enter some queries after running the export script. The results of the different queries will be saved to different CSV files.
# Unix/OS X
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./
# or
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss
# or
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device"
# or
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt
# or
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt
# or
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10
# or
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000

# Windows
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./
# or
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss
# or
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device"
# or
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt
# or
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt
# or
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10
# or
> tools\export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000
An example of a SQL file used with -s:

select * from root.**;
select * from root.** align by device;
The result of select * from root.**:

Time,root.ln.wf04.wt04.status(BOOLEAN),root.ln.wf03.wt03.hardware(TEXT),root.ln.wf02.wt02.status(BOOLEAN),root.ln.wf02.wt02.hardware(TEXT),root.ln.wf01.wt01.hardware(TEXT),root.ln.wf01.wt01.status(BOOLEAN)
1970-01-01T08:00:00.001+08:00,true,"v1",true,"v1",v1,true
1970-01-01T08:00:00.002+08:00,true,"v1",,,,true
The result of select * from root.** align by device:

Time,Device,hardware(TEXT),status(BOOLEAN)
1970-01-01T08:00:00.001+08:00,root.ln.wf01.wt01,"v1",true
1970-01-01T08:00:00.002+08:00,root.ln.wf01.wt01,,true
1970-01-01T08:00:00.001+08:00,root.ln.wf02.wt02,"v1",true
1970-01-01T08:00:00.001+08:00,root.ln.wf03.wt03,"v1",
1970-01-01T08:00:00.002+08:00,root.ln.wf03.wt03,"v1",
1970-01-01T08:00:00.001+08:00,root.ln.wf04.wt04,,true
1970-01-01T08:00:00.002+08:00,root.ln.wf04.wt04,,true
Boolean data is written as true and false without double quotes, while text data is enclosed in double quotes.
Note that if a field exported by the export tool contains the following special characters:

,: the field will be escaped by \.
For example, you can create the metadata with the following statements before importing:

CREATE DATABASE root.fit.d1;
CREATE DATABASE root.fit.d2;
CREATE DATABASE root.fit.p;
CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE;
CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN;
CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE;
CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE;
CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE;
IoTDB has the ability of type inference, so it is not necessary to create metadata before data import. However, we still recommend creating metadata before importing data using the CSV import tool, as this can avoid unnecessary type conversion errors.
The data is aligned by time, and the headers contain no data types.

Time,root.test.t1.str,root.test.t2.str,root.test.t2.int
1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100
1970-01-01T08:00:00.002+08:00,"123",,
The data is aligned by time, and the headers contain data types. (Text-type data may be written with or without double quotation marks.)

Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.int(INT32)
1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100
1970-01-01T08:00:00.002+08:00,123,hello world,123
1970-01-01T08:00:00.003+08:00,"123",,
1970-01-01T08:00:00.004+08:00,123,,12
The data is aligned by device, and the headers contain no data types.

Time,Device,str,int
1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world",
1970-01-01T08:00:00.002+08:00,root.test.t1,"123",
1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100
The data is aligned by device, and the headers contain data types. (Text-type data may be written with or without double quotation marks.)

Time,Device,str(TEXT),int(INT32)
1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world",
1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123
1970-01-01T08:00:00.003+08:00,root.test.t1,,123
# Unix/OS X
> tools/import-csv.sh -h <ip> -p <port> -u <username> -pw <password> -f <xxx.csv> [-fd <./failedDirectory>] [-aligned <true>] [-tp <ms/ns/us>] [-typeInfer <boolean=text,float=double...>]

# Windows
> tools\import-csv.bat -h <ip> -p <port> -u <username> -pw <password> -f <xxx.csv> [-fd <./failedDirectory>] [-aligned <true>] [-tp <ms/ns/us>] [-typeInfer <boolean=text,float=double...>]
Description:
-f: the path of the CSV file or folder to be imported, e.g. -f filename.csv.
-fd: the directory to save the failed lines; if this parameter is not set, the failed file will be saved in the source directory, with the suffix .failed appended to the source file name, e.g. -fd ./failed/.
-aligned: whether to import the data as aligned time series, false is default, e.g. -aligned true.
-batch: the amount of data processed in each batch; if the program reports org.apache.thrift.transport.TTransportException: Frame size larger than protect max size, you can lower this parameter as appropriate, e.g. -batch 100000, 100000 is the default value.
-tp <time-precision>: the time precision, including ms (millisecond), ns (nanosecond), and us (microsecond), ms is default.
-typeInfer <srcTsDataType1=dstTsDataType1,srcTsDataType2=dstTsDataType2,...>: used to specify type inference rules. srcTsDataType includes boolean, int, long, float, double, NaN. dstTsDataType includes boolean, int, long, float, double, text. When srcTsDataType is boolean, dstTsDataType should be between boolean and text. When srcTsDataType is NaN, dstTsDataType should be among float, double and text. When srcTsDataType is a numeric type, the precision of dstTsDataType should be greater than that of srcTsDataType. e.g. -typeInfer boolean=text,float=double.
-linesPerFailedFile <int>: the maximum number of lines written to each failed file, 10000 is default, e.g. -linesPerFailedFile 1.
# Unix/OS X
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv
# or
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed
# or
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns
# or
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double
# or
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10

# Windows
> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv
# or
> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed
# or
> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns
# or
> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double
# or
> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10
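If some lines fail to import, a typical follow-up is to fix them and re-import only the failed file. The sketch below assumes the failed lines are written under ./failed with the .failed suffix appended to the source file name, following the -fd description above; verify the exact file name in your environment:

> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed
# after correcting the failed lines, re-import the failed file (name is illustrative)
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ./failed/example-filename.csv.failed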
Note the following before importing:

,: fields containing , should be escaped by \.
Supported time formats include yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd HH:mm:ss, and yyyy-MM-dd'T'HH:mm:ss.SSSZ.
The Time column must be the first column.
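Putting the two CSV tools together, a minimal export-then-import round trip might look like the sketch below. It assumes the defaults used throughout this chapter, queries the root.fit metadata created earlier, and assumes the exported file is named dump0.csv; the actual output file name may differ by version, so check the target directory first:

# export a query result to the current directory
> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.fit.**"
# import it back, collecting any failed lines
> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ./dump0.csv -fd ./failed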