To build the Java version of TsFile Tools, you must have the following dependencies installed:

    mvn clean package -P with-java -DskipTests
    mvn install -P with-java -DskipTests

| Parameter | Description | Required | Default |
|---|---|---|---|
| table_name | Table name | Yes | |
| time_precision | Time precision (ms / us / ns / s) | No | ms |
| has_header | Whether CSV contains a header (true / false). Ignored for Parquet / Arrow. | No | true |
| separator | CSV delimiter (, / tab / ;). Ignored for Parquet / Arrow. | No | , |
| null_format | String value treated as null in CSV. Ignored for Parquet / Arrow (native null). | No | |
| tag_columns | Tag columns (device identifiers / primary key). Supports virtual columns with DEFAULT value. | No | |
| time_column | Time column name | Yes | |
| source_columns | Column definitions mapping to source file columns | Yes | |

Backward compatibility: `id_columns` and `csv_columns` are still accepted as aliases for `tag_columns` and `source_columns`.

- The column named by `time_column` is stored as the time column with type `TIMESTAMP` in TsFile.
- `tag_columns` supports virtual columns (columns not present in the source file) through the `DEFAULT` keyword.
- Tag column types are fixed as `STRING` and cannot be changed. Any type declared for a tag column in `source_columns` is ignored. We recommend writing tag columns in `source_columns` with the column name only (no type).
- Use `SKIP` to ignore a column.
- FIELD columns are derived automatically: they are all columns in `source_columns` that are not `time_column`, not in `tag_columns`, and not `SKIP`. These are the measurement columns whose values change over time.

Column name case: TsFile table-model column and table names are case-insensitive and stored as lowercase. Regardless of whether you write `Time`/`TIME`/`time` in `import.schema`, the on-disk and read-back name is `time`.

Duplicate timestamps within the same device are not supported — rows sharing identical tag column values and the same timestamp will fail to write.
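
The column-role rules and the duplicate-timestamp restriction can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the tool's implementation: `classify_columns` and `find_duplicates` are hypothetical helper names, and the schema entries are assumed to be passed as already-parsed `(name, type)` pairs rather than read from an `import.schema` file. Virtual `DEFAULT` tag columns are not modeled.

```python
from collections import Counter


def classify_columns(source_columns, tag_columns, time_column):
    """Split source_columns entries into time / tag / field roles.

    source_columns: list of (name, type_or_None) pairs. A name of "SKIP"
    (positional) or a type of "SKIP" (named) marks an ignored column.
    Names are lowercased, mirroring how TsFile stores them.
    """
    tags = {t.lower() for t in tag_columns}
    time_name = time_column.lower()
    roles = {"time": None, "tags": [], "fields": []}
    for name, col_type in source_columns:
        if name == "SKIP" or col_type == "SKIP":
            continue  # ignored column
        lname = name.lower()
        if lname == time_name:
            roles["time"] = (lname, "TIMESTAMP")
        elif lname in tags:
            roles["tags"].append((lname, "STRING"))  # any declared type is ignored
        else:
            roles["fields"].append((lname, col_type))  # everything else is a FIELD
    return roles


def find_duplicates(rows, tag_names, time_name):
    """Return the (tag values, timestamp) keys that occur more than once."""
    keys = Counter(
        (tuple(row[t] for t in tag_names), row[time_name]) for row in rows
    )
    return [key for key, count in keys.items() if count > 1]


# Example with the column names used in this document's CSV sample.
roles = classify_columns(
    [("Region", None), ("FactoryNumber", None), ("DeviceNumber", None),
     ("SKIP", None), ("SKIP", None),
     ("Time", "INT64"), ("Temperature", "FLOAT"), ("Emission", "DOUBLE")],
    tag_columns=["Group", "Region", "FactoryNumber", "DeviceNumber"],
    time_column="Time",
)
print(roles["fields"])
```

Running `find_duplicates` over the parsed rows before conversion gives an early warning about rows that would otherwise fail to write.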
CSV file content:

    Region,FactoryNumber,DeviceNumber,Model,MaintenanceCycle,Time,Temperature,Emission
    hebei,1001,1,10,1,1,80.0,1000.0
    hebei,1001,1,10,1,4,80.0,1000.0
    hebei,1002,7,5,2,1,90.0,1200.0

Schema file (import.schema):

    table_name=root.db1
    time_precision=ms
    has_header=true
    separator=,
    null_format=\N

    tag_columns
    Group DEFAULT Datang
    Region
    FactoryNumber
    DeviceNumber

    time_column=Time

    source_columns
    Region,
    FactoryNumber,
    DeviceNumber,
    SKIP,
    SKIP,
    Time INT64,
    Temperature FLOAT,
    Emission DOUBLE,

In this example:

- `Group` is a virtual tag column (not in CSV) with default value `Datang`
- `Region`, `FactoryNumber`, `DeviceNumber` are tag columns read from CSV; their type is fixed as `STRING` and need not be declared
- `Model` and `MaintenanceCycle` are skipped via `SKIP`
- `Temperature` and `Emission` are automatically derived as FIELD columns

For Parquet / Arrow in schema mode, `source_columns` matches by column name instead of position. Named `SKIP` is also supported:

    source_columns
    Time INT64,
    unused_col SKIP,
    Temperature FLOAT,
    Emission DOUBLE,

Validation rules for Parquet / Arrow schema mode (enforced: mismatches raise an error and the source file is moved to `--fail_dir`):

- The number of entries in `source_columns` must equal the number of columns in the Parquet / Arrow file. Use `SKIP` for any file column you don't want to import.
- A bare `SKIP` is not allowed. Because matching is by name, an unqualified `SKIP` cannot identify a column. Always use `columnName SKIP`.

| Parameter | Description | Required | Default |
|---|---|---|---|
| -s, --source | Input file or directory | Yes | |
| -t, --target | Output directory | Yes | |
| --schema | Schema file path. Omit for auto mode. | No | |
| --fail_dir | Directory for failed source files | No | failed |
| --format | Source format: csv / parquet / arrow. Auto-detected by file extension if omitted. | No | auto-detect |
| --table_name | Table name override (auto mode) | No | derived from filename |
| --time_precision | Time precision override (auto mode): ms / us / ns / s | No | ms |
| --separator | CSV delimiter (auto mode): , / tab / ; | No | , |
| -b, --block_size | CSV chunk size (e.g. 256M, 1G) | No | 256M |
| -tn, --thread_num | Thread count for parallel processing | No | 8 |

Provide a `--schema` file to explicitly define column mapping, types, tags, and the time column.

    # CSV
    csv2tsfile.sh --source ./data/csv --target ./output --fail_dir ./failed --schema ./schema/import.schema
    csv2tsfile.bat --source .\data\csv --target .\output --fail_dir .\failed --schema .\schema\import.schema

    # Parquet
    parquet2tsfile.sh --source ./data/parquet --target ./output --fail_dir ./failed --schema ./schema/import.schema
    parquet2tsfile.bat --source .\data\parquet --target .\output --fail_dir .\failed --schema .\schema\import.schema

    # Arrow
    arrow2tsfile.sh --source ./data/arrow --target ./output --fail_dir ./failed --schema ./schema/import.schema
    arrow2tsfile.bat --source .\data\arrow --target .\output --fail_dir .\failed --schema .\schema\import.schema

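
For Parquet / Arrow input, the two schema-mode validation rules listed earlier can be pre-checked before running the converter, so that files do not end up in `--fail_dir` unnecessarily. The following is a minimal sketch, not the tool's own validation code: `validate_named_source_columns` is a hypothetical helper, the schema entries are assumed to be pre-parsed into `(name, type)` pairs, and the file's column names are assumed to have been read by whatever Parquet / Arrow reader you already use.

```python
def validate_named_source_columns(source_columns, file_columns):
    """Check the documented Parquet / Arrow schema-mode rules.

    source_columns: list of (name, type) pairs; a bare SKIP is represented
    as ("SKIP", None) in this hypothetical pre-parsed form.
    file_columns: column names read from the Parquet / Arrow file.
    Returns a list of human-readable error strings (empty if valid).
    """
    errors = []
    # Rule 1: entry count must equal the file's column count.
    if len(source_columns) != len(file_columns):
        errors.append(
            f"source_columns has {len(source_columns)} entries, "
            f"file has {len(file_columns)} columns"
        )
    # Rule 2: SKIP must always be qualified with a column name.
    for name, col_type in source_columns:
        if name == "SKIP" and col_type is None:
            errors.append("bare SKIP is not allowed; use 'columnName SKIP'")
    return errors


schema = [("Time", "INT64"), ("unused_col", "SKIP"),
          ("Temperature", "FLOAT"), ("Emission", "DOUBLE")]
print(validate_named_source_columns(
    schema, ["Time", "unused_col", "Temperature", "Emission"]))
```

The sketch covers only the two rules stated in this document; any further checks the tool performs are not modeled.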
Omit --schema to automatically infer column types and detect the time column.
Auto mode rules:
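
- The time column must be named `time` or `TIME` (case-sensitive, strict match).
- If the source contains several Timestamp columns, only the one named `time` / `TIME` is selected as the time axis. The remaining Timestamp columns become FIELD columns and are stored as `INT64` (raw value preserved, `TIMESTAMP` semantic dropped). To keep them as `TIMESTAMP`, switch to schema mode and declare them explicitly.
- The table name is derived from the source filename (e.g. `sensor.csv` → table `sensor`). Sanitization rules, applied in order:
  1. Strip the file extension (`.csv` / `.parquet` / `.arrow` / `.ipc` / `.feather`, or the last `.`-suffix as a fallback).
  2. Keep only ASCII letters (`a–z`, `A–Z`), digits (`0–9`), underscore (`_`), and dot (`.`). Every other character is replaced with `_`.
  3. Collapse consecutive `_` into a single `_`; strip leading and trailing `_`.
  4. If the result is empty, fall back to `csv_data` / `parquet_data` / `arrow_data`.
  5. If the result starts with a digit, prefix it with `t_` (TsFile table names cannot start with a digit).
- The CSV null marker is `\N`.

Auto mode example: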
CSV file (sensor.csv):

    time,temperature,humidity,status
    1000,25.5,60.0,true
    2000,26.1,55.3,false
    3000,27.0,58.1,true

Auto mode infers:

    table name:  sensor (from filename)
    time column: time
    fields:      temperature DOUBLE, humidity DOUBLE, status BOOLEAN
    tags:        (none)

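
The filename-to-table-name sanitization described in the auto mode rules can be approximated as follows. This is a sketch under stated assumptions, not the tool's code: `derive_table_name` is a hypothetical name, extension matching is assumed to be case-insensitive, and the conditions for the fallback name and the `t_` prefix (empty result, leading digit) are read from the rule descriptions.

```python
import re

KNOWN_EXTENSIONS = (".csv", ".parquet", ".arrow", ".ipc", ".feather")
FALLBACK_NAMES = {"csv": "csv_data", "parquet": "parquet_data", "arrow": "arrow_data"}


def derive_table_name(filename, source_format="csv"):
    """Derive a table name from a source filename (illustrative only)."""
    name = filename
    # 1. Strip a known extension, or the last ".suffix" as a fallback.
    for ext in KNOWN_EXTENSIONS:
        if name.lower().endswith(ext):
            name = name[: -len(ext)]
            break
    else:
        if "." in name:
            name = name.rsplit(".", 1)[0]
    # 2. Keep ASCII letters, digits, "_" and "."; replace everything else.
    name = re.sub(r"[^A-Za-z0-9_.]", "_", name)
    # 3. Collapse runs of "_" and strip leading / trailing "_".
    name = re.sub(r"_+", "_", name).strip("_")
    # 4. Fall back to a per-format default when nothing is left.
    if not name:
        name = FALLBACK_NAMES[source_format]
    # 5. Table names cannot start with a digit, so add a prefix.
    if name[0].isdigit():
        name = "t_" + name
    return name


for sample in ("sensor.csv", "my data (v2).csv", "2024-report.csv"):
    print(sample, "->", derive_table_name(sample))
```

Passing `--table_name` on the command line bypasses this derivation entirely, which is the simpler option when source filenames are irregular.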
Commands:

    # CSV
    csv2tsfile.sh --source ./data/csv --target ./output --fail_dir ./failed
    csv2tsfile.bat --source .\data\csv --target .\output --fail_dir .\failed

    # CSV with options
    csv2tsfile.sh --source ./data/csv --target ./output --table_name my_table --separator tab --time_precision us

    # Parquet
    parquet2tsfile.sh --source ./data/parquet --target ./output --fail_dir ./failed
    parquet2tsfile.bat --source .\data\parquet --target .\output --fail_dir .\failed

    # Arrow (.arrow / .ipc / .feather)
    arrow2tsfile.sh --source ./data/arrow --target ./output --fail_dir ./failed
    arrow2tsfile.bat --source .\data\arrow --target .\output --fail_dir .\failed

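
When the converters are driven from a script rather than typed by hand, the argument list can be assembled programmatically. The sketch below only builds the argument list and uses nothing beyond the script names and flags shown in this document; `build_command` is a hypothetical helper, and it assumes the `.sh` scripts are on the `PATH`.

```python
SCRIPTS = {
    "csv": "csv2tsfile.sh",
    "parquet": "parquet2tsfile.sh",
    "arrow": "arrow2tsfile.sh",
}
AUTO_MODE_OPTIONS = ("table_name", "time_precision", "separator")


def build_command(source_format, source, target, fail_dir=None, schema=None,
                  **auto_options):
    """Assemble an argv list for one of the converter scripts.

    auto_options may carry table_name, time_precision or separator, which the
    parameter table describes as auto-mode overrides.
    """
    unknown = set(auto_options) - set(AUTO_MODE_OPTIONS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    argv = [SCRIPTS[source_format], "--source", source, "--target", target]
    if fail_dir is not None:
        argv += ["--fail_dir", fail_dir]
    if schema is not None:
        argv += ["--schema", schema]  # schema mode
    for key in AUTO_MODE_OPTIONS:
        if key in auto_options:
            argv += [f"--{key}", auto_options[key]]
    return argv


print(build_command("csv", "./data/csv", "./output",
                    table_name="my_table", separator="tab", time_precision="us"))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` to run the conversion.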
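Output files are named after the source file:

- Single output file: `{source_basename}.tsfile`
- When one source file produces several output files: `{source_basename}_1.tsfile`, `{source_basename}_2.tsfile`, ...
- Even when the table name is overridden with `--table_name`, the output filename still comes from the source file.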