This document describes how to import CSV-format data files into Apache Doris. Doris supports flexible CSV format configuration, including custom row/column delimiters, field enclosing characters, escape characters, the number of lines to skip, and compression formats. It also provides multiple import methods to meet data import requirements for different scenarios such as batch loading, real-time streaming ingestion, and federated queries.
Before you start, choose an appropriate import method based on your data source and timeliness requirements, then refer to the parameters and examples in the corresponding section:
Doris provides the following methods for importing CSV-format data:
| Import method | Applicable scenario | Entry point |
|---|---|---|
| Stream Load | Push local files or program data via HTTP | Stream Load |
| Broker Load | Batch import from remote storage such as S3 / HDFS | Broker Load |
| Routine Load | Continuously subscribe to and import from Kafka | Routine Load |
| MySQL Load | Import local files via the MySQL protocol | MySQL Load |
| INSERT INTO FROM S3 TVF | Read S3 files directly and insert into a table | S3 TVF |
| INSERT INTO FROM HDFS TVF | Read HDFS files directly and insert into a table | HDFS TVF |
The following table summarizes the support and corresponding syntax of CSV format parameters across different import methods:
| Parameter | Default | Stream Load | Broker Load | Routine Load | MySQL Load | TVF |
|---|---|---|---|---|---|---|
| Line delimiter | \n | line_delimiter | LINES TERMINATED BY | Not supported | LINES TERMINATED BY | line_delimiter |
| Column separator | \t | column_separator | COLUMNS TERMINATED BY | COLUMNS TERMINATED BY | COLUMNS TERMINATED BY | column_separator |
| Enclose | None | enclose | PROPERTIES.enclose | PROPERTIES.enclose | PROPERTIES.enclose | enclose |
| Escape | \ | escape | PROPERTIES.escape | PROPERTIES.escape | PROPERTIES.escape | escape |
| Skip lines | 0 | skip_lines | PROPERTIES.skip_lines | Not supported | IGNORE LINES | skip_lines |
| Trim double quotes | false | trim_double_quotes | Not supported | PROPERTIES.trim_double_quotes | Not supported | trim_double_quotes |
| Compression format | plain | compress_type | PROPERTIES.compress_type | Not supported | Not supported | compress_type |
:::tip Parameter syntax for each import method
-H "line_delimiter:\n"COLUMNS TERMINATED BY and LINES TERMINATED BYPROPERTIES, for example: PROPERTIES("compress_type"="gz")COLUMNS TERMINATED BYPROPERTIES, for example: PROPERTIES("enclose"="\"")LINES TERMINATED BY and COLUMNS TERMINATED BYPROPERTIES, for example: PROPERTIES("escape"="\\")S3("line_delimiter"="\n") :::\nTypical scenarios:
Linux/Unix files:
Data file: Zhang San, 25\n Li Si, 30\n Parameter setting: line_delimiter: \n (default value, can be omitted)
Windows files:
Data file: Zhang San, 25\r\n Li Si, 30\r\n Parameter setting: line_delimiter: \r\n
Files generated by special programs:
Data file: Zhang San, 25\r Li Si, 30\r Parameter setting: line_delimiter: \r
Custom multi-character delimiter:
Data file: Zhang San, 25|| Li Si, 30|| Parameter setting: line_delimiter: ||
\t (tab)\x prefix\x01 must be written as \\x01 in Broker LoadTypical scenarios:
Common visible characters:
Data file: Zhang San, 25, Beijing Li Si, 30, Shanghai Parameter setting: column_separator: ,
Tab (default):
Data file: Zhang San 25 Beijing Li Si 30 Shanghai Parameter setting: column_separator: \t (default value, can be omitted)
Hive files (Stream Load):
Data file: Zhang San\x0125\x01 Beijing Li Si\x0130\x01 Shanghai Parameter setting: column_separator: \x01
Hive files (Broker Load):
Data file: Zhang San\x0125\x01 Beijing Li Si\x0130\x01 Shanghai Parameter setting: PROPERTIES("column_separator"="\\x01")
Multi-character delimiter:
Data file: Zhang San||25||Beijing Li Si||30||Shanghai Parameter setting: column_separator: ||
'"Typical scenarios:
Field contains the column separator:
Data: a,'b,c',d Column separator: , Enclose: ' Parsing result: 3 fields [a] [b,c] [d]
Field contains the line delimiter:
Data: a,'b\nc',d Column separator: , Enclose: ' Parsing result: 3 fields [a] [b\nc] [d]
Field contains both the column separator and the line delimiter:
Data: a,'b,c\nd,e',f Column separator: , Enclose: ' Parsing result: 3 fields [a] [b,c\nd,e] [f]
\Typical scenarios:
Field contains the enclosing character:
Data: a,'b,\'c',d Column separator: , Enclose: ' Escape: \ Parsing result: 3 fields [a] [b,'c] [d]
Field contains multiple enclosing characters:
Data: a,"b,\"c\"d",e Column separator: , Enclose: " Escape: \ Parsing result: 3 fields [a] [b,"c"d] [e]
Field contains the escape character itself:
Data: a,'b\\c',d Column separator: , Enclose: ' Escape: \ Parsing result: 3 fields [a] [b\c] [d]
format is csv_with_names, the system automatically skips the first line (column names) and ignores the skip_lines parameterformat is csv_with_names_and_types, the system automatically skips the first two lines (column names and types) and ignores the skip_lines parameterTypical scenarios:
Skip the header line:
Data file: Name, Age, City Zhang San, 25, Beijing Li Si, 30, Shanghai Parameter setting: skip_lines: 1 Result: skip the header line and import the subsequent data
Skip comment lines:
Data file: # User information table # Created at: 2024-01-01 Zhang San, 25, Beijing Li Si, 30, Shanghai Parameter setting: skip_lines: 2 Result: skip the comment lines and import the subsequent data
Use the csv_with_names format:
Data file: name,age,city Zhang San, 25, Beijing Li Si, 30, Shanghai Parameter setting: format: csv_with_names Result: the system automatically skips the first line of column names
Use the csv_with_names_and_types format:
Data file: name,age,city string,int,string Zhang San, 25, Beijing Li Si, 30, Shanghai Parameter setting: format: csv_with_names_and_types Result: the system automatically skips the first two lines of column names and types
Typical scenarios:
Trim double quotes:
Data file: "Zhang San","25","Beijing" "Li Si","30","Shanghai" Parameter setting: trim_double_quotes: true Result: Zhang San, 25, Beijing Li Si, 30, Shanghai
Purpose: specifies the compression format of the file to import
Type: string, case-insensitive
Default: plain
Supported compression formats:
| Value | Description |
|---|---|
| plain | No compression (default) |
| bz2 | BZIP2 compression |
| deflate | DEFLATE compression |
| gz | GZIP compression |
| lz4 | LZ4 Frame format compression |
| lz4_block | LZ4 Block format compression |
| lzo | LZO compression |
| lzop | LZOP compression |
| snappy_block | SNAPPY Block format compression |
Notes:
.tar files are not supportedThis section shows the three most common operations for each import method: specifying delimiters, handling quoted data, and importing compressed files.
# Specify delimiters curl --location-trusted -u root: \ -H "column_separator:," \ -H "line_delimiter:\n" \ -T example.csv \ http://<fe_host>:<fe_http_port>/api/test_db/test_table/_stream_load # Handle quoted data curl --location-trusted -u root: \ -H "column_separator:," \ -H "enclose:\"" \ -H "escape:\\" \ -T example.csv \ http://<fe_host>:<fe_http_port>/api/test_db/test_table/_stream_load # Import a compressed file curl --location-trusted -u root: \ -H "compress_type:gz" \ -T example.csv.gz \ http://<fe_host>:<fe_http_port>/api/test_db/test_table/_stream_load
-- Specify delimiters LOAD LABEL test_db.test_label ( DATA INFILE("s3://bucket/example.csv") INTO TABLE test_table COLUMNS TERMINATED BY "," LINES TERMINATED BY "\n" ) WITH S3 ( ... ); -- Handle quoted data LOAD LABEL test_db.test_label ( DATA INFILE("s3://bucket/example.csv") INTO TABLE test_table PROPERTIES ( "enclose" = "\"", "escape" = "\\" ) ) WITH S3 ( ... ); -- Import a compressed file LOAD LABEL test_db.test_label ( DATA INFILE("s3://bucket/example.csv.gz") INTO TABLE test_table PROPERTIES ( "compress_type" = "gz" ) ) WITH S3 ( ... );
-- Specify delimiters CREATE ROUTINE LOAD test_db.test_job ON test_table COLUMNS TERMINATED BY "," FROM KAFKA ( ... ); -- Handle quoted data CREATE ROUTINE LOAD test_db.test_job ON test_table COLUMNS TERMINATED BY "," PROPERTIES ( "enclose" = "\"", "escape" = "\\" ) FROM KAFKA ( ... );
-- Specify delimiters LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n'; -- Handle quoted data LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n' PROPERTIES ( "enclose" = "\"", "escape" = "\\" ); -- Skip the header LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
-- Specify delimiters INSERT INTO test_table SELECT * FROM S3 ( "uri" = "s3://bucket/example.csv", "format" = "csv", "column_separator" = ",", "line_delimiter" = "\n" ... ); -- Handle quoted data INSERT INTO test_table SELECT * FROM S3 ( "uri" = "s3://bucket/example.csv", "format" = "csv", "column_separator" = ",", "enclose" = "\"", "escape" = "\\" ... ); -- Import a compressed file INSERT INTO test_table SELECT * FROM S3 ( "uri" = "s3://bucket/example.csv.gz", "format" = "csv", "compress_type" = "gz" ... );
Q1: Hive's default \x01 column separator does not work in Broker Load. Why?
Under the MySQL protocol, invisible characters require an additional backslash and must be written as \\x01, for example PROPERTIES("column_separator"="\\x01"). When specified via the HTTP header in Stream Load, \x01 is sufficient.
Q2: Does Routine Load support a custom line delimiter?
No. Each message consumed by Routine Load from Kafka naturally corresponds to a single line, so the line_delimiter parameter is neither needed nor supported.
Q3: The first line of the CSV file is the header. How do I skip it during import?
skip_lines=1IGNORE 1 LINESname1,name2,... header convention, set format to csv_with_names and the system automatically skips the first line. Set it to csv_with_names_and_types to automatically skip the first two lines (column names + types)Q4: Can I import a .tar or .tar.gz file directly?
No. tar is a packaging format, not a compression format. Doris only supports the compression formats listed under compress_type. Unpack .tar or .tar.gz files first and then import them.
Q5: A field contains both the column separator and a line break. How should I handle it?
Set enclose for that field. For example, with single quotes as the enclosing character, a,'b,c\nd,e',f is correctly parsed into 3 fields [a], [b,c\nd,e], [f] when the column separator is , and the enclose is '. If the field also contains the enclosing character itself, combine it with escape.
Q6: How do I remove the outermost double quotes when importing fields wrapped in double quotes?
Set trim_double_quotes to true, and Doris automatically trims the outermost double quotes from each field after parsing. This parameter is supported in Stream Load, Routine Load, and TVF, but not in Broker Load or MySQL Load.