This document explains how to load CSV format data files into Doris. Doris supports flexible CSV format configuration, including custom delimiters, field enclosures, and provides various loading methods to meet data loading requirements in different scenarios.
Doris supports the following methods to load CSV format data:
The following table lists the CSV format parameters supported by various loading methods:
| Parameter | Default Value | Stream Load | Broker Load | Routine Load | MySQL Load | TVF |
|---|---|---|---|---|---|---|
| Line Delimiter | \n | line_delimiter | LINES TERMINATED BY | Not supported | LINES TERMINATED BY | line_delimiter |
| Column Delimiter | \t | column_separator | COLUMNS TERMINATED BY | COLUMNS TERMINATED BY | COLUMNS TERMINATED BY | column_separator |
| Enclosure | None | enclose | PROPERTIES.enclose | PROPERTIES.enclose | PROPERTIES.enclose | enclose |
| Escape Character | \ | escape | PROPERTIES.escape | PROPERTIES.escape | PROPERTIES.escape | escape |
| Skip Lines | 0 | skip_lines | PROPERTIES.skip_lines | Not supported | IGNORE LINES | skip_lines |
| Trim Double Quotes | false | trim_double_quotes | Not supported | PROPERTIES.trim_double_quotes | Not supported | trim_double_quotes |
| Compression Format | plain | compress_type | PROPERTIES.compress_type | Not supported | Not supported | compress_type |
:::tip Note
-H "line_delimiter:\n"COLUMNS TERMINATED BY, LINES TERMINATED BYPROPERTIES, e.g., PROPERTIES("compress_type"="gz")COLUMNS TERMINATED BYPROPERTIES, e.g., PROPERTIES("enclose"="\"")LINES TERMINATED BY, COLUMNS TERMINATED BYPROPERTIES, e.g., PROPERTIES("escape"="\\")S3("line_delimiter"="\n") :::\nData File: John,25\n Mary,30\n Parameter Setting: line_delimiter: \n (default value, can be omitted)
Data File: John,25\r\n Mary,30\r\n Parameter Setting: line_delimiter: \r\n
Data File: John,25\r Mary,30\r Parameter Setting: line_delimiter: \r
Data File: John,25|| Mary,30|| Parameter Setting: line_delimiter: ||
\t (tab)\x prefix\x01 needs to be written as \\x01 in Broker LoadData File: John,25,New York Mary,30,Los Angeles Parameter Setting: column_separator: ,
Data File: John 25 New York Mary 30 Los Angeles Parameter Setting: column_separator: \t (default value, can be omitted)
Data File: John\x0125\x01New York Mary\x0130\x01Los Angeles Parameter Setting: column_separator: \x01
Data File:
John\x0125\x01New York
Mary\x0130\x01Los Angeles
Parameter Setting:
PROPERTIES("column_separator"="\\x01")
Data File: John||25||New York Mary||30||Los Angeles Parameter Setting: column_separator: ||
'"Data: a,'b,c',d Column Delimiter: , Enclosure: ' Parsing Result: 3 fields [a] [b,c] [d]
Data: a,'b\nc',d Column Delimiter: , Enclosure: ' Parsing Result: 3 fields [a] [b\nc] [d]
Data: a,'b,c\nd,e',f Column Delimiter: , Enclosure: ' Parsing Result: 3 fields [a] [b,c\nd,e] [f]
\Data: a,'b,\'c',d Column Delimiter: , Enclosure: ' Escape Character: \ Parsing Result: 3 fields [a] [b,'c] [d]
Data: a,"b,\"c\"d",e Column Delimiter: , Enclosure: " Escape Character: \ Parsing Result: 3 fields [a] [b,"c"d] [e]
Data: a,'b\\c',d Column Delimiter: , Enclosure: ' Escape Character: \ Parsing Result: 3 fields [a] [b\c] [d]
csv_with_names, the system automatically skips the first line (column names), ignoring the skip_lines parametercsv_with_names_and_types, the system automatically skips the first two lines (column names and types), ignoring the skip_lines parameterData File: Name,Age,City John,25,New York Mary,30,Los Angeles Parameter Setting: skip_lines: 1 Result: Skip title line, load subsequent data
Data File: # User Information Table # Created Time: 2024-01-01 John,25,New York Mary,30,Los Angeles Parameter Setting: skip_lines: 2 Result: Skip comment lines, load subsequent data
Data File: name,age,city John,25,New York Mary,30,Los Angeles Parameter Setting: format: csv_with_names Result: System automatically skips the first line of column names
Data File: name,age,city string,int,string John,25,New York Mary,30,Los Angeles Parameter Setting: format: csv_with_names_and_types Result: System automatically skips the first two lines of column names and types
Data File: "John","25","New York" "Mary","30","Los Angeles" Parameter Setting: trim_double_quotes: true Result: John,25,New York Mary,30,Los Angeles
This section demonstrates the usage of CSV format in different loading methods.
# Specify delimiter curl --location-trusted -u root: \ -H "column_separator:," \ -H "line_delimiter:\n" \ -T example.csv \ http://<fe_host>:<fe_http_port>/api/test_db/test_table/_stream_load # Handle quoted data curl --location-trusted -u root: \ -H "column_separator:," \ -H "enclose:\"" \ -T example.csv \ http://<fe_host>:<fe_http_port>/api/test_db/test_table/_stream_load # Load compressed file curl --location-trusted -u root: \ -H "compress_type:gz" \ -T example.csv.gz \ http://<fe_host>:<fe_http_port>/api/test_db/test_table/_stream_load
-- Specify delimiter LOAD LABEL test_db.test_label ( DATA INFILE("s3://bucket/example.csv") INTO TABLE test_table COLUMNS TERMINATED BY "," LINES TERMINATED BY "\n" ... ); -- Handle quoted data LOAD LABEL test_db.test_label ( DATA INFILE("s3://bucket/example.csv") INTO TABLE test_table COLUMNS TERMINATED BY "," LINES TERMINATED BY "\n" PROPERTIES ( "enclose" = "\"", "escape" = "\\" ) ... ); -- Load compressed file LOAD LABEL test_db.test_label ( DATA INFILE("s3://bucket/example.csv.gz") INTO TABLE test_table COLUMNS TERMINATED BY "," LINES TERMINATED BY "\n" PROPERTIES ( "compress_type" = "gz" ) ... );
-- Specify delimiter CREATE ROUTINE LOAD test_db.test_job ON test_table COLUMNS TERMINATED BY "," FROM KAFKA ( ... ); -- Handle quoted data CREATE ROUTINE LOAD test_db.test_job ON test_table COLUMNS TERMINATED BY "," PROPERTIES ( "enclose" = "\"", "escape" = "\\" ) FROM KAFKA ( ... );
-- Specify delimiter LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n'; -- Handle quoted data LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n' PROPERTIES ( "enclose" = "\"", "escape" = "\\" ); -- Skip table header LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
-- Specify delimiter INSERT INTO test_table SELECT * FROM S3 ( "path" = "s3://bucket/example.csv", "column_separator" = ",", "line_delimiter" = "\n", ... ); -- Handle quoted data INSERT INTO test_table SELECT * FROM S3 ( "path" = "s3://bucket/example.csv", "column_separator" = ",", "enclose" = "\"", "escape" = "\\", ... ); -- Load compressed file INSERT INTO test_table SELECT * FROM S3 ( "path" = "s3://bucket/example.csv.gz", "column_separator" = ",", "line_delimiter" = "\n", "compress_type" = "gz", ... );