This document explains how to import data from local files into Apache Doris. Doris provides three local file import methods, and you can choose the appropriate one based on data size, number of files, and client environment.
The following table compares the characteristics and applicable scenarios of the three local import methods to help you make a quick choice:
| Import Method | Protocol/Underlying | Sync/Async | Supported Formats | Typical Scenarios | Reference |
|---|---|---|---|---|---|
| Stream Load | HTTP | Synchronous | CSV, JSON, Parquet, ORC | Single file, scripted import | Stream Load |
| Streamloader | Built on Stream Load | Synchronous | CSV, JSON, Parquet, ORC | Multiple files, concurrent import of large data volumes | Streamloader |
| MySQL Load | MySQL protocol | Synchronous | CSV | Local CSV files imported through a MySQL client | MySQL Load |
Brief description of each method:
Create a CSV file streamload_example.csv with the following content:
1,Emily,25 2,Benjamin,35 3,Olivia,28 4,Alexander,60 5,Ava,17 6,William,69 7,Sophia,32 8,James,64 9,Emma,37 10,Liam,64
Create the target table in Doris using the following syntax:
CREATE TABLE testdb.test_streamload( user_id BIGINT NOT NULL COMMENT "User ID", name VARCHAR(20) COMMENT "User name", age INT COMMENT "User age" ) DUPLICATE KEY(user_id) DISTRIBUTED BY HASH(user_id) BUCKETS 10;
Submit a Stream Load import job using curl:
curl --location-trusted -u <doris_user>:<doris_password> \ -H "column_separator:," \ -H "columns:user_id,name,age" \ -T streamload_example.csv \ -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
Stream Load is a synchronous import method, and the import result is returned directly to the user:
{ "TxnId": 3, "Label": "123", "Comment": "", "TwoPhaseCommit": "false", "Status": "Success", "Message": "OK", "NumberTotalRows": 10, "NumberLoadedRows": 10, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 118, "LoadTimeMs": 173, "BeginTxnTimeMs": 1, "StreamLoadPutTimeMs": 70, "ReadDataTimeMs": 2, "WriteDataTimeMs": 48, "CommitAndPublishTimeMs": 52 }
select count(*) from testdb.test_streamload; +----------+ | count(*) | +----------+ | 10 | +----------+
Create a CSV file streamloader_example.csv with the following content:
1,Emily,25 2,Benjamin,35 3,Olivia,28 4,Alexander,60 5,Ava,17 6,William,69 7,Sophia,32 8,James,64 9,Emma,37 10,Liam,64
Create the target table in Doris using the following syntax:
CREATE TABLE testdb.test_streamloader( user_id BIGINT NOT NULL COMMENT "User ID", name VARCHAR(20) COMMENT "User name", age INT COMMENT "User age" ) DUPLICATE KEY(user_id) DISTRIBUTED BY HASH(user_id) BUCKETS 10;
doris-streamloader --source_file="streamloader_example.csv" --url="http://localhost:8330" --header="column_separator:," --db="testdb" --table="test_streamloader"
This is a synchronous import method, and the import result is returned directly to the user:
Load Result: { "Status": "Success", "TotalRows": 10, "FailLoadRows": 0, "LoadedRows": 10, "FilteredRows": 0, "UnselectedRows": 0, "LoadBytes": 118, "LoadTimeMs": 623, "LoadFiles": [ "streamloader_example.csv" ] }
select count(*) from testdb.test_streamloader; +----------+ | count(*) | +----------+ | 10 | +----------+
Create a file named client_local.csv with the following sample data:
1,10 2,20 3,30 4,40 5,50 6,60
Before running the LOAD DATA command, connect to the MySQL client first:
mysql --local-infile -h <fe_ip> -P <fe_query_port> -u root -D testdb
To run MySQL Load correctly, you must use the following options when connecting:
--local-infile option, otherwise an error may occur.allowLoadLocalInfile=true in the URL.Then create the target table in Doris:
CREATE TABLE testdb.t1 ( pk INT, v1 INT SUM ) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk);
After connecting to the MySQL client, create the import job with the following command:
LOAD DATA LOCAL INFILE 'client_local.csv' INTO TABLE testdb.t1 COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n';
MySQL Load is a synchronous import method. The result is returned to the user on the command line after the import. If the import fails, the specific error message is displayed.
The following is the result of a successful import, which returns the number of imported rows:
Query OK, 6 rows affected (0.17 sec) Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
Q1: How do I choose among the three import methods?
Q2: When running MySQL Load, the error The used command is not allowed with this MySQL version is reported.
Confirm the following two points:
--local-infile option is added when connecting through the MySQL command line.allowLoadLocalInfile=true is added to the URL when connecting through JDBC.Q3: Which data formats does Stream Load support?
It supports four formats: CSV, JSON, Parquet, and ORC. You can specify the format parameter through an HTTP Header.
Q4: How do I tell whether a Stream Load import succeeded?
Check the Status field in the returned result: a value of Success indicates success. You can also check the NumberLoadedRows and NumberFilteredRows fields to confirm the number of imported and filtered rows.