MinIO is an S3-compatible object storage service. Doris provides two methods for importing files from MinIO. Choose between them based on data volume and timeliness requirements:
| Import method | Execution mode | Applicable scenario | Documentation reference |
|---|---|---|---|
| S3 Load | Asynchronous | Large-batch data import; tasks that need to run in the background | Broker Load Manual |
| TVF (Table Value Function) | Synchronous | Small-batch, ad-hoc query imports; works with INSERT INTO ... SELECT | Examples in this document |
Before importing MinIO data using either method, confirm the following conditions:
:::caution Important: MinIO Connection Configuration Notes
When using S3 Load or TVF to import MinIO data, note the following two points:

- If TLS is not enabled on MinIO, add the `http://` prefix to the endpoint, for example `"s3.endpoint" = "http://localhost:9000"`.
- Add `"use_path_style" = "true"` to force path-style access.
:::

S3 Load is suitable for importing files from MinIO into Doris as an asynchronous task. For detailed steps, refer to the Broker Load Manual.
Create a CSV file named `s3load_example.csv` with the following content and upload it to MinIO:

    1,Emily,25
    2,Benjamin,35
    3,Olivia,28
    4,Alexander,60
    5,Ava,17
    6,William,69
    7,Sophia,32
    8,James,64
    9,Emma,37
    10,Liam,64

Create the target table in Doris:

    CREATE TABLE test_s3load(
        user_id BIGINT NOT NULL COMMENT "user id",
        name VARCHAR(20) COMMENT "name",
        age INT COMMENT "age"
    )
    DUPLICATE KEY(user_id)
    DISTRIBUTED BY HASH(user_id) BUCKETS 10;
Execute the following SQL to submit an S3 Load task:

    LOAD LABEL s3_load_2022_04_01
    (
        DATA INFILE("s3://your_bucket_name/s3load_example.csv")
        INTO TABLE test_s3load
        COLUMNS TERMINATED BY ","
        FORMAT AS "CSV"
        (user_id, name, age)
    )
    WITH S3
    (
        "provider" = "S3",
        "s3.endpoint" = "https://play.min.io:9000",
        "s3.region" = "us-east-1",
        "s3.access_key" = "myminioadmin",
        "s3.secret_key" = "minio-secret-key-change-me",
        "use_path_style" = "true"
    )
    PROPERTIES
    (
        "timeout" = "3600"
    );
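Because S3 Load runs asynchronously, the statement above returns once the task is submitted, not once the data is loaded. A minimal sketch of checking on the task, assuming the label `s3_load_2022_04_01` used above and that the session is in the database where the task was submitted:

```sql
-- Look up the load task by its label; the State column reports progress.
SHOW LOAD WHERE LABEL = "s3_load_2022_04_01";
```

The task is complete when `State` is `FINISHED`. If it is `CANCELLED`, the `ErrorMsg` column usually describes the cause.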
After the load task finishes, run a query to verify that the data has been imported:

    SELECT * FROM test_s3load;

Expected output (row order may vary):

    mysql> select * from test_s3load;
    +---------+-----------+------+
    | user_id | name      | age  |
    +---------+-----------+------+
    |       5 | Ava       |   17 |
    |      10 | Liam      |   64 |
    |       7 | Sophia    |   32 |
    |       9 | Emma      |   37 |
    |       1 | Emily     |   25 |
    |       4 | Alexander |   60 |
    |       2 | Benjamin  |   35 |
    |       3 | Olivia    |   28 |
    |       6 | William   |   69 |
    |       8 | James     |   64 |
    +---------+-----------+------+
    10 rows in set (0.04 sec)
The TVF (Table Value Function) method reads MinIO files as a virtual table through the S3() function, and combined with INSERT INTO ... SELECT it completes the import synchronously. It is suitable for small-batch or ad-hoc scenarios.
Create a CSV file named `s3load_example.csv` with the following content and upload it to MinIO:

    1,Emily,25
    2,Benjamin,35
    3,Olivia,28
    4,Alexander,60
    5,Ava,17
    6,William,69
    7,Sophia,32
    8,James,64
    9,Emma,37
    10,Liam,64

Create the target table in Doris:

    CREATE TABLE test_s3load(
        user_id BIGINT NOT NULL COMMENT "user id",
        name VARCHAR(20) COMMENT "name",
        age INT COMMENT "age"
    )
    DUPLICATE KEY(user_id)
    DISTRIBUTED BY HASH(user_id) BUCKETS 10;
Execute the following SQL to import the data synchronously:

    INSERT INTO test_s3load
    SELECT * FROM S3
    (
        "uri" = "s3://your_bucket_name/s3load_example.csv",
        "format" = "csv",
        "provider" = "S3",
        "s3.endpoint" = "https://play.min.io:9000",
        "s3.region" = "us-east-1",
        "s3.access_key" = "myminioadmin",
        "s3.secret_key" = "minio-secret-key-change-me",
        "column_separator" = ",",
        "csv_schema" = "user_id:int;name:string;age:int",
        "use_path_style" = "true"
    );
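Since `S3()` exposes the file as a virtual table, the same parameters can be used to inspect the file without writing anything to Doris. The sketch below mirrors the parameters of the statement above and is meant to be run before the `INSERT` when you want to check the parsed schema and a few rows first. All connection values are placeholders for your own deployment, and the endpoint is written with an explicit `https://` prefix on the assumption that the MinIO server has TLS enabled.

```sql
-- Show the column names and types the TVF derives for the file.
DESC FUNCTION S3 (
    "uri" = "s3://your_bucket_name/s3load_example.csv",
    "format" = "csv",
    "provider" = "S3",
    "s3.endpoint" = "https://play.min.io:9000",
    "s3.region" = "us-east-1",
    "s3.access_key" = "myminioadmin",
    "s3.secret_key" = "minio-secret-key-change-me",
    "column_separator" = ",",
    "csv_schema" = "user_id:int;name:string;age:int",
    "use_path_style" = "true"
);

-- Preview a few rows; nothing is written to any Doris table.
SELECT * FROM S3 (
    "uri" = "s3://your_bucket_name/s3load_example.csv",
    "format" = "csv",
    "provider" = "S3",
    "s3.endpoint" = "https://play.min.io:9000",
    "s3.region" = "us-east-1",
    "s3.access_key" = "myminioadmin",
    "s3.secret_key" = "minio-secret-key-change-me",
    "column_separator" = ",",
    "csv_schema" = "user_id:int;name:string;age:int",
    "use_path_style" = "true"
) LIMIT 5;
```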
Run a query to verify that the data has been imported:

    SELECT * FROM test_s3load;

Expected output (row order may vary):

    mysql> select * from test_s3load;
    +---------+-----------+------+
    | user_id | name      | age  |
    +---------+-----------+------+
    |       5 | Ava       |   17 |
    |      10 | Liam      |   64 |
    |       7 | Sophia    |   32 |
    |       9 | Emma      |   37 |
    |       1 | Emily     |   25 |
    |       4 | Alexander |   60 |
    |       2 | Benjamin  |   35 |
    |       3 | Olivia    |   28 |
    |       6 | William   |   69 |
    |       8 | James     |   64 |
    +---------+-----------+------+
    10 rows in set (0.04 sec)
The following parameters must be configured correctly for both S3 Load and TVF:
| Parameter | Description | Example value |
|---|---|---|
| `provider` | Object storage provider. Set to `S3` when using MinIO. | `S3` |
| `s3.endpoint` | MinIO service address. The `http://` prefix is required when TLS is not enabled. | `http://localhost:9000` |
| `s3.region` | The region where MinIO is deployed. Can be set to any value but must remain consistent. | `us-east-1` |
| `s3.access_key` | MinIO access key ID. | `myminioadmin` |
| `s3.secret_key` | MinIO access key secret. | `minio-secret-key-change-me` |
| `use_path_style` | Whether to use path-style access. Must be set to `true` for MinIO. | `true` |
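Put together, the parameters above might look as follows for a local MinIO instance without TLS. This is a sketch that assumes the example values from the table and the placeholder bucket `your_bucket_name`. It only counts rows, so it can double as a quick connectivity check before a real import.

```sql
SELECT COUNT(*) FROM S3 (
    "uri" = "s3://your_bucket_name/s3load_example.csv",
    "format" = "csv",
    "provider" = "S3",
    -- Explicit http:// prefix because TLS is assumed to be disabled.
    "s3.endpoint" = "http://localhost:9000",
    "s3.region" = "us-east-1",
    "s3.access_key" = "myminioadmin",
    "s3.secret_key" = "minio-secret-key-change-me",
    -- MinIO needs path-style access.
    "use_path_style" = "true"
);
```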
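To choose between the two methods:

- S3 Load: asynchronous; suited to large-batch imports that run in the background. Check task status with `SHOW LOAD`.
- TVF: synchronous; suited to small-batch or ad-hoc imports and `INSERT INTO ... SELECT` pipelines. Returns results immediately.

If the connection to MinIO fails, confirm that the endpoint has the correct protocol prefix:

- TLS not enabled: the endpoint must start with `http://`, such as `http://localhost:9000`.
- TLS enabled: use the `https://` prefix.

MinIO does not support virtual-hosted style access by default. You need to explicitly add the following to the import parameters:

    "use_path_style" = "true"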
File formats other than CSV are also supported. Replace `FORMAT AS "CSV"` (or `"format" = "csv"` in TVF) with `parquet`, `orc`, or another supported format. For details, see the Broker Load Manual.
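As a sketch of the TVF case, assume a hypothetical Parquet file `s3load_example.parquet` in the same bucket whose columns match `test_s3load`. Only `format` changes, and the CSV-specific `column_separator` and `csv_schema` are dropped because a Parquet file carries its own schema. The endpoint and credentials are the placeholder values from the parameter table.

```sql
INSERT INTO test_s3load
SELECT * FROM S3 (
    -- Hypothetical file name; replace with your own Parquet object.
    "uri" = "s3://your_bucket_name/s3load_example.parquet",
    "format" = "parquet",
    "provider" = "S3",
    "s3.endpoint" = "http://localhost:9000",
    "s3.region" = "us-east-1",
    "s3.access_key" = "myminioadmin",
    "s3.secret_key" = "minio-secret-key-change-me",
    "use_path_style" = "true"
);
```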