{ “title”: “Importing Data from MinIO”, “language”: “en”, “description”: “How to import CSV, Parquet, and other files from MinIO object storage into Doris? Both asynchronous S3 Load and synchronous TVF methods are supported.”, “keywords”: [ “MinIO import”, “Doris MinIO”, “S3 Load”, “S3 TVF”, “object storage import”, “use_path_style”, “MinIO endpoint” ], “sidebar_label”: “MinIO” }

MinIO is an S3-compatible object storage. Doris provides two methods for importing files from MinIO. Choose between them based on data volume and timeliness requirements:

Import methodExecution modeApplicable scenarioDocumentation reference
S3 LoadAsynchronousLarge-batch data import; tasks that need to run in the backgroundBroker Load Manual
TVF (Table Value Function)SynchronousSmall-batch, ad-hoc query imports; works with INSERT INTO ... SELECTExamples in this document

Prerequisites

Before importing MinIO data using either method, confirm the following conditions:

  • A Doris cluster is deployed and can access the MinIO service normally.
  • You have obtained the MinIO endpoint, region, access key, and secret key.
  • The CSV/Parquet files to be imported have been uploaded to a MinIO bucket.

:::caution Important: MinIO Connection Configuration Notes When using S3 Load or TVF to import MinIO data, note the following two points:

  • Endpoint protocol prefix: If MinIO is deployed on a local network without TLS enabled, you need to explicitly add http:// to the endpoint, for example "s3.endpoint" = "http://localhost:9000".
  • Path access style: The S3 SDK uses virtual-hosted style by default, but MinIO does not enable this access mode by default. Add "use_path_style" = "true" to force path style. :::

Method 1: Import Using S3 Load (Asynchronous)

S3 Load is suitable for importing files from MinIO into Doris as an asynchronous task. For detailed steps, refer to the Broker Load Manual.

Step 1: Prepare the Data

Create a CSV file s3load_example.csv and upload it to MinIO with the following content:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

Step 2: Create a Table in Doris

CREATE TABLE test_s3load(
    user_id            BIGINT       NOT NULL COMMENT "user id",
    name               VARCHAR(20)           COMMENT "name",
    age                INT                   COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Import Data Using S3 Load

Execute the following SQL to submit an S3 Load task:

LOAD LABEL s3_load_2022_04_01
(
    DATA INFILE("s3://your_bucket_name/s3load_example.csv")
    INTO TABLE test_s3load
    COLUMNS TERMINATED BY ","
    FORMAT AS "CSV"
    (user_id, name, age)
)
WITH S3
(
    "provider" = "S3",
    "s3.endpoint" = "play.min.io:9000",
    "s3.region" = "us-east-1",
    "s3.access_key" = "myminioadmin",
    "s3.secret_key" = "minio-secret-key-change-me",
    "use_path_style" = "true"
)
PROPERTIES
(
    "timeout" = "3600"
);

Step 4: Verify the Imported Data

Run a query to verify whether the data has been imported successfully:

SELECT * FROM test_s3load;

Expected output:

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name      | age  |
+---------+-----------+------+
|       5 | Ava       |   17 |
|      10 | Liam      |   64 |
|       7 | Sophia    |   32 |
|       9 | Emma      |   37 |
|       1 | Emily     |   25 |
|       4 | Alexander |   60 |
|       2 | Benjamin  |   35 |
|       3 | Olivia    |   28 |
|       6 | William   |   69 |
|       8 | James     |   64 |
+---------+-----------+------+
10 rows in set (0.04 sec)

Method 2: Import Using TVF (Synchronous)

The TVF (Table Value Function) method reads MinIO files as a virtual table through the S3() function, and combined with INSERT INTO ... SELECT it completes the import synchronously. It is suitable for small-batch or ad-hoc scenarios.

Step 1: Prepare the Data

Create a CSV file s3load_example.csv and upload it to MinIO with the following content:

1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64

Step 2: Create a Table in Doris

CREATE TABLE test_s3load(
    user_id            BIGINT       NOT NULL COMMENT "user id",
    name               VARCHAR(20)           COMMENT "name",
    age                INT                   COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;

Step 3: Import Data Using TVF

Execute the following SQL to import the data synchronously:

INSERT INTO test_s3load
SELECT * FROM S3
(
    "uri" = "s3://your_bucket_name/s3load_example.csv",
    "format" = "csv",
    "provider" = "S3",
    "s3.endpoint" = "play.min.io:9000",
    "s3.region" = "us-east-1",
    "s3.access_key" = "myminioadmin",
    "s3.secret_key" = "minio-secret-key-change-me",
    "column_separator" = ",",
    "csv_schema" = "user_id:int;name:string;age:int",
    "use_path_style" = "true"
);

Step 4: Verify the Imported Data

Run a query to verify whether the data has been imported successfully:

SELECT * FROM test_s3load;

Expected output:

mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name      | age  |
+---------+-----------+------+
|       5 | Ava       |   17 |
|      10 | Liam      |   64 |
|       7 | Sophia    |   32 |
|       9 | Emma      |   37 |
|       1 | Emily     |   25 |
|       4 | Alexander |   60 |
|       2 | Benjamin  |   35 |
|       3 | Olivia    |   28 |
|       6 | William   |   69 |
|       8 | James     |   64 |
+---------+-----------+------+
10 rows in set (0.04 sec)

Key Parameters

The following parameters must be configured correctly for both S3 Load and TVF:

ParameterDescriptionExample value
providerObject storage provider. Set to S3 when using MinIO.S3
s3.endpointMinIO service address. The http:// prefix is required when TLS is not enabled.http://localhost:9000
s3.regionThe region where MinIO is deployed. Can be set to any value but must remain consistent.us-east-1
s3.access_keyMinIO access key ID.myminioadmin
s3.secret_keyMinIO access key secret.minio-secret-key-change-me
use_path_styleWhether to use path-style access. Must be set to true for MinIO.true

FAQ

Q1: How do I choose between S3 Load and TVF?

  • S3 Load: Executes asynchronously. Suitable for large-batch data imports. After submission, Doris schedules and runs the task in the background, and you can query the task status with SHOW LOAD.
  • TVF: Executes synchronously. Suitable for small-batch, ad-hoc analysis, or scenarios combined with INSERT INTO ... SELECT pipelines. Returns results immediately.

Q2: What should I do if I get endpoint-related errors when connecting to MinIO?

Confirm whether the endpoint has the correct protocol prefix:

  • TLS not enabled: Must include http://, such as http://localhost:9000.
  • TLS enabled: Use the https:// prefix.

Q3: What should I do if access reports a bucket parsing error or 404?

MinIO does not support virtual-hosted style access by default. You need to explicitly add the following to the import parameters:

"use_path_style" = "true"

Q4: Are other formats such as Parquet/ORC supported?

Yes. Replace FORMAT AS "CSV" (or "format" = "csv" in TVF) with parquet, orc, or other corresponding formats. For details, see the Broker Load Manual.