blob: 6bb3a665c5cfe3755c859824cf7d7a8723ae788e [file] [view]
---
{
"title": "Importing Data from MinIO",
"language": "en",
"description": "How to import CSV, Parquet, and other files from MinIO object storage into Doris? Both asynchronous S3 Load and synchronous TVF methods are supported.",
"keywords": [
"MinIO import",
"Doris MinIO",
"S3 Load",
"S3 TVF",
"object storage import",
"use_path_style",
"MinIO endpoint"
],
"sidebar_label": "MinIO"
}
---
<!-- Knowledge type: Procedure -->
<!-- Applicable scenario: Importing files from MinIO object storage into Doris -->
[MinIO](https://min.io/) is an S3-compatible object storage. Doris provides two methods for importing files from MinIO. Choose between them based on data volume and timeliness requirements:
| Import method | Execution mode | Applicable scenario | Documentation reference |
|----------|----------|----------|----------|
| S3 Load | Asynchronous | Large-batch data import; tasks that need to run in the background | [Broker Load Manual](../import-way/broker-load-manual.md) |
| TVF (Table Value Function) | Synchronous | Small-batch, ad-hoc query imports; works with `INSERT INTO ... SELECT` | Examples in this document |
## Prerequisites
Before importing MinIO data using either method, confirm the following conditions:
- A Doris cluster is deployed and can access the MinIO service normally.
- You have obtained the MinIO endpoint, region, access key, and secret key.
- The CSV/Parquet files to be imported have been uploaded to a MinIO bucket.
:::caution Important: MinIO Connection Configuration Notes
When using S3 Load or TVF to import MinIO data, note the following two points:
- **Endpoint protocol prefix**: If MinIO is deployed on a local network without TLS enabled, you need to explicitly add `http://` to the `endpoint`, for example `"s3.endpoint" = "http://localhost:9000"`.
- **Path access style**: The S3 SDK uses virtual-hosted style by default, but MinIO does not enable this access mode by default. Add `"use_path_style" = "true"` to force path style.
:::
## Method 1: Import Using S3 Load (Asynchronous)
S3 Load is suitable for importing files from MinIO into Doris as an asynchronous task. For detailed steps, refer to the [Broker Load Manual](../import-way/broker-load-manual.md).
### Step 1: Prepare the Data
Create a CSV file `s3load_example.csv` and upload it to MinIO with the following content:
```text
1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
```
### Step 2: Create a Table in Doris
```sql
CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;
```
### Step 3: Import Data Using S3 Load
Execute the following SQL to submit an S3 Load task:
```sql
LOAD LABEL s3_load_2022_04_01
(
DATA INFILE("s3://your_bucket_name/s3load_example.csv")
INTO TABLE test_s3load
COLUMNS TERMINATED BY ","
FORMAT AS "CSV"
(user_id, name, age)
)
WITH S3
(
"provider" = "S3",
"s3.endpoint" = "play.min.io:9000",
"s3.region" = "us-east-1",
"s3.access_key" = "myminioadmin",
"s3.secret_key" = "minio-secret-key-change-me",
"use_path_style" = "true"
)
PROPERTIES
(
"timeout" = "3600"
);
```
### Step 4: Verify the Imported Data
Run a query to verify whether the data has been imported successfully:
```sql
SELECT * FROM test_s3load;
```
Expected output:
```text
mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name | age |
+---------+-----------+------+
| 5 | Ava | 17 |
| 10 | Liam | 64 |
| 7 | Sophia | 32 |
| 9 | Emma | 37 |
| 1 | Emily | 25 |
| 4 | Alexander | 60 |
| 2 | Benjamin | 35 |
| 3 | Olivia | 28 |
| 6 | William | 69 |
| 8 | James | 64 |
+---------+-----------+------+
10 rows in set (0.04 sec)
```
## Method 2: Import Using TVF (Synchronous)
The TVF (Table Value Function) method reads MinIO files as a virtual table through the `S3()` function, and combined with `INSERT INTO ... SELECT` it completes the import synchronously. It is suitable for small-batch or ad-hoc scenarios.
### Step 1: Prepare the Data
Create a CSV file `s3load_example.csv` and upload it to MinIO with the following content:
```text
1,Emily,25
2,Benjamin,35
3,Olivia,28
4,Alexander,60
5,Ava,17
6,William,69
7,Sophia,32
8,James,64
9,Emma,37
10,Liam,64
```
### Step 2: Create a Table in Doris
```sql
CREATE TABLE test_s3load(
user_id BIGINT NOT NULL COMMENT "user id",
name VARCHAR(20) COMMENT "name",
age INT COMMENT "age"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;
```
### Step 3: Import Data Using TVF
Execute the following SQL to import the data synchronously:
```sql
INSERT INTO test_s3load
SELECT * FROM S3
(
"uri" = "s3://your_bucket_name/s3load_example.csv",
"format" = "csv",
"provider" = "S3",
"s3.endpoint" = "play.min.io:9000",
"s3.region" = "us-east-1",
"s3.access_key" = "myminioadmin",
"s3.secret_key" = "minio-secret-key-change-me",
"column_separator" = ",",
"csv_schema" = "user_id:int;name:string;age:int",
"use_path_style" = "true"
);
```
### Step 4: Verify the Imported Data
Run a query to verify whether the data has been imported successfully:
```sql
SELECT * FROM test_s3load;
```
Expected output:
```text
mysql> select * from test_s3load;
+---------+-----------+------+
| user_id | name | age |
+---------+-----------+------+
| 5 | Ava | 17 |
| 10 | Liam | 64 |
| 7 | Sophia | 32 |
| 9 | Emma | 37 |
| 1 | Emily | 25 |
| 4 | Alexander | 60 |
| 2 | Benjamin | 35 |
| 3 | Olivia | 28 |
| 6 | William | 69 |
| 8 | James | 64 |
+---------+-----------+------+
10 rows in set (0.04 sec)
```
## Key Parameters
<!-- Knowledge type: Configuration parameters -->
The following parameters must be configured correctly for both S3 Load and TVF:
| Parameter | Description | Example value |
|------|------|--------|
| `provider` | Object storage provider. Set to `S3` when using MinIO. | `S3` |
| `s3.endpoint` | MinIO service address. The `http://` prefix is required when TLS is not enabled. | `http://localhost:9000` |
| `s3.region` | The region where MinIO is deployed. Can be set to any value but must remain consistent. | `us-east-1` |
| `s3.access_key` | MinIO access key ID. | `myminioadmin` |
| `s3.secret_key` | MinIO access key secret. | `minio-secret-key-change-me` |
| `use_path_style` | Whether to use path-style access. Must be set to `true` for MinIO. | `true` |
## FAQ
<!-- Knowledge type: Troubleshooting -->
<!-- Applicable scenario: Troubleshooting / Connection configuration -->
### Q1: How do I choose between S3 Load and TVF?
- **S3 Load**: Executes asynchronously. Suitable for large-batch data imports. After submission, Doris schedules and runs the task in the background, and you can query the task status with `SHOW LOAD`.
- **TVF**: Executes synchronously. Suitable for small-batch, ad-hoc analysis, or scenarios combined with `INSERT INTO ... SELECT` pipelines. Returns results immediately.
### Q2: What should I do if I get endpoint-related errors when connecting to MinIO?
Confirm whether the endpoint has the correct protocol prefix:
- TLS not enabled: Must include `http://`, such as `http://localhost:9000`.
- TLS enabled: Use the `https://` prefix.
### Q3: What should I do if access reports a bucket parsing error or 404?
MinIO does not support virtual-hosted style access by default. You need to explicitly add the following to the import parameters:
```text
"use_path_style" = "true"
```
### Q4: Are other formats such as Parquet/ORC supported?
Yes. Replace `FORMAT AS "CSV"` (or `"format" = "csv"` in TVF) with `parquet`, `orc`, or other corresponding formats. For details, see the [Broker Load Manual](../import-way/broker-load-manual.md).