docs/data-operate/import/data-source/bigquery.md - doris-website - Git at Google

 ---
 {
     "title": "BigQuery",
     "language": "en",
     "description": "A complete guide to migrating BigQuery data into Apache Doris: staging through GCS, importing with S3 Load, including data type mapping, table creation, troubleshooting, and batch import practices.",
     "keywords": [
         "BigQuery migration to Doris",
         "BigQuery import to Apache Doris",
         "GCS S3 Load",
         "BigQuery Export Parquet",
         "Doris data migration",
         "ETL_QUALITY_UNSATISFIED"
     ]
 }
 ---

 <!-- Knowledge type: Procedure -->
 <!-- Applicable scenario: Data migration / BigQuery to Doris -->

 This document describes how to migrate data from Google BigQuery into Apache Doris. The overall workflow uses object storage as an intermediate medium:

 1. Use BigQuery's [Export](https://cloud.google.com/bigquery/docs/exporting-data) statement to export data to GCS (Google Cloud Storage).
 2. Create the corresponding target table in Doris.
 3. Use Doris's S3 Load feature to pull the data from GCS into Doris. For details on the mechanism, see [S3 Load](./amazon-s3.md).

 ## Notes

 Before starting the migration, pay attention to the following items to avoid common issues:

 - **Table schema design**: Choose the appropriate Doris [data model](../../../table-design/data-model/intro.mdx), as well as [partitioning](../../../table-design/data-partitioning/dynamic-partitioning.md) and [bucketing](../../../table-design/data-partitioning/data-bucketing.md) strategies based on the BigQuery table schema. For more table creation strategies, see [Load best practices](../load-best-practices/load-best-practices.md).
 - **JSON type export**: BigQuery does not support exporting JSON types in Parquet format. Use JSON format for the export instead.
 - **Time type export**: When BigQuery exports a Time type, cast it to String before exporting.
 - **Complex type import**: For Parquet/ORC files containing complex types (Struct/Array/Map), you must currently use [TVF Load](./amazon-s3.md#method-2-load-with-tvf-synchronous).

 ## Data type mapping

 <!-- Knowledge type: Configuration parameter -->

 Before migrating, create columns in Doris that correspond to the BigQuery fields according to the following rules:

 | BigQuery           | Doris          | Notes                              |
 | ------------------ | -------------- | ---------------------------------- |
 | Array              | Array          |                                    |
 | BOOLEAN            | BOOLEAN        |                                    |
 | DATE               | DATE           |                                    |
 | DATETIME/TIMESTAMP | DATETIME       |                                    |
 | JSON               | JSON           |                                    |
 | INT64              | BIGINT         |                                    |
 | NUMERIC            | DECIMAL        |                                    |
 | FLOAT64            | DOUBLE         |                                    |
 | STRING             | VARCHAR/STRING | VARCHAR maximum length is 65535    |
 | STRUCT             | STRUCT         |                                    |
 | TIME               | STRING         | Cast to String before exporting    |
 | OTHER              | UNSUPPORTED    | Types not currently supported      |

 ## Migration steps

 ### Step 1: Create the target table in Doris

 Before migrating a BigQuery table to Doris, first create a Doris table with a matching schema.

 Assume the following table and sample data already exist in BigQuery:

 ```sql
 CREATE OR REPLACE TABLE test.sales_data (
     order_id      INT64,
     customer_name STRING,
     order_date    DATE,
     amount        NUMERIC(10,2),
     country       STRING
 )
 PARTITION BY  order_date;


 INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES
 (1, 'Alice', '2025-04-08', 99.99, 'USA'),
 (2, 'Bob', '2025-04-08', 149.50, 'Canada'),
 (3, 'Charlie', '2025-04-09', 75.00, 'UK'),
 (4, 'Diana', '2025-04-10', 200.00, 'Australia');
 ```

 Based on this schema, you can create a unique key partitioned table in Doris. The partition column matches the BigQuery partition column, and the table is partitioned by day:

 ```sql
 CREATE TABLE `sales_data` (
   order_id      INT,
   order_date    DATE NOT NULL,
   customer_name VARCHAR(128),
   amount        DECIMAL(10,2),
   country       VARCHAR(48)
 ) ENGINE=OLAP
 UNIQUE KEY(`order_id`,`order_date`)
 PARTITION BY RANGE(`order_date`) (
 PARTITION p20250408 VALUES [('2025-04-08'), ('2025-04-09')),
 PARTITION p20250409 VALUES [('2025-04-09'), ('2025-04-10')),
 PARTITION p20250410 VALUES [('2025-04-10'), ('2025-04-11'))
 )
 DISTRIBUTED BY HASH(`order_id`) BUCKETS 16
 PROPERTIES (
  "dynamic_partition.enable" = "true",
  "dynamic_partition.time_unit" = "DAY",
  "dynamic_partition.end" = "5",
  "dynamic_partition.prefix" = "p",
  "dynamic_partition.buckets" = "16",
  "replication_num" = "1"
 );
 ```

 ### Step 2: Export BigQuery data to GCS

 #### 2.1 Export Parquet files with the Export statement

 Use BigQuery's `EXPORT DATA` to export the target table to GCS:

 ```sql
 EXPORT DATA
   OPTIONS (
     uri = 'gs://mybucket/export/sales_data/*.parquet',
     format = 'PARQUET')
 AS (
   SELECT *
   FROM test.sales_data
 );
 ```

 #### 2.2 Inspect the exported files on GCS

 The command above exports `sales_data` to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files).

 ![gcs_export](/images/data-operate/gcs_export.png)

 ### Step 3: Import data into Doris with S3 Load

 The import uses S3 Load. **S3 Load is an asynchronous data import method. After the job is submitted, Doris actively pulls data from the data source.** The data source supports object storage compatible with the S3 protocol, including [AWS S3](./amazon-s3.md), [GCS](./google-cloud-storage.md), and [Azure](./azure-storage.md).

 This method is suitable for scenarios with large data volumes that need asynchronous background processing. For data imports that need synchronous processing, see [TVF Load](./amazon-s3.md#method-2-load-with-tvf-synchronous).

 > **Note**: For Parquet/ORC files containing complex types (Struct/Array/Map), you must currently use TVF Load.

 #### 3.1 Import data from a single file

 Use the following `LOAD LABEL` statement to import a single Parquet file from GCS:

 ```sql
 LOAD LABEL sales_data_2025_04_08
 (
     DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet")
     INTO TABLE sales_data
     FORMAT AS "parquet"
     (order_id, order_date, customer_name, amount, country)
 )
 WITH S3
 (
     "provider" = "GCP",
     "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com",
     "s3.region" = "asia-southeast1",
     "s3.access_key" = "<ak>",
     "s3.secret_key" = "<sk>"
 );
 ```

 #### 3.2 Check job status with Show Load

 Because S3 Load is submitted asynchronously, use `SHOW LOAD` to query the import status for a given label:

 ```yaml
 mysql> show load where label = "sales_data_2025_04_08"\G
 *************************** 1. row ***************************
         JobId: 17956078
         Label: sales_data_2025_04_08
         State: FINISHED
       Progress: 100.00% (1/1)
           Type: BROKER
       EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
       TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL
       ErrorMsg: NULL
     CreateTime: 2025-04-10 17:50:53
   EtlStartTime: 2025-04-10 17:50:54
 EtlFinishTime: 2025-04-10 17:50:54
 LoadStartTime: 2025-04-10 17:50:54
 LoadFinishTime: 2025-04-10 17:50:54
           URL: NULL
     JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
 TransactionId: 766228
   ErrorTablets: {}
           User: root
       Comment:
 1 row in set (0.00 sec)
 ```

 #### 3.3 Handle errors during import

 <!-- Knowledge type: Troubleshooting -->

 When multiple import jobs exist, use the following statement to query failed imports and their causes:

 ```yaml
 mysql> show load where state='CANCELLED' and label like "label_test%"\G
 *************************** 1. row ***************************
         JobId: 18312384
         Label: label_test123
         State: CANCELLED
       Progress: 100.00% (3/3)
           Type: BROKER
       EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
       TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL
       ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel
     CreateTime: 2025-04-15 17:32:59
   EtlStartTime: 2025-04-15 17:33:02
 EtlFinishTime: 2025-04-15 17:33:02
 LoadStartTime: 2025-04-15 17:33:02
 LoadFinishTime: 2025-04-15 17:33:02
           URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
     JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
 TransactionId: 769213
   ErrorTablets: {}
           User: root
       Comment:
 ```

 The example above is a **data quality error** (`ETL_QUALITY_UNSATISFIED`). To get the specific cause, visit the returned `URL`. For example, the error below indicates that the data exceeds the actual length of the `country` column in the table schema:

 ```python
 [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
 Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src line [];
 Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line [];
 Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line [];
 Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line [];
 ```

 For data quality errors, if skipping some erroneous rows is acceptable, set the error tolerance ratio in the S3 Load `PROPERTIES`. For details on the parameters, see [Load configuration parameters](../../import/import-way/broker-load-manual.md#related-configurations).

 #### 3.4 Import data from multiple files

 When migrating large volumes of historical data, use a batched import strategy:

 - **Batch division**: Each batch corresponds to one Doris partition or a small number of partitions.
 - **Per-batch data volume**: No more than 100 GB per batch is recommended, to reduce system pressure and lower the cost of retries after a failed import.
 - **Batch script**: See [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh). This script splits the file list under a specified directory in object storage and submits multiple S3 Load jobs to Doris in batches, enabling batch import.

 ## FAQ

 <!-- Knowledge type: Troubleshooting -->

 **Q1: Why can JSON types in BigQuery not be exported directly as Parquet?**

 BigQuery does not currently support exporting JSON types in Parquet format. Set `format` to `JSON` for the export, and use a corresponding JSON column in Doris to receive the data.

 **Q2: How does the BigQuery Time type map to Doris?**

 The BigQuery Time type must be cast to String during export, and stored in a STRING column in Doris.

 **Q3: What should I do when an import fails with `ETL_QUALITY_UNSATISFIED`?**

 This error indicates that the data does not meet quality requirements. Handle it as follows:

 1. Look at the `URL` field in the `SHOW LOAD` result and use `curl` to fetch the detailed error log.
 2. Identify the cause from the log (common causes include column length overflow and type mismatch).
 3. Adjust the target table schema, or set `max_filter_ratio` in the S3 Load `PROPERTIES` to skip some erroneous rows.

 **Q4: Can Parquet/ORC files containing Struct/Array/Map be imported with S3 Load?**

 No. Files with these complex types must currently be imported with [TVF Load](./amazon-s3.md#method-2-load-with-tvf-synchronous).

 **Q5: How can I avoid the retry cost caused by single-job failures when migrating large volumes of historical data?**

 Import in batches by partition, with no more than 100 GB per batch, and use the [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh) script to split and submit batches.
	---
	{
	"title": "BigQuery",
	"language": "en",
	"description": "A complete guide to migrating BigQuery data into Apache Doris: staging through GCS, importing with S3 Load, including data type mapping, table creation, troubleshooting, and batch import practices.",
	"keywords": [
	"BigQuery migration to Doris",
	"BigQuery import to Apache Doris",
	"GCS S3 Load",
	"BigQuery Export Parquet",
	"Doris data migration",
	"ETL_QUALITY_UNSATISFIED"
	]
	}
	---

	<!-- Knowledge type: Procedure -->
	<!-- Applicable scenario: Data migration / BigQuery to Doris -->

	This document describes how to migrate data from Google BigQuery into Apache Doris. The overall workflow uses object storage as an intermediate medium:

	1. Use BigQuery's [Export](https://cloud.google.com/bigquery/docs/exporting-data) statement to export data to GCS (Google Cloud Storage).
	2. Create the corresponding target table in Doris.
	3. Use Doris's S3 Load feature to pull the data from GCS into Doris. For details on the mechanism, see [S3 Load](./amazon-s3.md).

	## Notes

	Before starting the migration, pay attention to the following items to avoid common issues:

	- Table schema design: Choose the appropriate Doris [data model](../../../table-design/data-model/intro.mdx), as well as [partitioning](../../../table-design/data-partitioning/dynamic-partitioning.md) and [bucketing](../../../table-design/data-partitioning/data-bucketing.md) strategies based on the BigQuery table schema. For more table creation strategies, see [Load best practices](../load-best-practices/load-best-practices.md).
	- JSON type export: BigQuery does not support exporting JSON types in Parquet format. Use JSON format for the export instead.
	- Time type export: When BigQuery exports a Time type, cast it to String before exporting.
	- Complex type import: For Parquet/ORC files containing complex types (Struct/Array/Map), you must currently use [TVF Load](./amazon-s3.md#method-2-load-with-tvf-synchronous).

	## Data type mapping

	<!-- Knowledge type: Configuration parameter -->

	Before migrating, create columns in Doris that correspond to the BigQuery fields according to the following rules:

	\| BigQuery \| Doris \| Notes \|
	\| ------------------ \| -------------- \| ---------------------------------- \|
	\| Array \| Array \| \|
	\| BOOLEAN \| BOOLEAN \| \|
	\| DATE \| DATE \| \|
	\| DATETIME/TIMESTAMP \| DATETIME \| \|
	\| JSON \| JSON \| \|
	\| INT64 \| BIGINT \| \|
	\| NUMERIC \| DECIMAL \| \|
	\| FLOAT64 \| DOUBLE \| \|
	\| STRING \| VARCHAR/STRING \| VARCHAR maximum length is 65535 \|
	\| STRUCT \| STRUCT \| \|
	\| TIME \| STRING \| Cast to String before exporting \|
	\| OTHER \| UNSUPPORTED \| Types not currently supported \|

	## Migration steps

	### Step 1: Create the target table in Doris

	Before migrating a BigQuery table to Doris, first create a Doris table with a matching schema.

	Assume the following table and sample data already exist in BigQuery:

	```sql
	CREATE OR REPLACE TABLE test.sales_data (
	order_id INT64,
	customer_name STRING,
	order_date DATE,
	amount NUMERIC(10,2),
	country STRING
	)
	PARTITION BY order_date;


	INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES
	(1, 'Alice', '2025-04-08', 99.99, 'USA'),
	(2, 'Bob', '2025-04-08', 149.50, 'Canada'),
	(3, 'Charlie', '2025-04-09', 75.00, 'UK'),
	(4, 'Diana', '2025-04-10', 200.00, 'Australia');
	```

	Based on this schema, you can create a unique key partitioned table in Doris. The partition column matches the BigQuery partition column, and the table is partitioned by day:

	```sql
	CREATE TABLE `sales_data` (
	order_id INT,
	order_date DATE NOT NULL,
	customer_name VARCHAR(128),
	amount DECIMAL(10,2),
	country VARCHAR(48)
	) ENGINE=OLAP
	UNIQUE KEY(`order_id`,`order_date`)
	PARTITION BY RANGE(`order_date`) (
	PARTITION p20250408 VALUES [('2025-04-08'), ('2025-04-09')),
	PARTITION p20250409 VALUES [('2025-04-09'), ('2025-04-10')),
	PARTITION p20250410 VALUES [('2025-04-10'), ('2025-04-11'))
	)
	DISTRIBUTED BY HASH(`order_id`) BUCKETS 16
	PROPERTIES (
	"dynamic_partition.enable" = "true",
	"dynamic_partition.time_unit" = "DAY",
	"dynamic_partition.end" = "5",
	"dynamic_partition.prefix" = "p",
	"dynamic_partition.buckets" = "16",
	"replication_num" = "1"
	);
	```

	### Step 2: Export BigQuery data to GCS

	#### 2.1 Export Parquet files with the Export statement

	Use BigQuery's `EXPORT DATA` to export the target table to GCS:

	```sql
	EXPORT DATA
	OPTIONS (
	uri = 'gs://mybucket/export/sales_data/*.parquet',
	format = 'PARQUET')
	AS (
	SELECT *
	FROM test.sales_data
	);
	```

	#### 2.2 Inspect the exported files on GCS

	The command above exports `sales_data` to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files).

	![gcs_export](/images/data-operate/gcs_export.png)

	### Step 3: Import data into Doris with S3 Load

	The import uses S3 Load. S3 Load is an asynchronous data import method. After the job is submitted, Doris actively pulls data from the data source. The data source supports object storage compatible with the S3 protocol, including [AWS S3](./amazon-s3.md), [GCS](./google-cloud-storage.md), and [Azure](./azure-storage.md).

	This method is suitable for scenarios with large data volumes that need asynchronous background processing. For data imports that need synchronous processing, see [TVF Load](./amazon-s3.md#method-2-load-with-tvf-synchronous).

	> Note: For Parquet/ORC files containing complex types (Struct/Array/Map), you must currently use TVF Load.

	#### 3.1 Import data from a single file

	Use the following `LOAD LABEL` statement to import a single Parquet file from GCS:

	```sql
	LOAD LABEL sales_data_2025_04_08
	(
	DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet")
	INTO TABLE sales_data
	FORMAT AS "parquet"
	(order_id, order_date, customer_name, amount, country)
	)
	WITH S3
	(
	"provider" = "GCP",
	"s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com",
	"s3.region" = "asia-southeast1",
	"s3.access_key" = "<ak>",
	"s3.secret_key" = "<sk>"
	);
	```

	#### 3.2 Check job status with Show Load

	Because S3 Load is submitted asynchronously, use `SHOW LOAD` to query the import status for a given label:

	```yaml
	mysql> show load where label = "sales_data_2025_04_08"\G
	************************* 1. row *************************
	JobId: 17956078
	Label: sales_data_2025_04_08
	State: FINISHED
	Progress: 100.00% (1/1)
	Type: BROKER
	EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2
	TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL
	ErrorMsg: NULL
	CreateTime: 2025-04-10 17:50:53
	EtlStartTime: 2025-04-10 17:50:54
	EtlFinishTime: 2025-04-10 17:50:54
	LoadStartTime: 2025-04-10 17:50:54
	LoadFinishTime: 2025-04-10 17:50:54
	URL: NULL
	JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620}
	TransactionId: 766228
	ErrorTablets: {}
	User: root
	Comment:
	1 row in set (0.00 sec)
	```

	#### 3.3 Handle errors during import

	<!-- Knowledge type: Troubleshooting -->

	When multiple import jobs exist, use the following statement to query failed imports and their causes:

	```yaml
	mysql> show load where state='CANCELLED' and label like "label_test%"\G
	************************* 1. row *************************
	JobId: 18312384
	Label: label_test123
	State: CANCELLED
	Progress: 100.00% (3/3)
	Type: BROKER
	EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0
	TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL
	ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel
	CreateTime: 2025-04-15 17:32:59
	EtlStartTime: 2025-04-15 17:33:02
	EtlFinishTime: 2025-04-15 17:33:02
	LoadStartTime: 2025-04-15 17:33:02
	LoadFinishTime: 2025-04-15 17:33:02
	URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342
	JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839}
	TransactionId: 769213
	ErrorTablets: {}
	User: root
	Comment:
	```

	The example above is a data quality error (`ETL_QUALITY_UNSATISFIED`). To get the specific cause, visit the returned `URL`. For example, the error below indicates that the data exceeds the actual length of the `country` column in the table schema:

	```python
	[root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342"
	Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src line [];
	Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line [];
	Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line [];
	Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line [];
	```

	For data quality errors, if skipping some erroneous rows is acceptable, set the error tolerance ratio in the S3 Load `PROPERTIES`. For details on the parameters, see [Load configuration parameters](../../import/import-way/broker-load-manual.md#related-configurations).

	#### 3.4 Import data from multiple files

	When migrating large volumes of historical data, use a batched import strategy:

	- Batch division: Each batch corresponds to one Doris partition or a small number of partitions.
	- Per-batch data volume: No more than 100 GB per batch is recommended, to reduce system pressure and lower the cost of retries after a failed import.
	- Batch script: See [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh). This script splits the file list under a specified directory in object storage and submits multiple S3 Load jobs to Doris in batches, enabling batch import.

	## FAQ

	<!-- Knowledge type: Troubleshooting -->

	Q1: Why can JSON types in BigQuery not be exported directly as Parquet?

	BigQuery does not currently support exporting JSON types in Parquet format. Set `format` to `JSON` for the export, and use a corresponding JSON column in Doris to receive the data.

	Q2: How does the BigQuery Time type map to Doris?

	The BigQuery Time type must be cast to String during export, and stored in a STRING column in Doris.

	Q3: What should I do when an import fails with `ETL_QUALITY_UNSATISFIED`?

	This error indicates that the data does not meet quality requirements. Handle it as follows:

	1. Look at the `URL` field in the `SHOW LOAD` result and use `curl` to fetch the detailed error log.
	2. Identify the cause from the log (common causes include column length overflow and type mismatch).
	3. Adjust the target table schema, or set `max_filter_ratio` in the S3 Load `PROPERTIES` to skip some erroneous rows.

	Q4: Can Parquet/ORC files containing Struct/Array/Map be imported with S3 Load?

	No. Files with these complex types must currently be imported with [TVF Load](./amazon-s3.md#method-2-load-with-tvf-synchronous).

	Q5: How can I avoid the retry cost caused by single-job failures when migrating large volumes of historical data?

	Import in batches by partition, with no more than 100 GB per batch, and use the [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh) script to split and submit batches.