This document describes how to load Native format data files in Apache Doris. Native is a binary data format dedicated to Doris, suitable as an internal data exchange and backup format rather than a general-purpose file exchange format. When data flows only within Doris, prefer the Native format to achieve the highest load efficiency.
This feature is supported since version 4.1.0.
The Native format mainly targets data exchange between Doris clusters and backup scenarios.
Tip: If you need to exchange data with external systems, use general-purpose formats such as CSV, JSON, Parquet, or ORC instead of Native.
The following table lists the load methods that support the Native format and their typical uses:
| Load Method | Typical Use | Documentation Link |
|---|---|---|
| Stream Load | Push local Native files over HTTP | Stream Load |
| Broker Load | Asynchronously load Native files from object storage / HDFS | Broker Load |
| INSERT INTO FROM S3 TVF | Read Native files directly from S3 via SQL | S3 TVF |
| INSERT INTO FROM HDFS TVF | Read Native files directly from HDFS via SQL | HDFS TVF |
The following examples show how to use the Native format with different load methods. Choose the appropriate method based on the data source (local file / object storage / HDFS) and load mode (synchronous / asynchronous).
Applicable scenario: A Native file resides on the local machine or on a server that can access the FE HTTP port, and you need a fast synchronous load.
Steps:
1. Prepare the Native file, for example `example.native`.
2. Use `curl` to push the file through the Stream Load interface, and specify the format with the request header `format: native`.

```shell
curl --location-trusted -u <user>:<passwd> \
    -H "format: native" \
    -T example.native \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load
```
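Because Stream Load is synchronous, the outcome of the load comes back in the HTTP response body. The following is a minimal shell sketch, not an official tool, that wraps the `curl` call above and checks the response. It assumes the response is JSON containing a `"Status"` field whose value is `"Success"` when the load succeeds. The function names and the environment variables `FE_HOST`, `FE_HTTP_PORT`, `DORIS_USER`, and `DORIS_PASSWD` are illustrative placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: the function names and the FE_HOST, FE_HTTP_PORT, DORIS_USER
# and DORIS_PASSWD variables are placeholders, not names defined by Doris.

# Returns exit status 0 when the response text contains "Status": "Success".
stream_load_succeeded() {
  printf '%s' "$1" | grep -Eq '"Status"[[:space:]]*:[[:space:]]*"Success"'
}

# Pushes one Native file into <db>.<table> and prints the raw response body.
stream_load_native() {
  local file="$1" db="$2" table="$3"
  curl --silent --location-trusted \
    -u "${DORIS_USER}:${DORIS_PASSWD}" \
    -H "format: native" \
    -T "$file" \
    "http://${FE_HOST}:${FE_HTTP_PORT}/api/${db}/${table}/_stream_load"
}

# Example invocation (commented out so the snippet has no side effects):
# response="$(stream_load_native example.native example_db example_table)"
# if stream_load_succeeded "$response"; then
#   echo "load ok"
# else
#   echo "load failed: $response" >&2
# fi
```

Checking only for `Success` is the strictest reading of the response. Any other status value is treated as a failure by this sketch, so inspect the printed response before retrying.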
Applicable scenario: The Native file is stored in remote storage such as S3 (or an S3-compatible object store), and you need an asynchronous batch load.
Key points:
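- Specify the file path with `DATA INFILE`.
- Specify the format with `FORMAT AS "native"`.
- Provide the object storage access information with `WITH S3`.

```sql
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/example.native")
    INTO TABLE example_table
    FORMAT AS "native"
)
WITH S3
(
    ...
);
```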
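Broker Load is asynchronous, so submitting the statement only creates a load job, and its progress has to be checked afterwards. The following is a minimal sketch of one way to do that, assuming a MySQL-compatible `mysql` client is installed and can reach the FE query port. The helper only builds a `SHOW LOAD` statement for a given database and label. The function name and the `FE_HOST`, `FE_QUERY_PORT`, and `DORIS_USER` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Builds a SHOW LOAD statement for one load label.
# show_load_sql is a hypothetical helper name, not part of Doris.
show_load_sql() {
  local db="$1" label="$2"
  printf 'SHOW LOAD FROM %s WHERE LABEL = "%s";\n' "$db" "$label"
}

# Example invocation (commented out so the snippet has no side effects).
# FE_HOST, FE_QUERY_PORT and DORIS_USER are placeholder variables.
# mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$DORIS_USER" -p \
#   -e "$(show_load_sql example_db example_label)"
```

The `State` column of the result indicates whether the job is still in progress or has ended as `FINISHED` or `CANCELLED`.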
Applicable scenario: You want to read remote Native files directly with SQL and write them into a target table, making it easy to combine with query, filter, and transformation logic.
Key points:
- Specify `uri` and `format = "native"` in the TVF parameters.
- Use `INSERT INTO ... SELECT` to write the read result into the target table.

```sql
INSERT INTO example_table
SELECT *
FROM S3
(
    "uri" = "s3://bucket/example.native",
    "format" = "native",
    ...
);
```
Q1: Can the Native format be used to exchange data with external systems?
Not recommended. Native is a binary format dedicated to Doris and is not compatible with external systems. For cross-system data exchange, prefer general-purpose formats such as CSV, JSON, Parquet, or ORC.
Q2: Why is the Native format recommended for data flow within Doris?
The Native format aligns with Doris internal data structures, so serialization and deserialization overhead is minimal. As a result, it delivers the highest load efficiency between Doris clusters or in backup scenarios.
Q3: Which load methods support the Native format?
Stream Load, Broker Load, and INSERT INTO ... FROM S3 / HDFS TVF are currently supported. See the “Supported Load Methods” section above for details.