This document explains how to load JSON-format data files into Doris. Doris supports loading standard JSON data and can flexibly handle different JSON structures through parameter configuration, including extracting fields from JSON data and handling nested structures.
The following loading methods support JSON format data:

- Stream Load
- Broker Load
- Routine Load
- Table-valued functions (TVF), e.g. `S3()` with `INSERT INTO ... SELECT`
Doris supports the following three JSON formats:
JSON array format, suitable for batch loading multiple rows of data. Requirements:

- Must set `strip_outer_array=true`

Example data:
```json
[
    {"id": 123, "city": "beijing"},
    {"id": 456, "city": "shanghai"}
]
```

Nested structures are also supported:

```json
[
    {"id": 123, "city": {"name": "beijing", "region": "haidian"}},
    {"id": 456, "city": {"name": "beijing", "region": "chaoyang"}}
]
```
JSON object format, suitable for loading a single row of data. Example data:
```json
{"id": 123, "city": "beijing"}
```

Nested structures are also supported:

```json
{"id": 123, "city": {"name": "beijing", "region": "haidian"}}
```
:::tip Note
Typically used with the Routine Load method, e.g. for single messages in Kafka.
:::
JSON Lines format, suitable for batch loading multiple rows of data. Requirements:

- Must set `read_json_by_line=true`
- Rows are separated by the `line_delimiter` parameter, default `\n`

Example data:
```json
{"id": 123, "city": "beijing"}
{"id": 456, "city": "shanghai"}
```
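As a client-side illustration of the JSON Lines format, the following Python sketch serializes rows into the shape Doris expects when `read_json_by_line=true` (one complete JSON object per line, joined by the `line_delimiter`, default `\n`). The variable names are illustrative, not part of any Doris API.

```python
import json

# Rows to load; each one becomes a single line in the JSON Lines payload.
rows = [
    {"id": 123, "city": "beijing"},
    {"id": 456, "city": "shanghai"},
]

# line_delimiter defaults to "\n"; each line must be one complete JSON object.
line_delimiter = "\n"
payload = line_delimiter.join(json.dumps(r) for r in rows)

print(payload)
```

Reading the payload back line by line recovers the original rows, which is essentially what the server does for this format.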
The following table lists the JSON format parameters supported by various loading methods:
| Parameter | Default Value | Stream Load | Broker Load | Routine Load | TVF |
|---|---|---|---|---|---|
| `jsonpaths` | none | Supported | Supported | Supported | Supported |
| `json_root` | none | Supported | Supported | Supported | Supported |
| `strip_outer_array` | false | Supported | Supported | Supported | Supported |
| `read_json_by_line` | true | Supported | Not supported | Not supported | Supported |
| `fuzzy_parse` | false | Supported | Supported | Not supported | Supported |
| `num_as_string` | false | Supported | Supported | Supported | Supported |
| Compression format | plain | Supported | Supported | Not supported | Supported |
:::tip Note
The way JSON parameters are specified differs by loading method:

- Stream Load: via HTTP header, e.g. `-H "jsonpaths: $.data"`
- Broker Load: via `PROPERTIES`, e.g. `PROPERTIES("jsonpaths"="$.data")`
- Routine Load: via `PROPERTIES`, e.g. `PROPERTIES("jsonpaths"="$.data")`
- TVF: via the TVF's properties, e.g. `S3("jsonpaths"="$.data")`
:::

`jsonpaths` uses JSONPath syntax to locate fields:

```
-- Basic usage
["$.id", "$.city"]
-- Nested structures
["$.id", "$.info.city", "$.data[0].name"]
```
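To make the path syntax concrete, here is a toy Python resolver for the simplified paths shown above. It is only an illustration of how `$.id`, `$.info.city`, and `$.data[0].name` select fields; it is not Doris's actual `jsonpaths` parser and supports only dotted names with an optional array index.

```python
import re

def extract(doc, path):
    """Resolve a simplified JSONPath such as "$.info.city" or "$.data[0].name".
    Toy resolver for illustration only, not Doris's jsonpaths implementation."""
    value = doc
    # Split the path into (name, optional_index) steps, e.g. ("data", "0"), ("name", "")
    for name, idx in re.findall(r"\.(\w+)(?:\[(\d+)\])?", path):
        value = value[name]
        if idx:
            value = value[int(idx)]
    return value

doc = {"id": 1, "info": {"city": "beijing"}, "data": [{"name": "a"}]}
print([extract(doc, p) for p in ["$.id", "$.info.city", "$.data[0].name"]])
# → [1, 'beijing', 'a']
```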
```
-- Original data
{
    "data": {
        "id": 123,
        "city": "beijing"
    }
}
-- Set json_root
json_root = $.data
```
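The effect of `json_root` can be sketched in Python: before any column extraction, the parser descends to the subtree named by the root path and treats it as the row data. The path walk below is a toy stand-in for what the server does internally.

```python
import json

raw = '{ "data": { "id": 123, "city": "beijing" } }'
doc = json.loads(raw)

# With json_root = "$.data", parsing starts from this subtree rather than
# the top-level object. Toy illustration: strip "$." and walk the keys.
json_root = "$.data"
row = doc
for key in json_root.lstrip("$.").split("."):
    row = row[key]

print(row)  # the object that actually gets mapped to table columns
# → {'id': 123, 'city': 'beijing'}
```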
```
-- Original data
[
    {"id": 1, "city": "beijing"},
    {"id": 2, "city": "shanghai"}
]
-- Set strip_outer_array=true
```
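A client-side view of what `strip_outer_array=true` means: the top level of the document is an array, and each element becomes one table row. The sketch below simply unwraps the array the same way.

```python
import json

raw = '[{"id": 1, "city": "beijing"}, {"id": 2, "city": "shanghai"}]'

# strip_outer_array=true tells Doris the top level is an array whose
# elements are individual rows; unwrapping it client-side looks like this:
rows = json.loads(raw)
assert isinstance(rows, list)
for row in rows:
    print(row["id"], row["city"])
```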
```
-- Original data (one complete JSON object per line)
{"id": 1, "city": "beijing"}
{"id": 2, "city": "shanghai"}
-- Set read_json_by_line=true
```
```
-- Original data
{
    "id": "12345678901234567890",
    "price": "99999999.999999"
}
-- Set num_as_string=true, price field will be parsed as string
```
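Python's `json` module can mimic the effect of `num_as_string=true`: with the `parse_int`/`parse_float` hooks, numeric literals are kept as strings instead of being converted to machine numbers, which is how precision loss on large values is avoided. This is an analogy for the behavior, not Doris code; the example assumes the source contains unquoted numeric literals.

```python
import json

raw = '{"id": 12345678901234567890, "price": 99999999.999999}'

# Default parsing converts numbers to int/float; floats can lose precision.
default = json.loads(raw)

# Analogue of num_as_string=true: keep numeric literals as strings.
as_string = json.loads(raw, parse_int=str, parse_float=str)

print(default["price"], type(default["price"]).__name__)
print(as_string["id"], as_string["price"])
```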
During data loading, JSON Path and Columns play different roles:

- JSON Path: defines the rules for extracting fields from the source data
- Columns: defines the rules for mapping the extracted fields to table columns

These two parameters are processed serially: JSON Path first extracts fields from the source data into an ordered list, then Columns maps that list onto table columns by position. If Columns is not specified, the extracted fields are mapped to the table columns in order.
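The two-stage pipeline described above can be sketched in a few lines of Python. The `extract` helper is a hypothetical stand-in for the server's `jsonpaths` evaluation; the point is the ordering: `jsonpaths` produces an ordered value list, and `columns` then names the target column for each position.

```python
import re

def extract(doc, path):
    # Toy resolver for simple dotted paths like "$.k1"; not Doris's parser.
    value = doc
    for name in re.findall(r"\.(\w+)", path):
        value = value[name]
    return value

doc = {"k1": 1, "k2": 2}
jsonpaths = ["$.k2", "$.k1"]
columns = ["k2", "k1"]

# Stage 1: jsonpaths yields an ordered list of extracted values.
values = [extract(doc, p) for p in jsonpaths]
# Stage 2: columns maps that ordered list onto table columns by position.
row = dict(zip(columns, values))
print(row)
# → {'k2': 2, 'k1': 1}
```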
Table structure and data:
```sql
-- Table structure
CREATE TABLE example_table (
    k2 int,
    k1 int
);

-- JSON data
{"k1": 1, "k2": 2}
```
Load command:
```shell
curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
```
Load result:
```
+------+------+
| k1   | k2   |
+------+------+
|    2 |    1 |
+------+------+
```
Using the same table structure and data, now add the `columns` parameter:
Load command:
```shell
curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
    -H "columns: k2, k1" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
```
Load result:
```
+------+------+
| k1   | k2   |
+------+------+
|    1 |    2 |
+------+------+
```
Table structure and data:
```sql
-- Table structure
CREATE TABLE example_table (
    k2 int,
    k1 int,
    k1_copy int
);

-- JSON data
{"k1": 1, "k2": 2}
```
Load command:
```shell
curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k1\"]" \
    -H "columns: k2, k1, k1_copy" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
```
Load result:
```
+------+------+---------+
| k2   | k1   | k1_copy |
+------+------+---------+
|    2 |    1 |       1 |
+------+------+---------+
```
Table structure and data:
```sql
-- Table structure
CREATE TABLE example_table (
    k2 int,
    k1 int,
    k1_nested1 int,
    k1_nested2 int
);

-- JSON data
{
    "k1": 1,
    "k2": 2,
    "k3": {
        "k1": 31,
        "k1_nested": {
            "k1": 32
        }
    }
}
```
Load command:
```shell
curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k3.k1\", \"$.k3.k1_nested.k1\"]" \
    -H "columns: k2, k1, k1_nested1, k1_nested2" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
```
Load result:
```
+------+------+------------+------------+
| k2   | k1   | k1_nested1 | k1_nested2 |
+------+------+------------+------------+
|    2 |    1 |         31 |         32 |
+------+------+------------+------------+
```
This section demonstrates the usage of JSON format in different loading methods.
Stream Load:

```shell
# Using JSON Path
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "jsonpaths: [\"$.id\", \"$.city\"]" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

# Specifying JSON root
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "json_root: $.events" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

# Reading JSON by line
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "read_json_by_line: true" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load
```
Broker Load:

```sql
-- Using JSON Path
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/path/example.json")
    INTO TABLE example_table
    FORMAT AS "json"
    PROPERTIES
    (
        "jsonpaths" = "[\"$.id\", \"$.city\"]"
    )
)
WITH S3 ( ... );

-- Specifying JSON root
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/path/example.json")
    INTO TABLE example_table
    FORMAT AS "json"
    PROPERTIES
    (
        "json_root" = "$.events"
    )
)
WITH S3 ( ... );
```
Routine Load:

```sql
-- Using JSON Path
CREATE ROUTINE LOAD example_db.example_job ON example_table
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.city\"]"
)
FROM KAFKA ( ... );
```
TVF:

```sql
-- Using JSON Path
INSERT INTO example_table
SELECT *
FROM S3 (
    "path" = "s3://bucket/example.json",
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.city\"]",
    ...
);

-- Specifying JSON root
INSERT INTO example_table
SELECT *
FROM S3 (
    "path" = "s3://bucket/example.json",
    "format" = "json",
    "json_root" = "$.events",
    ...
);
```