# JsonPath
> JsonPath transform plugin
## Description
> Supports selecting and extracting data from fields using JSONPath expressions
## Options
| name | type | required | default value |
|----------------------|-------|----------|---------------|
| columns | Array | Yes | |
| row_error_handle_way | Enum | No | FAIL |
### common options [string]
Common parameters of transform plugins; please refer to [Transform Plugin](common-options.md) for details.
### row_error_handle_way [Enum]
This option specifies how to handle a row when an error occurs while processing it. The default value is `FAIL`.
- FAIL: When `FAIL` is selected, a data format error blocks processing and an exception is thrown.
- SKIP: When `SKIP` is selected, the row with the data format error is skipped.
### columns [array]
#### option
| name | type | required | default value |
|-------------------------|--------|----------|---------------|
| src_field | String | Yes | |
| dest_field | String | Yes | |
| path | String | Yes | |
| dest_type | String | No | String |
| column_error_handle_way | Enum | No | |
#### src_field
> The JSON source field you want to parse
The following SeaTunnel data types are supported for the source field:
* STRING
* BYTES
* ARRAY
* MAP
* ROW
#### dest_field
> The output field that holds the value extracted by the JSONPath expression
#### dest_type
> The data type of the destination field, defaults to `String`
#### path
> The JSONPath expression used to extract the value from the source field
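For example, given the JSON payload below, the path `$.data.c_string` selects `"this is a string"` and `$.data.c_array[0]` selects `"item1"`:
```json
{
  "data": {
    "c_string": "this is a string",
    "c_array": ["item1", "item2", "item3"]
  }
}
```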
#### column_error_handle_way [Enum]
This option specifies how to handle a column when an error occurs while processing it. See the examples in the "Configure error data handle way" section below.
- FAIL: When `FAIL` is selected, a data format error blocks processing and an exception is thrown.
- SKIP: When `SKIP` is selected, the column with the data format error is skipped and filled with null.
- SKIP_ROW: When `SKIP_ROW` is selected, the whole row is skipped when this column has a data format error.
## Read Json Example
The data read from the source is a table with a field `data` that contains the following JSON:
```json
{
"data": {
"c_string": "this is a string",
"c_boolean": true,
"c_integer": 42,
"c_float": 3.14,
"c_double": 3.14,
"c_decimal": 10.55,
"c_date": "2023-10-29",
"c_datetime": "16:12:43.459",
"c_array":["item1", "item2", "item3"],
"c_map_array": [{"c_string_1":"c_string_1","c_string_2":"c_string_2","c_string_3":"c_string_3"},{"c_string_1":"c_string_1","c_string_2":"c_string_2","c_string_3":"c_string_3"}]
}
}
```
Suppose we want to use JsonPath to extract some of those properties:
```hocon
transform {
JsonPath {
plugin_input = "fake"
plugin_output = "fake1"
columns = [
{
"src_field" = "data"
"path" = "$.data.c_string"
"dest_field" = "c1_string"
},
{
"src_field" = "data"
"path" = "$.data.c_boolean"
"dest_field" = "c1_boolean"
"dest_type" = "boolean"
},
{
"src_field" = "data"
"path" = "$.data.c_integer"
"dest_field" = "c1_integer"
"dest_type" = "int"
},
{
"src_field" = "data"
"path" = "$.data.c_float"
"dest_field" = "c1_float"
"dest_type" = "float"
},
{
"src_field" = "data"
"path" = "$.data.c_double"
"dest_field" = "c1_double"
"dest_type" = "double"
},
{
"src_field" = "data"
"path" = "$.data.c_decimal"
"dest_field" = "c1_decimal"
"dest_type" = "decimal(4,2)"
},
{
"src_field" = "data"
"path" = "$.data.c_date"
"dest_field" = "c1_date"
"dest_type" = "date"
},
{
"src_field" = "data"
"path" = "$.data.c_datetime"
"dest_field" = "c1_datetime"
"dest_type" = "time"
},
{
"src_field" = "data"
"path" = "$.data.c_array"
"dest_field" = "c1_array"
"dest_type" = "array<string>"
},
{
"src_field" = "data"
"path" = "$.data.c_map_array"
"dest_field" = "c1_map_array"
"dest_type" = "array<map<string, string>>"
}
]
}
}
```
The same result can be achieved with a much simpler configuration by using batch field extraction with the array format:
```hocon
transform {
JsonPath {
plugin_input = "fake"
plugin_output = "fake1"
columns = [
{
"src_field" = "data"
"path" = ["$.data.c_string", "$.data.c_boolean", "$.data.c_integer", "$.data.c_float", "$.data.c_double", "$.data.c_decimal", "$.data.c_date", "$.data.c_datetime", "$.data.c_array", "$.data.c_map_array"]
"dest_field" = ["c1_string", "c1_boolean", "c1_integer", "c1_float", "c1_double", "c1_decimal", "c1_date", "c1_datetime", "c1_array", "c1_map_array"]
"dest_type" = ["string", "boolean", "int", "float", "double", "decimal(4,2)", "date", "time", "array<string>", "array<map<string, string>>"]
}
]
}
}
```
**Important:** When using batch field extraction (multiple paths, dest_fields, and dest_types), the `dest_type` parameter is **required** and cannot be omitted. Each extracted field must have a corresponding type specified. The array format provides better readability and is less error-prone than string-based configurations.
Then the data in result table `fake1` will look like this:
| data | c1_string | c1_boolean | c1_integer | c1_float | c1_double | c1_decimal | c1_date | c1_datetime | c1_array |
|------------------------------|------------------|------------|------------|----------|-----------|------------|------------|--------------|-----------------------------|
| too much content not to show | this is a string | true | 42 | 3.14 | 3.14 | 10.55 | 2023-10-29 | 16:12:43.459 | ["item1", "item2", "item3"] |
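For reference, a complete job built around the example above might look like the minimal sketch below. The `FakeSource` schema (a single string column named `data` that carries the JSON payload) and the `Console` sink are placeholder assumptions for illustration; substitute your real source and sink, and extend the `columns` list as needed.
```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # Placeholder source: declares a single string column named "data".
  # In a real job this column would carry the JSON document shown above.
  FakeSource {
    plugin_output = "fake"
    row.num = 1
    schema = {
      fields {
        data = "string"
      }
    }
  }
}

transform {
  # Extract two of the properties shown in the example above.
  JsonPath {
    plugin_input = "fake"
    plugin_output = "fake1"
    columns = [
      {
        "src_field" = "data"
        "path" = "$.data.c_string"
        "dest_field" = "c1_string"
      },
      {
        "src_field" = "data"
        "path" = "$.data.c_integer"
        "dest_field" = "c1_integer"
        "dest_type" = "int"
      }
    ]
  }
}

sink {
  # Placeholder sink: prints the transformed rows.
  Console {
    plugin_input = "fake1"
  }
}
```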
## Read SeaTunnelRow Example
Suppose one column in a row of data is of type SeaTunnelRow and that column is named `col`:
<table>
<tr><th colspan="2">SeatunnelRow(col)</th><th>other</th></tr>
<tr><td>name</td><td>age</td><td>....</td></tr>
<tr><td>a</td><td>18</td><td>....</td></tr>
</table>
The JsonPath transform converts the SeaTunnelRow value into an array, so its fields can be addressed by index:
```hocon
transform {
JsonPath {
plugin_input = "fake"
plugin_output = "fake1"
row_error_handle_way = FAIL
columns = [
{
"src_field" = "col"
"path" = "$[0]"
"dest_field" = "name"
"dest_type" = "string"
},
{
"src_field" = "col"
"path" = "$[1]"
"dest_field" = "age"
"dest_type" = "int"
}
]
}
}
```
Then the data in result table `fake1` will look like this:
| name | age | col | other |
|------|-----|----------|-------|
| a | 18 | ["a",18] | ... |
## Configure error data handle way
You can configure `row_error_handle_way` and `column_error_handle_way` to handle abnormal data. Both are optional.
`row_error_handle_way` handles data anomalies in the whole row, while `column_error_handle_way` handles data anomalies in a specific column; the latter takes priority over `row_error_handle_way`.
### Skip error data rows
Configure the transform to skip a row when an exception occurs in any of its columns:
```hocon
transform {
JsonPath {
row_error_handle_way = SKIP
columns = [
{
"src_field" = "json_data"
"path" = "$.f1"
"dest_field" = "json_data_f1"
},
{
"src_field" = "json_data"
"path" = "$.f2"
"dest_field" = "json_data_f2"
}
]
}
}
```
### Skip error data column
Configure the transform so that only exceptions in the `json_data_f1` column are skipped and filled with null; exceptions in other columns still throw an exception and interrupt processing:
```hocon
transform {
JsonPath {
row_error_handle_way = FAIL
columns = [
{
"src_field" = "json_data"
"path" = "$.f1"
"dest_field" = "json_data_f1"
"column_error_handle_way" = "SKIP"
},
{
"src_field" = "json_data"
"path" = "$.f2"
"dest_field" = "json_data_f2"
}
]
}
}
```
### Skip the row for specified column error
Configure the transform to skip the whole row only when an exception occurs in the `json_data_f1` column; exceptions in other columns still throw an exception and interrupt processing:
```hocon
transform {
JsonPath {
row_error_handle_way = FAIL
columns = [
{
"src_field" = "json_data"
"path" = "$.f1"
"dest_field" = "json_data_f1"
"column_error_handle_way" = "SKIP_ROW"
},
{
"src_field" = "json_data"
"path" = "$.f2"
"dest_field" = "json_data_f2"
}
]
}
}
```
## Changelog
* Add JsonPath Transform