Json transform plugin
Parses the JSON contained in the specified field of the original dataset.
:::tip
This transform is ONLY supported by Spark.
:::
| name | type | required | default value |
|---|---|---|---|
| source_field | string | no | raw_message |
| target_field | string | no | `__root__` |
| schema_dir | string | no | - |
| schema_file | string | no | - |
| common-options | string | no | - |
- `source_field`: The source field to parse. If not configured, the default is `raw_message`.
- `target_field`: The target field. If not configured, the default is `__root__`, and the parsed JSON fields are placed at the top level of the Dataframe.
- `schema_dir`: The schema directory. If not configured, the default is `$seatunnelRoot/plugins/json/files/schemas/`.
- `schema_file`: The schema file name. If not configured, the default is empty, meaning no structure is specified and the system infers it from the input data.
- `common-options`: Common parameters of transform plugins. Please refer to Transform Plugin for details.
JSON schema usage scenarios

The multiple data sources of a single task may contain JSON data with different structures. For example, the data from Kafka topicA looks like:
{ "A": "a_val", "B": "b_val" }
The data from topicB looks like:
{ "C": "c_val", "D": "d_val" }
When running the transform, you need to merge the data of topicA and topicB into one wide table for calculation. You can specify a schema with the following content:
{ "A": "a_val", "B": "b_val", "C": "c_val", "D": "d_val" }
The merged output of topicA and topicB is then:

    +-----+-----+-----+-----+
    |A    |B    |C    |D    |
    +-----+-----+-----+-----+
    |a_val|b_val|null |null |
    |null |null |c_val|d_val|
    +-----+-----+-----+-----+
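The merge above can be sketched in plain Python. This is only an illustration of the behaviour, not the plugin's Spark implementation: `apply_schema` is a hypothetical helper, and the sketch assumes the schema's columns are simply the top-level keys of the sample document (field types are ignored).

```python
import json

# Sample document playing the role of the schema (taken from the example above).
SCHEMA_SAMPLE = '{ "A": "a_val", "B": "b_val", "C": "c_val", "D": "d_val" }'


def apply_schema(raw_message, columns):
    """Parse one JSON string and project it onto a fixed column list.

    Keys absent from the message become None (shown as null in Spark).
    """
    parsed = json.loads(raw_message)
    return {col: parsed.get(col) for col in columns}


# Assumption: the column list is just the top-level keys of the sample.
columns = list(json.loads(SCHEMA_SAMPLE))

topic_a = '{ "A": "a_val", "B": "b_val" }'
topic_b = '{ "C": "c_val", "D": "d_val" }'

rows = [apply_schema(msg, columns) for msg in (topic_a, topic_b)]
for row in rows:
    print(row)
```

Because every record is projected onto the same column list, records from both topics end up with an identical shape, which is what makes the wide table possible.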
Without `target_field`:

    json {
        source_field = "message"
    }

Input:

    +----------------------------+
    |message                     |
    +----------------------------+
    |{"name": "ricky", "age": 23}|
    |{"name": "gary", "age": 28} |
    +----------------------------+

Output:

    +----------------------------+---+-----+
    |message                     |age|name |
    +----------------------------+---+-----+
    |{"name": "gary", "age": 28} |28 |gary |
    |{"name": "ricky", "age": 23}|23 |ricky|
    +----------------------------+---+-----+
With `target_field`:

    json {
        source_field = "message"
        target_field = "info"
    }

Input:

    +----------------------------+
    |message                     |
    +----------------------------+
    |{"name": "ricky", "age": 23}|
    |{"name": "gary", "age": 28} |
    +----------------------------+

Output:

    +----------------------------+----------+
    |message                     |info      |
    +----------------------------+----------+
    |{"name": "gary", "age": 28} |[28,gary] |
    |{"name": "ricky", "age": 23}|[23,ricky]|
    +----------------------------+----------+
The result of JSON processing can be queried with SQL statements such as `select * from <table> where info.age = 23`.
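The difference between the default `__root__` target and a named `target_field` can be sketched in plain Python. The function `json_transform` below is hypothetical and only mimics the shape of the output; the real plugin works on a Spark Dataframe.

```python
import json

ROOT = "__root__"  # default target_field named in the options above


def json_transform(row, source_field="raw_message", target_field=ROOT):
    """Return a copy of row with the JSON string in source_field parsed.

    With target_field == "__root__" the parsed keys become top-level
    columns; otherwise they are nested under target_field.
    """
    parsed = json.loads(row[source_field])
    out = dict(row)
    if target_field == ROOT:
        out.update(parsed)
    else:
        out[target_field] = parsed
    return out


rows = [
    {"message": '{"name": "ricky", "age": 23}'},
    {"message": '{"name": "gary", "age": 28}'},
]

flat = [json_transform(r, source_field="message") for r in rows]
nested = [json_transform(r, source_field="message", target_field="info") for r in rows]

# Rough equivalent of filtering on info.age = 23
matches = [r for r in nested if r["info"]["age"] == 23]
print(flat[0])
print(matches)
```

In both modes the original `message` column is kept alongside the parsed values, matching the output tables in the examples.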
With `schema_file`:

    json {
        source_field = "message"
        schema_file = "demo.json"
    }

Place the following content in ~/seatunnel/plugins/json/files/schemas/demo.json on the Driver Node:

{ "name": "demo", "age": 24, "city": "LA" }

Input:

    +----------------------------+
    |message                     |
    +----------------------------+
    |{"name": "ricky", "age": 23}|
    |{"name": "gary", "age": 28} |
    +----------------------------+

Output:

    +----------------------------+---+-----+-----+
    |message                     |age|name |city |
    +----------------------------+---+-----+-----+
    |{"name": "gary", "age": 28} |28 |gary |null |
    |{"name": "ricky", "age": 23}|23 |ricky|null |
    +----------------------------+---+-----+-----+
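As a rough illustration of why `city` comes out as null, the sketch below reads a sample document from a schema directory and projects each message onto its keys. The helpers `load_columns` and `parse_with_columns` are hypothetical, a temporary directory stands in for the real schema directory, and the sketch assumes only the top-level key names of the sample matter.

```python
import json
import os
import tempfile


def load_columns(schema_dir, schema_file):
    """Read a sample JSON document and return its top-level keys as columns."""
    path = os.path.join(schema_dir, schema_file)
    with open(path, encoding="utf-8") as fh:
        return list(json.load(fh))


def parse_with_columns(message, columns):
    """Parse a JSON string; columns missing from the message become None."""
    parsed = json.loads(message)
    return {col: parsed.get(col) for col in columns}


messages = [
    '{"name": "ricky", "age": 23}',
    '{"name": "gary", "age": 28}',
]

# A temporary directory stands in for the schema directory on the driver.
with tempfile.TemporaryDirectory() as schema_dir:
    with open(os.path.join(schema_dir, "demo.json"), "w", encoding="utf-8") as fh:
        fh.write('{ "name": "demo", "age": 24, "city": "LA" }')
    columns = load_columns(schema_dir, "demo.json")

parsed_rows = [parse_with_columns(m, columns) for m in messages]
for row in parsed_rows:
    print(row)
```

The values in the sample document (`demo`, `24`, `LA`) never appear in the result; only its structure is used.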
If you deploy in cluster mode, make sure that the json schemas directory is packaged in plugins.tar.gz.