blob: bf61a7116964cc2453ae4647e519cc32e350985a [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
:documentationPath: /plugins/transforms/
:language: en_US
:page-alternativeEditUrl: https://github.com/apache/incubator-hop/edit/master/plugins/transforms/jsoninput/src/main/doc/jsoninput.adoc
= JSON Input
== Description
This transform reads data from JSON structures, files, or incoming fields using a JSONPath expression to extract data and output rows. JSONPath expressions can use either dot notation or square bracket notation.
== Options
=== General Tab
[width="90%", options="header"]
|===
|Option|Description
|Source is from a previous transform a|Select to retrieve the source from a previously defined field. When selected, the following fields are available:
* Select field
* Use field as file names
* Read source as URL
* Do not pass field downstream
When this option is cleared, the following fields are available:
* File or directory
* Regular Expression
* Exclude Regular Expression
* Selected files
|Select field|Specify the field name to use as a source from a previous transform.
|Use field as file names|Select to indicate the source is a filename.
|Read source as URL|Select to indicate if the source should be accessed as a URL.
|Do not pass field downstream|Select to remove the source field from the output stream. This action improves performance and memory utilization with large JSON fields.
|File or directory|Specify the source location if the source is not defined in a field. |Click Browse to navigate to your source file or directory. Click Add to include the source in the Selected files table.
|Regular expression|Specify a regular expression to match filenames within a specified directory.
|Exclude regular expression|Specify a regular expression to exclude filenames within a specified directory.
|File/Directory|The source location indicated by clicking Add after specifying it in File or directory.
|Wildcard (RegExp)|Wildcards as specified in Regular expression.
|Exclude wildcard|Excluded wildcards as specified in Exclude regular expression.
|Required|Required source location for input.
|Include subfolders|Whether subfolders are included within the source location.
|Delete|Remove a source from the table
|Edit|Remove a source from the table and return it back to the File or directory option.
|Show filename(s)|Display the file names of the sources successfully connected to the JSON Input transform.
|===
=== Content Tab
The Content tab contains the following options for configuring which data to retrieve:
[width="90%", options="header"]
|===
|Option|Description
|Ignore empty file|Select to skip empty files. When cleared, empty files will cause the process to fail and stop.
|Do not raise an error if no files|Select to continue when no files are available to process.
|Ignore missing path|Select to continue processing files when an error occurs that (1) no fields match the JSON path or (2) that all the values are null. When cleared, no further rows are processed when an error occurs.
|Default path leaf to null|Select to return a null value for missing paths.
|Limit|Specify a limit on the number of records generated from the tra. Results are not limited when set to zero.
|Include filename in output|Select to add a string field with the filename in the result.
|Rownum in output|Select to add an integer field with the row number in the result.
|Add filenames to result|Select to add processed files to the result file list.
|===
=== Fields Tab
The Fields tab displays field definitions to extract values from the JSON structure. The table in this tab contain the following columns:
[width="90%", options="header"]
|===
|Option|Description
|Name|Name of field that maps to the corresponding field in the JSON input stream.
|Path|Complete path of the field name in the JSON input stream. All records can be retrieved by adding the asterisk * in the path. For example, $.mydata.*
|Type|Data type of the input field.
|Format|An optional mask for converting the format of the original field. See Common Formats for information on common valid date and numeric formats you can use in this transform.
|Length|Length of the field.
|Precision|Number of floating point digits for number-type fields.
|Currency|Currency symbol ($ or €, for example).
|Decimal|A decimal point can be a . (5,000.00 for example) or , (5.000,00 for example).
|Group|A grouping can be a , (10,000.00 for example) or . (5.000,00 for example).
|Trim type|The trim method to apply to a string.
|Repeat|The corresponding value from the last row repeated if a row is empty.
|Get fields|Populate the table with fields derived from the source file.
|===
=== Select fields
Click Get Fields in the Fields tab to open the Select Fields window. Select the checkbox next to each field in your source file that you want include in your output. All the fields selected in this transform are added to the table. You can search for a field name by entering the field name in the Search box.
=== Additional output fields tab
The Additional output fields tab contains the following options to specify additional information about the file to process:
[width="90%", options="header"]
|===
|Option|Description
|Short filename field|Specify the field that contains the filename without path information but with an extension.
|Extension field|Specify the field that contains the extension of the filename.
|Path field|Specify the field that contains the path in operating system format.
|Size field|Specify the field that contains the size of the data.
|Is hidden field|Specify the field indicating if the file is hidden or not (Boolean).
|Last modification field|Specify the field indicating the date of the last time the file was modified.
|Uri field|Specify the field that contains the URI.
|Root uri field|Specify the field that contains only the root part of the URI.
|===
== Metadata Injection Support
All fields of this transform support metadata injection. You can use this transform with ETL Metadata Injection to pass metadata to your pipeline at runtime.