| //// |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| http://www.apache.org/licenses/LICENSE-2.0 |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| //// |
| [[MetadataInjection]] |
| :imagesdir: ../assets/images |
| :description: Metadata injection inserts data from various sources into a template pipeline at runtime to reduce repetitive tasks. |
| |
| = Metadata Injection |
| |
| Metadata injection inserts data from various sources into a template pipeline at runtime to reduce repetitive tasks. |
| |
| For example, you might have a simple pipeline to load transaction data values from a supplier, filter specific values, and output them to a file. |
| If you have more than one supplier, you would need to run this simple pipeline for each supplier. |
With metadata injection, however, you can reuse this simple repetitive pipeline by injecting metadata from another pipeline that contains the ETL Metadata Injection transform.
This transform coordinates the data values from the various inputs through the metadata you define.
This process removes the need to adjust and run the repetitive pipeline for each specific input.
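
The pattern is easier to see in code.
The following is a minimal conceptual sketch in Python (not Hop's actual API; the supplier names, file names, and threshold values are invented for illustration): a single template routine replaces one near-identical pipeline per supplier, because the per-supplier details arrive as metadata at runtime.

[source,python]
----
import csv

# Template "pipeline": load -> filter -> output.
# The supplier-specific details are not hard-coded; they are
# injected as metadata when the template runs.
def run_template(metadata):
    with open(metadata["input_file"], newline="") as src, \
         open(metadata["output_file"], "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Keep only the rows this supplier's metadata asks for.
            if float(row[metadata["filter_field"]]) >= metadata["min_amount"]:
                writer.writerow(row)

# Metadata for each supplier (hypothetical values): one record per run.
suppliers = [
    {"input_file": "supplier_a.csv", "output_file": "out_a.csv",
     "filter_field": "amount", "min_amount": 100.0},
    {"input_file": "supplier_b.csv", "output_file": "out_b.csv",
     "filter_field": "amount", "min_amount": 250.0},
]

# One template, many runs: no copy of the pipeline per supplier.
for metadata in suppliers:
    run_template(metadata)
----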
| |
| The repetitive pipeline is known as the template pipeline. |
| The template pipeline is called by the ETL Metadata Injection transform. |
You create a separate pipeline that prepares the common values you want to use as metadata and injects them into the template pipeline through the ETL Metadata Injection transform.
| |
| We recommend the following basic procedure for using this transform to inject metadata: |
| |
| 1. Optimize your data for injection, such as preparing folder structures and inputs. |
| |
| 2. Develop pipelines for the repetitive process (the template pipeline), for metadata injection through the ETL Metadata Injection transform, and for handling multiple inputs. |
| |
| The metadata is injected into the template pipeline through any transform that supports metadata injection. |
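
As a rough analogy for what happens during injection (again a hedged Python sketch, not Hop's implementation: the template description, transform names, and property names below are invented for illustration), each metadata field is mapped onto a configurable property of a transform in the template before the template runs:

[source,python]
----
# A toy "template pipeline" description: transforms with properties
# left unset, to be filled in by injection. Names are illustrative only.
template = {
    "Load transactions": {"type": "csv-input", "filename": None},
    "Filter values":     {"type": "filter", "field": None, "min": None},
    "Write result":      {"type": "text-output", "filename": None},
}

# Injection mapping: metadata field -> (transform, property).
mapping = {
    "input_file":   ("Load transactions", "filename"),
    "filter_field": ("Filter values", "field"),
    "min_amount":   ("Filter values", "min"),
    "output_file":  ("Write result", "filename"),
}

def inject(template, mapping, metadata):
    """Return a resolved copy of the template with metadata injected."""
    resolved = {name: dict(props) for name, props in template.items()}
    for field, (transform, prop) in mapping.items():
        resolved[transform][prop] = metadata[field]
    return resolved

# One metadata record produces one fully configured pipeline.
resolved = inject(template, mapping,
                  {"input_file": "supplier_a.csv", "filter_field": "amount",
                   "min_amount": 100.0, "output_file": "out_a.csv"})
print(resolved["Filter values"])  # {'type': 'filter', 'field': 'amount', 'min': 100.0}
----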
| |
| == Supported Transforms |
| |
The goal is to add metadata injection (MDI) support to all transforms. The current status (as of 29 July 2022) is:
| |
| |=== |
| |Transform|Supports MDI |
| |Abort|Y |
| |Add a checksum|Y |
| |Add constants|Y |
| |Add sequence|Y |
| |Add value fields changing sequence|Y |
| |Add XML|Y |
| |Analytic query|Y |
| |Apache Tika|Y |
| |Append streams|Y |
| |Avro Decode|Y |
| |Avro Encode|Y |
| |Avro File Input|Y |
| |Avro File Output|Y |
| |Azure Event Hubs Listener|Y |
| |Azure Event Hubs Writer|Y |
| |Beam BigQuery Input|Y |
| |Beam BigQuery Output|Y |
| |Beam Bigtable Input|Y |
| |Beam Bigtable Output|Y |
| |Beam GCP Pub/Sub : Publish|Y |
| |Beam GCP Pub/Sub : Subscribe|Y |
| |Beam Input|Y |
| |Beam Kafka Consume|Y |
| |Beam Kafka Produce|Y |
| |Beam Kinesis Consume|Y |
| |Beam Kinesis Produce|Y |
| |Beam Output|Y |
| |Beam Timestamp|Y |
| |Beam Window|Y |
| |Block until transforms finish|Y |
| |Blocking transform|Y |
| |Calculator|Y |
| |Call DB procedure|Y |
| |Cassandra input|Y |
| |Cassandra output|Y |
| |Change file encoding|Y |
| |Check if file is locked|Y |
| |Check if webservice is available|Y |
| |Clone row|Y |
| |Closure generator|Y |
| |Coalesce Fields|Y |
| |Column exists|Y |
| |Combination lookup/update|Y |
| |Concat Fields|Y |
| |Copy rows to result|Y |
| |Credit card validator|Y |
| |CSV file input|Y |
| |Data grid|Y |
| |Database join|Y |
| |Database lookup|Y |
| |De-serialize from file|Y |
| |Delay row|Y |
| |Delete|Y |
| |Detect empty stream|Y |
| |Dimension lookup/update|Y |
| |Doris bulk loader|Y |
| |Dummy (do nothing)|Y |
| |Dynamic SQL row|Y |
| |EDI to XML|Y |
| |Email messages input|Y |
| |Enhanced JSON Output|Y |
| |ETL metadata injection|Y |
| |Execute a process|Y |
| |Execute row SQL script|Y |
| |Execute SQL script|Y |
| |Execute Unit Tests|Y |
| |Fake data|Y |
| |File exists|Y |
| |File Metadata|Y |
| |Filter rows|Y |
| |Formula|Y |
| |Fuzzy match|Y |
| |Generate random value|Y |
| |Generate rows|Y |
| |Get data from XML|Y |
| |Get file names|Y |
| |Get files from result|Y |
| |Get files rows count|Y |
| |Get ID from hop server|Y |
| |Get Neo4j Logging Info|Y |
| |Get records from stream|Y |
| |Get rows from result|Y |
| |Get Server Status|Y |
| |Get subfolder names|Y |
| |Get system info|Y |
| |Get table names|Y |
| |Get variables|Y |
| |Group by|Y |
| |HTTP client|Y |
| |HTTP post|Y |
| |Identify last row in a stream|Y |
| |If Null|Y |
| |Injector|Y |
| |Insert / update|Y |
| |Java filter|Y |
| |JavaScript|Y |
| |Join rows (cartesian product)|Y |
| |JSON input|Y |
| |JSON output|Y |
| |Kafka Consumer|Y |
| |Kafka Producer|Y |
| |LDAP input|Y |
| |LDAP output|Y |
| |Load file content in memory|Y |
| |Mail|Y |
| |Mapping Input|Y |
| |Mapping Output|Y |
| |Memory group by|Y |
| |Merge join|Y |
| |Merge rows (diff)|Y |
| |Metadata Input|Y |
| |Metadata structure of stream|Y |
| |Microsoft Excel input|Y |
| |Microsoft Excel writer|Y |
| |MonetDB bulk loader|Y |
| |MongoDB Delete|Y |
| |MongoDB input|Y |
| |MongoDB output|Y |
| |Multiway merge join|Y |
| |Neo4j Cypher|Y |
| |Neo4j Generate CSVs|Y |
| |Neo4j Graph Output|Y |
| |Neo4j Import|Y |
| |Neo4J Output|Y |
| |Neo4j Split Graph|Y |
| |Null if|Y |
| |Number range|Y |
| |Parquet File Input|Y |
|Parquet File Output|Y
| |PGP decrypt stream|Y |
| |PGP encrypt stream|Y |
| |Pipeline executor|Y |
| |Pipeline Logging|Y |
| |Pipeline Probe|Y |
| |PostgreSQL Bulk Loader|Y |
| |Process files|Y |
| |Properties input|Y |
| |Properties output|Y |
| |Regex evaluation|Y |
| |Replace in string|Y |
| |Reservoir sampling|Y |
| |REST client|Y |
| |Row denormaliser|Y |
| |Row flattener|Y |
| |Row normaliser|Y |
| |Rules accumulator|Y |
| |Rules executor|Y |
| |Run SSH commands|Y |
| |Salesforce delete|Y |
| |Salesforce input|Y |
| |Salesforce insert|Y |
| |Salesforce update|Y |
| |Salesforce upsert|Y |
| |Sample rows|Y |
| |SAS Input|Y |
| |Select values|Y |
| |Serialize to file|Y |
| |Set field value|Y |
| |Set field value to a constant|Y |
| |Set files in result|Y |
| |Set variables|Y |
| |Simple Mapping|Y |
| |Snowflake Bulk Loader|Y |
| |Sort rows|Y |
| |Sorted merge|Y |
| |Split field to rows|Y |
| |Split fields|Y |
| |Splunk Input|Y |
| |SQL file output|Y |
| |SSTable output|Y |
| |Standardize phone number|Y |
| |Stream lookup|Y |
| |Stream Schema Merge|Y |
| |String operations|Y |
| |Strings cut|Y |
| |Switch / case|Y |
| |Synchronize after merge|Y |
| |Table compare|Y |
| |Table exists|Y |
| |Table input|Y |
| |Table output|Y |
| |Teradata Fastload bulk loader|Y |
| |Text file input|Y |
| |Text file input (deprecated)|Y |
| |Text file output|Y |
| |Token Replacement|Y |
| |Unique rows|Y |
| |Unique rows (HashSet)|Y |
| |Update|Y |
| |User defined Java class|Y |
| |User defined Java expression|Y |
| |Value mapper|Y |
| |Web services lookup|Y |
| |Workflow executor|Y |
| |Workflow Logging|Y |
| |Write to log|Y |
| |XML input stream (StAX)|Y |
| |XML join|Y |
| |XML output|Y |
| |XSD validator|Y |
| |XSL Transformation|Y |
|YAML input|Y
| |Zip file|Y |
| |=== |