////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
[[Pipelines]]
:imagesdir: ../assets/images
:description: Pipelines, together with workflows, are the main building blocks in Hop. Pipelines perform the heavy data lifting: in a pipeline, you read data from one or more sources, perform a number of operations (joins, lookups, filters and lots more) and finally write the processed data to one or more target platforms.
= Pipelines
== Pipelines overview
Pipelines, together with workflows, are the main building blocks in Hop. Pipelines perform the heavy data lifting: in a pipeline, you read data from one or more sources, perform a number of operations (joins, lookups, filters and lots more) and finally write the processed data to one or more target platforms.
Pipelines are a network of xref:pipeline/transforms.adoc[transforms], connected by hops. Just like the xref:workflow/actions.adoc[actions] in a workflow, each transform is a small piece of functionality. Combining a number of transforms allows Hop developers to build powerful data processing and, together with workflows, orchestration solutions.
Even though there is some visual resemblance, workflows and pipelines operate very differently.
The core principles of pipelines are:
* pipelines are networks. Each transform in a pipeline is part of the network (the abridged pipeline file after this list shows how such a network is stored).
* a pipeline runs all of its transforms in parallel. All transforms are started and process data simultaneously. In a simple pipeline where you read data from a large file, do some processing and finally write to a database table, you're typically still reading from the file while you're already loading data into the database.
* data flows through the various transforms in a pipeline over hops. In contrast to workflow hops, pipeline hops typically don't have an exit status. Pipelines do have some routing capabilities, e.g. through the xref:pipeline/transforms/filterrows.adoc[Filter Rows] transform and xref:pipeline/errorhandling.adoc[error handling], but the core pipeline principle still applies: the pipeline is a network, and data flows through the network in parallel.
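To make this concrete, the snippet below is a heavily abridged sketch of how a pipeline is stored on disk as a `.hpl` XML file: a list of transforms plus an `<order>` block that defines the hops between them. The transform names, types and layout are illustrative only and many elements are omitted; in practice you build pipelines in the Hop GUI rather than by hand.

[source,xml]
----
<!-- Abridged, illustrative sketch of a pipeline (.hpl) file:
     two transforms connected by one hop. Many required elements are omitted. -->
<pipeline>
  <info>
    <name>minimal-example</name>
  </info>
  <transform>
    <name>read-input</name>
    <type>CsvInput</type>
    <!-- transform-specific settings go here -->
  </transform>
  <transform>
    <name>write-output</name>
    <type>TableOutput</type>
  </transform>
  <order>
    <!-- the hop: rows flow from read-input to write-output -->
    <hop>
      <from>read-input</from>
      <to>write-output</to>
      <enabled>Y</enabled>
    </hop>
  </order>
</pipeline>
----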
== Example pipeline walk-through
The example below shows a very basic pipeline. This is what happens when we run this pipeline:
* the pipeline has 7 transforms. All 7 of these transforms become active when we start the pipeline.
* the "read-25M-records" transform starts reading data from a file, and pushes that data down the stream to "perform-calculations" and the following transforms. Since reading 25 million records takes a while, some data may already have finished processing while we're still reading records from the file.
* the "lookup-sql-data" matches data we read from the file with data we retrieved from the "read-sql-data" transform. The xref:pipeline/transforms/streamlookup.adoc[Stream Lookup] accepts input from the "read-sql-data", which is shown with the information icon image:icons/info.svg[] on the hop.
* once the data from the file and sql query are matched, we check a condition with the xref:pipeline/transforms/filterrows.adoc[Filter Rows] transform in "condition?". The output of this data is passed to "write-to-table" or "write-to-file", depending on whether the condition outcome was true or false.
image:hop-gui/pipeline/basic-pipeline.png[Pipelines - basic pipeline, width="65%"]
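For reference, the hop network of this example pipeline roughly corresponds to an `<order>` block like the one below in the saved `.hpl` file. This is a hand-written, abridged approximation based on the walk-through above, not the actual file behind the screenshot.

[source,xml]
----
<!-- Approximate hop network of the example pipeline (abridged, illustrative) -->
<order>
  <hop><from>read-25M-records</from><to>perform-calculations</to><enabled>Y</enabled></hop>
  <hop><from>perform-calculations</from><to>lookup-sql-data</to><enabled>Y</enabled></hop>
  <!-- info stream: the SQL data feeds the Stream Lookup in "lookup-sql-data" -->
  <hop><from>read-sql-data</from><to>lookup-sql-data</to><enabled>Y</enabled></hop>
  <hop><from>lookup-sql-data</from><to>condition?</to><enabled>Y</enabled></hop>
  <!-- the Filter Rows transform routes each row to one of two targets -->
  <hop><from>condition?</from><to>write-to-table</to><enabled>Y</enabled></hop>
  <hop><from>condition?</from><to>write-to-file</to><enabled>Y</enabled></hop>
</order>
----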
== Next steps
Pipelines are an extensive topic. Check the pages below to learn more about working with pipelines:
* xref:pipeline/hop-pipeline-editor.adoc[Pipeline Editor]
* xref:pipeline/create-pipeline.adoc[Create a Pipeline]
* xref:pipeline/run-preview-debug-pipeline.adoc[Run, Preview and Debug a Pipeline]
* xref:pipeline/pipeline-run-configurations/pipeline-run-configurations.adoc[Pipeline Run Configurations]
* xref:pipeline/metadata-injection.adoc[Metadata Injection]
* xref:pipeline/partitioning.adoc[Partitioning]
* xref:pipeline/beam/getting-started-with-beam.adoc[Getting started with Apache Beam]
* xref:pipeline/transforms.adoc[Transforms]