learning/prompts/documentation-lookup/02_basic_pipelines.md - beam - Git at Google

 Prompt:
 What is a Pipeline in Apache Beam?
 Response:
 In Apache Beam, a [pipeline](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) encapsulates the entirety of a data processing task, including all the data and each step of the process. Essentially, it's a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of transformations (known as `PTransform`) applied to data collections (`PCollection`).

 The simplest pipelines in Apache Beam follow a linear flow of operations, typically adhering to a read-process-write pattern. However, pipelines can also be significantly more complex, featuring multiple input sources, output sinks, and operations (known as a `PTransform`) that can both read from and output to multiple PCollections.
 For more information about pipeline design and best practices, see [Common Pipeline Patterns](https://beam.apache.org/documentation/patterns/overview/).

 To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline` (typically in the `main()` function).

 ```python
 import apache_beam as beam

 with beam.Pipeline() as pipeline:
   pass  # Build your pipeline here.
 ```

 When you create your pipeline, you need to set [configuration options](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options). You can set your pipeline’s configuration options programmatically. You can also set the options ahead of time, or read them from the command line, and then pass them to the `Pipeline` object when you create the object. This option is often easier than setting the pipeline configuration options programmatically.
	Prompt:
	What is a Pipeline in Apache Beam?
	Response:
	In Apache Beam, a [pipeline](https://beam.apache.org/documentation/pipelines/design-your-pipeline/) encapsulates the entirety of a data processing task, including all the data and each step of the process. Essentially, it's a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of transformations (known as `PTransform`) applied to data collections (`PCollection`).

	The simplest pipelines in Apache Beam follow a linear flow of operations, typically adhering to a read-process-write pattern. However, pipelines can also be significantly more complex, featuring multiple input sources, output sinks, and operations (known as a `PTransform`) that can both read from and output to multiple PCollections.
	For more information about pipeline design and best practices, see [Common Pipeline Patterns](https://beam.apache.org/documentation/patterns/overview/).

	To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline` (typically in the `main()` function).

	```python
	import apache_beam as beam

	with beam.Pipeline() as pipeline:
	pass # Build your pipeline here.
	```

	When you create your pipeline, you need to set [configuration options](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options). You can set your pipeline’s configuration options programmatically. You can also set the options ahead of time, or read them from the command line, and then pass them to the `Pipeline` object when you create the object. This option is often easier than setting the pipeline configuration options programmatically.