Here you can find a collection of the interactive notebooks available for Apache Beam, which are hosted in Colab. The notebooks allow you to interactively play with the code and see how your changes affect the pipeline. You don't need to install anything or modify your computer in any way to use these notebooks.
You can also try an Apache Beam pipeline using the Java, Python, and Go SDKs.
In this notebook we go through the basics of what Apache Beam is and how to get started. We learn what a data pipeline, a PCollection, and a PTransform are, as well as some basic transforms like `Map`, `FlatMap`, `Filter`, `Combine`, and `GroupByKey`.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/getting-started.ipynb" >}}
In this notebook we go through some examples of how to read and write data in different formats. We introduce the built-in `ReadFromText` and `WriteToText` transforms. We also see how to read from CSV files, read from a SQLite database, write fixed-size batches of elements, and write windows of elements.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb" >}}
In this notebook we go through how to aggregate data over time intervals, including in streaming pipelines. We introduce the `GlobalWindow`, `FixedWindows`, `SlidingWindows`, and `Sessions` window types.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb" >}}
Beam DataFrames provide a pandas-like DataFrame API to declare Beam pipelines. To learn more about Beam DataFrames, take a look at the Beam DataFrames overview page.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/dataframes.ipynb" >}}
Check the Python transform catalog for a complete list of the available transforms.
Applies a simple one-to-one mapping function over each element in the collection.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/map-py.ipynb" >}}
Applies a simple one-to-many mapping function over each element in the collection. The many elements are flattened into the resulting collection.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/flatmap-py.ipynb" >}}
Given a predicate, keeps only the elements that satisfy it and drops the rest.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/filter-py.ipynb" >}}
Separates elements in a collection into multiple output collections.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/partition-py.ipynb" >}}
A transform for generic parallel processing. It's recommended to use `Map`, `FlatMap`, `Filter`, or other more specific transforms when possible.
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/pardo-py.ipynb" >}}