title: “Tour of Beam”

Tour of Beam

Here you can find a collection of the interactive notebooks available for Apache Beam, which are hosted in Colab. The notebooks allow you to interactively play with the code and see how your changes affect the pipeline. You don't need to install anything or modify your computer in any way to use these notebooks.

You can also try an Apache Beam pipeline using the Java, Python, and Go SDKs.

Get started

Learn the basics

In this notebook we go through the basics of what is Apache Beam and how to get started. We learn what is a data pipeline, a PCollection, a PTransform, as well as some basic transforms like Map, FlatMap, Filter, Combine, and GroupByKey.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/getting-started.ipynb” >}}

Reading and writing data

In this notebook we go through some examples on how to read and write data to and from different data formats. We introduce the built-in ReadFromText and WriteToText transforms. We also see how we can read from CSV files, read from a SQLite database, write fixed-sized batches of elements, and write windows of elements.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/reading-and-writing-data.ipynb” >}}

Windowing

In this notebook we go through how to aggregate data based on time intervals, or in streaming pipelines. We introduce the GlobalWindow, FixedWindows, SlidingWindows, and Sessions.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb” >}}

DataFrames

Beam DataFrames provide a pandas-like DataFrame API to declare Beam pipelines. To learn more about Beam DataFrames, take a look at the Beam DataFrames overview page.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/dataframes.ipynb” >}}

Transforms

Check the Python transform catalog for a complete list of the available transforms.

Element-wise transforms

Map

Applies a simple one-to-one mapping function over each element in the collection.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/map-py.ipynb” >}}

FlatMap

Applies a simple one-to-many mapping function over each element in the collection. The many elements are flattened into the resulting collection.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/flatmap-py.ipynb” >}}

Filter

Given a predicate, filter out all elements that don’t satisfy that predicate.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/filter-py.ipynb” >}}

Partition

Separates elements in a collection into multiple output collections.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/partition-py.ipynb” >}}

ParDo

A transform for generic parallel processing. It's recommended to use Map, FlatMap, Filter or other more specific transforms when possible.

{{< button-colab url=“https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/pardo-py.ipynb” >}}