{% include button-pydoc.md path="apache_beam.transforms.core" class="FlatMap" %}
Applies a simple 1-to-many mapping function over each element in the collection. The many elements are flattened into the resulting collection.
In the following examples, we create a pipeline with a `PCollection` of produce with their icon, name, and duration. Then, we apply `FlatMap` in multiple ways to yield zero or more elements per input element into the resulting `PCollection`.
`FlatMap` accepts a function that returns an `iterable`, where each of the output `iterable`'s elements is an element of the resulting `PCollection`.
We use the function `str.split`, which takes a single `str` element and outputs a `list` of `str`s. This pipeline splits each input element on whitespace, creating a list of zero or more elements.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_simple %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
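As a rough illustration of the `str.split` example above, the following sketch builds a small pipeline and applies `FlatMap(str.split)`. The sample data, the step labels, and the `run` wrapper are assumptions made for this sketch and are not taken from the official snippet.

```python
# Hypothetical sample data: each line holds whitespace-separated produce names.
PRODUCE_LINES = [
    'Strawberry Carrot Eggplant',
    'Tomato Potato',
]


def run():
    # Imported here so the sample data above can be inspected without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | 'Gardening plants' >> beam.Create(PRODUCE_LINES)
            # str.split returns a list; FlatMap emits each item as its own element.
            | 'Split words' >> beam.FlatMap(str.split)
            | 'Print' >> beam.Map(print)
        )
```

Calling `run()` should print each produce name as a separate element, in no guaranteed order.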
We define a function `split_words` which splits an input `str` element using the delimiter `','` and outputs a `list` of `str`s.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_function %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
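A minimal sketch of the `split_words` approach described above might look like the following. The function name comes from the text; the sample data, step labels, and `run` wrapper are illustrative assumptions.

```python
def split_words(text):
    # Returns a list; FlatMap emits each list item as a separate element.
    return text.split(',')


def run():
    # Imported here so split_words can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Hypothetical comma-separated sample data.
            | 'Gardening plants' >> beam.Create([
                'Strawberry,Carrot,Eggplant',
                'Tomato,Potato',
            ])
            | 'Split words' >> beam.FlatMap(split_words)
            | 'Print' >> beam.Map(print)
        )
```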
For this example, we want to flatten a `PCollection` of lists of `str`s into a `PCollection` of `str`s. Each input element is already an `iterable`, where each element is what we want in the resulting `PCollection`. We use a lambda function that returns the same input element it received.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_lambda %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
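The lambda variant above can be sketched as follows, assuming a small hypothetical input where every element is already a list of names. The data, step labels, and `run` wrapper are assumptions for illustration.

```python
# Hypothetical sample data: each element is already a list of produce names.
PRODUCE_LISTS = [
    ['Strawberry', 'Carrot', 'Eggplant'],
    ['Tomato', 'Potato'],
]


def run():
    # Imported here so the sample data above can be inspected without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | 'Gardening plants' >> beam.Create(PRODUCE_LISTS)
            # The lambda returns its input list unchanged; FlatMap does the flattening.
            | 'Flatten lists' >> beam.FlatMap(lambda elements: elements)
            | 'Print' >> beam.Map(print)
        )
```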
For this example, we want to flatten a `PCollection` of lists of `str`s into a `PCollection` of `str`s. We use a generator to iterate over the input list and yield each of the elements. Each yielded result in the generator is an element in the resulting `PCollection`.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_generator %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
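The generator approach above could be sketched like this. The name `generate_elements`, the sample data, and the `run` wrapper are hypothetical choices for this illustration.

```python
def generate_elements(elements):
    # Each yielded value becomes one element of the output PCollection.
    for element in elements:
        yield element


def run():
    # Imported here so generate_elements can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Hypothetical sample data: each element is a list of produce names.
            | 'Gardening plants' >> beam.Create([
                ['Strawberry', 'Carrot', 'Eggplant'],
                ['Tomato', 'Potato'],
            ])
            | 'Flatten lists' >> beam.FlatMap(generate_elements)
            | 'Print' >> beam.Map(print)
        )
```

A generator is convenient when the function needs to skip some inputs: yielding nothing produces zero output elements for that input.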
If your `PCollection` consists of `(key, value)` pairs, you can use `FlatMapTuple` to unpack them into different function arguments.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_tuple %}```

{:.notebook-skip}
Output `PCollection` after `FlatMapTuple`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
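To illustrate `FlatMapTuple`, the sketch below unpacks each `(icon, plant)` pair into two function arguments and drops pairs without an icon. The function name `format_plant`, the ASCII stand-in icons, and the `run` wrapper are assumptions made for this example.

```python
def format_plant(icon, plant):
    # Yield a formatted string only when an icon is present;
    # pairs without one produce no output elements.
    if icon:
        yield f'{icon}{plant}'


def run():
    # Imported here so format_plant can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Hypothetical (icon, name) pairs, using ASCII tags as stand-in icons.
            | 'Gardening plants' >> beam.Create([
                ('[S]', 'Strawberry'),
                ('[C]', 'Carrot'),
                (None, 'Invalid'),
            ])
            # FlatMapTuple unpacks each tuple into positional arguments.
            | 'Format' >> beam.FlatMapTuple(format_plant)
            | 'Print' >> beam.Map(print)
        )
```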
You can pass functions with multiple arguments to `FlatMap`. They are passed as additional positional arguments or keyword arguments to the function.
In this example, `split_words` takes `text` and `delimiter` as arguments.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_multiple_arguments %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
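Passing an extra argument through `FlatMap` might look like the following sketch. The names `split_words`, `text`, and `delimiter` come from the text; the default value, sample data, and `run` wrapper are assumptions.

```python
def split_words(text, delimiter=None):
    # With delimiter=None, str.split falls back to splitting on whitespace.
    return text.split(delimiter)


def run():
    # Imported here so split_words can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Hypothetical comma-separated sample data.
            | 'Gardening plants' >> beam.Create([
                'Strawberry,Carrot,Eggplant',
                'Tomato,Potato',
            ])
            # Extra keyword arguments to FlatMap are forwarded to split_words.
            | 'Split words' >> beam.FlatMap(split_words, delimiter=',')
            | 'Print' >> beam.Map(print)
        )
```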
If the `PCollection` has a single value, such as the average from another computation, passing the `PCollection` as a singleton accesses that value.
In this example, we pass a `PCollection` containing the value `','` as a singleton. We then use that value as the delimiter for the `str.split` method.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_side_inputs_singleton %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
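A singleton side input could be wired up roughly as below, using `beam.pvalue.AsSingleton` to hand the single delimiter value to the function. The helper name `split_with`, the sample data, and the `run` wrapper are hypothetical.

```python
def split_with(text, delimiter):
    # At run time, delimiter is the single value held by the side-input PCollection.
    return text.split(delimiter)


def run():
    # Imported here so split_with can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        # A PCollection that contains exactly one element.
        delimiter = pipeline | 'Create delimiter' >> beam.Create([','])

        (
            pipeline
            # Hypothetical comma-separated sample data.
            | 'Gardening plants' >> beam.Create([
                'Strawberry,Carrot,Eggplant',
                'Tomato,Potato',
            ])
            | 'Split words' >> beam.FlatMap(
                split_with, delimiter=beam.pvalue.AsSingleton(delimiter))
            | 'Print' >> beam.Map(print)
        )
```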
If the `PCollection` has multiple values, pass the `PCollection` as an iterator. This accesses elements lazily as they are needed, so it is possible to iterate over large `PCollection`s that won't fit into memory.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_side_inputs_iter %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:valid_plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
Note: You can pass the `PCollection` as a list with `beam.pvalue.AsList(pcollection)`, but this requires that all the elements fit into memory.
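The iterator side input described above can be sketched as follows. The function name, the record fields, and the sample values are assumptions for illustration; the function copies each record rather than modifying its input in place.

```python
def normalize_and_validate_durations(plant, valid_durations):
    # Work on a copy so the input element is left unmodified.
    normalized = dict(plant, duration=plant['duration'].lower())
    # Membership is checked by iterating over the side input.
    if normalized['duration'] in valid_durations:
        yield normalized


def run():
    # Imported here so the function above can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        valid_durations = pipeline | 'Valid durations' >> beam.Create([
            'annual', 'biennial', 'perennial',
        ])

        (
            pipeline
            # Hypothetical produce records.
            | 'Gardening plants' >> beam.Create([
                {'name': 'Strawberry', 'duration': 'Perennial'},
                {'name': 'Carrot', 'duration': 'BIENNIAL'},
                {'name': 'Potato', 'duration': 'unknown'},
            ])
            | 'Validate' >> beam.FlatMap(
                normalize_and_validate_durations,
                valid_durations=beam.pvalue.AsIter(valid_durations))
            | 'Print' >> beam.Map(print)
        )
```

Swapping `beam.pvalue.AsIter` for `beam.pvalue.AsList` would hand the function a list instead, under the memory constraint mentioned in the note.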
If a `PCollection` is small enough to fit into memory, then that `PCollection` can be passed as a dictionary. Each element must be a `(key, value)` pair. Note that all the elements of the `PCollection` must fit into memory for this. If the `PCollection` won't fit into memory, use `beam.pvalue.AsIter(pcollection)` instead.
```py
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py tag:flatmap_side_inputs_dict %}```

{:.notebook-skip}
Output `PCollection` after `FlatMap`:

{:.notebook-skip}
```
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap_test.py tag:valid_plants %}```

{% include buttons-code-snippet.md py="sdks/python/apache_beam/examples/snippets/transforms/elementwise/flatmap.py" notebook="examples/notebooks/documentation/transforms/python/elementwise/flatmap" %}
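A dictionary side input might be used as in the sketch below, where `beam.pvalue.AsDict` turns a `PCollection` of `(key, value)` pairs into a lookup table. The function name, the integer duration codes, and the sample records are hypothetical.

```python
def replace_duration_if_valid(plant, durations):
    # durations behaves like a dict built from the (key, value) side input.
    if plant['duration'] in durations:
        # Emit a copy with the code replaced by its label; unknown codes are dropped.
        yield dict(plant, duration=durations[plant['duration']])


def run():
    # Imported here so the function above can be used and tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        durations = pipeline | 'Durations' >> beam.Create([
            (0, 'annual'),
            (1, 'biennial'),
            (2, 'perennial'),
        ])

        (
            pipeline
            # Hypothetical produce records with numeric duration codes.
            | 'Gardening plants' >> beam.Create([
                {'name': 'Strawberry', 'duration': 2},
                {'name': 'Carrot', 'duration': 1},
                {'name': 'Potato', 'duration': -1},
            ])
            | 'Replace duration' >> beam.FlatMap(
                replace_duration_if_valid,
                durations=beam.pvalue.AsDict(durations))
            | 'Print' >> beam.Map(print)
        )
```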
{% include button-pydoc.md path="apache_beam.transforms.core" class="FlatMap" %}