{{< localstorage language language-py >}}
{{< button-pydoc path="apache_beam.transforms.combiners" class="Sample" >}}
Transforms for taking samples of the elements in a collection, or samples of the values associated with each key in a collection of key-value pairs.

In the following example, we create a pipeline with a `PCollection`. Then, we get a random sample of elements in different ways.

We use `Sample.FixedSizeGlobally()` to get a fixed-size random sample of elements from the entire `PCollection`.
{{< highlight py >}} {{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py" sample_fixed_size_globally >}} {{< /highlight >}}
{{< paragraph class="notebook-skip" >}} Output: {{< /paragraph >}}
{{< highlight class="notebook-skip" >}} {{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample_test.py" sample >}} {{< /highlight >}}
{{< buttons-code-snippet py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py" >}}
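
To make the shape of the result concrete, the following plain-Python sketch models what a fixed-size global sample amounts to: a single list of at most `n` elements, drawn without replacement. It only illustrates the result and is not how Beam computes the sample in a distributed pipeline. The helper name `fixed_size_sample`, the `seed` parameter, and the produce names are assumptions made up for this illustration.

```python
import random


def fixed_size_sample(elements, n, seed=None):
    """Hypothetical helper: model the result of a fixed-size global sample.

    Returns one list holding at most n elements chosen without replacement.
    """
    rng = random.Random(seed)
    items = list(elements)
    # When the collection holds fewer than n elements, every element is kept.
    return rng.sample(items, min(n, len(items)))


# Illustrative data, not taken from the Beam snippet.
produce = ['Strawberry', 'Carrot', 'Eggplant', 'Tomato', 'Potato']

sample = fixed_size_sample(produce, 3)
print(sample)  # three distinct items; which ones varies from run to run
```

Because the selection is random, repeated runs generally return different elements in a different order.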

We use `Sample.FixedSizePerKey()` to get fixed-size random samples for each unique key in a `PCollection` of key-values.
{{< highlight py >}} {{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py" sample_fixed_size_per_key >}} {{< /highlight >}}
{{< paragraph class="notebook-skip" >}} Output: {{< /paragraph >}}
{{< highlight class="notebook-skip" >}} {{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample_test.py" samples_per_key >}} {{< /highlight >}}
{{< buttons-code-snippet py="sdks/python/apache_beam/examples/snippets/transforms/aggregation/sample.py" >}}
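
The per-key variant can be pictured the same way: group the values by key, then take a fixed-size sample within each group. The sketch below is a plain-Python model of that result under those assumptions, not Beam's implementation. The helper name `fixed_size_sample_per_key`, the `seed` parameter, and the sample data are hypothetical.

```python
import random
from collections import defaultdict


def fixed_size_sample_per_key(pairs, n, seed=None):
    """Hypothetical helper: model the result of a fixed-size per-key sample.

    Returns a list of (key, values) tuples, where values is a list of at
    most n values for that key, chosen without replacement.
    """
    rng = random.Random(seed)
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sample each key's values independently of the other keys.
    return [
        (key, rng.sample(values, min(n, len(values))))
        for key, values in grouped.items()
    ]


# Illustrative key-value pairs, not taken from the Beam snippet.
produce_by_season = [
    ('spring', 'Strawberry'),
    ('spring', 'Carrot'),
    ('spring', 'Eggplant'),
    ('spring', 'Tomato'),
    ('summer', 'Carrot'),
    ('summer', 'Tomato'),
    ('summer', 'Corn'),
    ('winter', 'Eggplant'),
]

samples = fixed_size_sample_per_key(produce_by_season, 2)
for key, values in samples:
    print(key, values)
```

A key with fewer values than the requested sample size, such as `winter` here, keeps all of its values.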
{{< button-pydoc path="apache_beam.transforms.combiners" class="Sample" >}}