<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Apache Beam – Use Beam</title><link>/get-started/</link><description>Recent content in Use Beam on Apache Beam</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/get-started/index.xml" rel="self" type="application/rss+xml"/><item><title>Get-Started: An Interactive Overview of Beam</title><link>/get-started/an-interactive-overview-of-beam/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/an-interactive-overview-of-beam/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="an-interactive-overview-of-beam">An Interactive Overview of Beam&lt;/h1>
&lt;p>Here you can find a collection of the interactive notebooks available for Apache Beam, which are hosted in
&lt;a href="https://colab.research.google.com">Colab&lt;/a>.
The notebooks allow you to interactively play with the code and see how your changes affect the pipeline.
You don&amp;rsquo;t need to install anything or modify your computer in any way to use these notebooks.&lt;/p>
&lt;p>You can also &lt;a href="/get-started/try-apache-beam">try an Apache Beam pipeline&lt;/a> using the Java, Python, and Go SDKs.&lt;/p>
&lt;h2 id="get-started">Get started&lt;/h2>
&lt;h3 id="learn-the-basics">Learn the basics&lt;/h3>
&lt;p>In this notebook we go through the basics of what Apache Beam is and how to get started.
We learn what a data pipeline, a PCollection, and a PTransform are, as well as some basic transforms like &lt;code>Map&lt;/code>, &lt;code>FlatMap&lt;/code>, &lt;code>Filter&lt;/code>, &lt;code>Combine&lt;/code>, and &lt;code>GroupByKey&lt;/code>.&lt;/p>
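&lt;p>Here is a plain-Python model of what &lt;code>GroupByKey&lt;/code> and a per-key &lt;code>Combine&lt;/code> do to a collection of key/value pairs. It is a sketch of the semantics only, not the Beam API. The function names &lt;code>group_by_key&lt;/code> and &lt;code>combine_per_key&lt;/code> are hypothetical. In a Beam pipeline you would apply &lt;code>beam.GroupByKey()&lt;/code> or &lt;code>beam.CombinePerKey(sum)&lt;/code> instead.&lt;/p>

```python
# Plain-Python model of GroupByKey and a per-key Combine; not the Beam API.
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values that share a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_per_key(pairs, combine_fn):
    """Group the values by key, then reduce each key's values with combine_fn."""
    return {key: combine_fn(values) for key, values in group_by_key(pairs).items()}


produce = [("fruit", "apple"), ("vegetable", "carrot"), ("fruit", "banana")]
print(group_by_key(produce))  # {'fruit': ['apple', 'banana'], 'vegetable': ['carrot']}

counts = [("apple", 3), ("banana", 1), ("apple", 2)]
print(combine_per_key(counts, sum))  # {'apple': 5, 'banana': 1}
```

&lt;p>Beam performs the grouping in parallel across workers and does not guarantee the order of values within a group, so do not rely on the list order shown by this model.&lt;/p>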
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/interactive-overview/getting-started.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h3 id="reading-and-writing-data">Reading and writing data&lt;/h3>
&lt;p>In this notebook we go through some examples of how to read and write data in different formats.
We introduce the built-in &lt;code>ReadFromText&lt;/code> and &lt;code>WriteToText&lt;/code> transforms.
We also see how to read from CSV files, read from a SQLite database, write fixed-size batches of elements, and write windows of elements.&lt;/p>
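&lt;p>Writing fixed-size batches means grouping a stream of elements into lists of a chosen size before writing each list. The following plain-Python sketch models that idea with a hypothetical &lt;code>fixed_size_batches&lt;/code> helper. It is not the code used in the notebook, which may instead rely on a Beam transform such as &lt;code>beam.BatchElements&lt;/code> or &lt;code>GroupIntoBatches&lt;/code>.&lt;/p>

```python
# Plain-Python model of splitting elements into fixed-size batches.
from itertools import islice


def fixed_size_batches(elements, batch_size):
    """Yield lists of batch_size elements; the final batch may be shorter."""
    if not batch_size >= 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(elements)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(fixed_size_batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```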
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/interactive-overview/reading-and-writing-data.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h3 id="windowing">Windowing&lt;/h3>
&lt;p>In this notebook we go through how to aggregate data based on time intervals, including in streaming pipelines.
We introduce the &lt;code>GlobalWindow&lt;/code>, &lt;code>FixedWindows&lt;/code>, &lt;code>SlidingWindows&lt;/code>, and &lt;code>Sessions&lt;/code> windowing functions.&lt;/p>
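&lt;p>The windowing functions differ in how they assign an element's timestamp to windows. The plain-Python sketch below models that assignment for fixed, sliding, and session windows. It uses integer timestamps in seconds and half-open &lt;code>(start, end)&lt;/code> intervals. The helper names are hypothetical and this is not the Beam implementation. In a pipeline you would use, for example, &lt;code>beam.WindowInto(window.FixedWindows(60))&lt;/code> with the classes in &lt;code>apache_beam.transforms.window&lt;/code>.&lt;/p>

```python
# Plain-Python model of window assignment. Times are integers in seconds and
# each window (start, end) includes start but excludes end.


def fixed_window(ts, size):
    """The single fixed window of length size that contains ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Every window of length size, starting every period seconds, that contains ts."""
    start = ts - ts % period  # latest window start at or before ts
    windows = []
    while start > ts - size:  # this window still covers ts
        windows.append((start, start + size))
        start -= period
    return windows


def session_windows(timestamps, gap):
    """Merge the per-element windows (ts, ts + gap) that overlap into sessions."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and sessions[-1][1] > ts:
            # ts falls inside the current session; timestamps are sorted,
            # so ts + gap is the new end of that session.
            sessions[-1][1] = ts + gap
        else:
            sessions.append([ts, ts + gap])
    return [tuple(s) for s in sessions]


print(fixed_window(67, 60))                     # (60, 120)
print(sliding_windows(67, 60, 30))              # [(60, 120), (30, 90)]
print(session_windows([1, 5, 30, 33], gap=10))  # [(1, 15), (30, 43)]
```

&lt;p>The global window, by contrast, places every element in a single window that covers all time.&lt;/p>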
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/interactive-overview/windowing.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h3 id="dataframes">DataFrames&lt;/h3>
&lt;p>Beam DataFrames provide a pandas-like &lt;a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html">DataFrame&lt;/a>
API to declare Beam pipelines.
To learn more about Beam DataFrames, take a look at the
&lt;a href="/documentation/dsls/dataframes/overview">Beam DataFrames overview&lt;/a> page.&lt;/p>
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/interactive-overview/dataframes.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h2 id="transforms">Transforms&lt;/h2>
&lt;p>Check the &lt;a href="/documentation/transforms/python/overview/">Python transform catalog&lt;/a>
for a complete list of the available transforms.&lt;/p>
&lt;h3 id="element-wise-transforms">Element-wise transforms&lt;/h3>
&lt;h4 id="map">Map&lt;/h4>
&lt;p>Applies a simple one-to-one mapping function over each element in the collection.&lt;/p>
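&lt;p>As a rough illustration of the one-to-one contract, the plain-Python model below returns exactly one output for each input element. The helper &lt;code>model_map&lt;/code> is hypothetical and only models the semantics. In a Beam pipeline the equivalent step would be &lt;code>pcoll | beam.Map(str.upper)&lt;/code>.&lt;/p>

```python
# Plain-Python model of Map: exactly one output element per input element.
def model_map(fn, collection):
    return [fn(element) for element in collection]


produce = ["apple", "banana", "carrot"]
upper = model_map(str.upper, produce)
print(upper)  # ['APPLE', 'BANANA', 'CARROT']
```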
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/map-py.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h4 id="flatmap">FlatMap&lt;/h4>
&lt;p>Applies a simple one-to-many mapping function over each element in the collection. The many elements are flattened into the resulting collection.&lt;/p>
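&lt;p>A similar plain-Python model shows the one-to-many case. Each input can produce zero or more outputs, and the outputs are flattened into one list. The empty line contributes nothing, which a one-to-one &lt;code>Map&lt;/code> cannot express. The helper name is hypothetical. The Beam equivalent would be &lt;code>pcoll | beam.FlatMap(str.split)&lt;/code>.&lt;/p>

```python
# Plain-Python model of FlatMap: each input yields zero or more outputs,
# and all outputs are flattened into a single collection.
def model_flat_map(fn, collection):
    return [output for element in collection for output in fn(element)]


lines = ["the quick fox", "", "jumps"]
words = model_flat_map(str.split, lines)
print(words)  # ['the', 'quick', 'fox', 'jumps']
```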
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/flatmap-py.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h4 id="filter">Filter&lt;/h4>
&lt;p>Given a predicate, filters out all elements that don’t satisfy that predicate.&lt;/p>
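&lt;p>In the same plain-Python style, the following model keeps only the elements for which a predicate returns true. The helper name is hypothetical. The Beam equivalent would be &lt;code>pcoll | beam.Filter(lambda n: n % 2 == 0)&lt;/code>.&lt;/p>

```python
# Plain-Python model of Filter: keep only the elements for which the predicate is true.
def model_filter(predicate, collection):
    return [element for element in collection if predicate(element)]


evens = model_filter(lambda n: n % 2 == 0, [1, 2, 3, 4, 5, 6])
print(evens)  # [2, 4, 6]
```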
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/filter-py.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h4 id="partition">Partition&lt;/h4>
&lt;p>Separates elements in a collection into a fixed number of output collections, using a partitioning function that assigns each element to one of them.&lt;/p>
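&lt;p>The plain-Python model below sketches how a partition function drives &lt;code>Partition&lt;/code>. The function receives each element together with the number of partitions and returns the index of the output collection for that element. In the Python SDK the corresponding call is &lt;code>beam.Partition(partition_fn, num_partitions)&lt;/code>, which yields one PCollection per partition. The helper names here are hypothetical.&lt;/p>

```python
# Plain-Python model of Partition: a partition function picks an output
# index for each element, and the number of outputs is fixed up front.
def model_partition(partition_fn, num_partitions, collection):
    partitions = [[] for _ in range(num_partitions)]
    for element in collection:
        index = partition_fn(element, num_partitions)
        if index not in range(num_partitions):
            raise ValueError(f"partition index {index} is out of range")
        partitions[index].append(element)
    return partitions


def by_parity(number, num_partitions):
    # With 2 partitions, even numbers go to partition 0 and odd numbers to 1.
    return number % num_partitions


evens, odds = model_partition(by_parity, 2, [1, 2, 3, 4, 5])
print(evens, odds)  # [2, 4] [1, 3, 5]
```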
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/partition-py.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h4 id="pardo">ParDo&lt;/h4>
&lt;p>A transform for generic parallel processing. It&amp;rsquo;s recommended to use &lt;code>Map&lt;/code>, &lt;code>FlatMap&lt;/code>, &lt;code>Filter&lt;/code> or other more specific transforms when possible.&lt;/p>
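&lt;p>The plain-Python model below shows why &lt;code>ParDo&lt;/code> is the most general of these transforms. It uses a class shaped like a Beam &lt;code>DoFn&lt;/code>, whose &lt;code>process&lt;/code> method may yield any number of outputs per element. The class combines splitting, filtering, and reshaping in one step. &lt;code>LongWordsFn&lt;/code> and &lt;code>model_pardo&lt;/code> are hypothetical. In a real pipeline the class would subclass &lt;code>beam.DoFn&lt;/code> and be applied with &lt;code>beam.ParDo(LongWordsFn(4))&lt;/code>.&lt;/p>

```python
# Plain-Python model of ParDo. LongWordsFn is shaped like a Beam DoFn
# (a process method that yields outputs) but runs without Beam.
class LongWordsFn:
    def __init__(self, min_length):
        self.min_length = min_length

    def process(self, element):
        # Split (like FlatMap), keep long words (like Filter), and
        # reshape each one (like Map), all in a single step.
        for word in element.split():
            if len(word) >= self.min_length:
                yield (word.lower(), len(word))


def model_pardo(do_fn, collection):
    return [output for element in collection for output in do_fn.process(element)]


result = model_pardo(LongWordsFn(4), ["Beam is fun", "Apache Beam"])
print(result)  # [('beam', 4), ('apache', 6), ('beam', 4)]
```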
&lt;table align="left">
&lt;td>
&lt;a class="button" target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/pardo-py.ipynb">
&lt;img alt="Run in Colab" width="32px" height="32px"
src="https://github.com/googlecolab/open_in_colab/raw/master/images/icon32.png" />
Run in Colab
&lt;/a>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Get-Started: Beam Mobile Gaming Example</title><link>/get-started/mobile-gaming-example/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/mobile-gaming-example/</guid><description>
&lt;h1 id="apache-beam-mobile-gaming-pipeline-examples">Apache Beam Mobile Gaming Pipeline Examples&lt;/h1>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#userscore-basic-score-processing-in-batch">UserScore: Basic Score Processing in Batch&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#what-does-userscore-do">What Does UserScore Do?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#limitations">Limitations&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#hourlyteamscore-advanced-processing-in-batch-with-windowing">HourlyTeamScore: Advanced Processing in Batch with Windowing&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#what-does-hourlyteamscore-do">What Does HourlyTeamScore Do?&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fixed-time-windowing">Fixed-Time Windowing&lt;/a>&lt;/li>
&lt;li>&lt;a href="#filtering-based-on-event-time">Filtering Based On Event Time&lt;/a>&lt;/li>
&lt;li>&lt;a href="#calculating-score-per-team-per-window">Calculating Score Per Team, Per Window&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#limitations-1">Limitations&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#leaderboard-streaming-processing-with-real-time-game-data">LeaderBoard: Streaming Processing with Real-Time Game Data&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#what-does-leaderboard-do">What Does LeaderBoard Do?&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#calculating-user-score-based-on-processing-time">Calculating User Score based on Processing Time&lt;/a>&lt;/li>
&lt;li>&lt;a href="#calculating-team-score-based-on-event-time">Calculating Team Score based on Event Time&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#gamestats-abuse-detection-and-usage-analysis">GameStats: Abuse Detection and Usage Analysis&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#what-does-gamestats-do">What Does GameStats Do?&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#abuse-detection">Abuse Detection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#analyzing-usage-patterns">Analyzing Usage Patterns&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>This section provides a walkthrough of a series of example Apache Beam pipelines that demonstrate more complex functionality than the basic &lt;a href="/get-started/wordcount-example">WordCount&lt;/a> examples. The pipelines in this section process data from a hypothetical game that users play on their mobile phones. The pipelines demonstrate processing at increasing levels of complexity; the first pipeline, for example, shows how to run a batch analysis job to obtain relatively simple score data, while the later pipelines use Beam&amp;rsquo;s windowing and triggers features to provide low-latency data analysis and more complex intelligence about users&amp;rsquo; play patterns.&lt;/p>
&lt;p class="language-java">&lt;blockquote>
&lt;p>&lt;strong>Note&lt;/strong>: These examples assume some familiarity with the Beam programming model. If you haven&amp;rsquo;t already, we recommend familiarizing yourself with the programming model documentation and running a basic example pipeline before continuing. Note also that these examples use the Java 8 lambda syntax, and therefore require Java 8 or later. However, you can create pipelines with equivalent functionality using Java 7.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;p class="language-py">&lt;blockquote>
&lt;p>&lt;strong>Note&lt;/strong>: These examples assume some familiarity with the Beam programming model. If you haven&amp;rsquo;t already, we recommend familiarizing yourself with the programming model documentation and running a basic example pipeline before continuing.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note&lt;/strong>: MobileGaming is not yet available for the Go SDK. There is an open issue for this
(&lt;a href="https://github.com/apache/beam/issues/18806">Issue 18806&lt;/a>).&lt;/p>
&lt;/blockquote>
&lt;p>Every time a user plays an instance of our hypothetical mobile game, they generate a data event. Each data event consists of the following information:&lt;/p>
&lt;ul>
&lt;li>The unique ID of the user playing the game.&lt;/li>
&lt;li>The team ID for the team to which the user belongs.&lt;/li>
&lt;li>A score value for that particular instance of play.&lt;/li>
&lt;li>A timestamp that records when the particular instance of play happened&amp;ndash;this is the event time for each game data event.&lt;/li>
&lt;/ul>
&lt;p>When the user completes an instance of the game, their phone sends the data event to a game server, where the data is logged and stored in a file. Generally, the data is sent to the game server immediately upon completion. However, sometimes delays can happen in the network at various points. Another possible scenario involves users who play the game &amp;ldquo;offline&amp;rdquo;, when their phones are out of contact with the server (such as on an airplane, or outside a network coverage area). When the user&amp;rsquo;s phone comes back into contact with the game server, the phone sends all accumulated game data. In these cases, some data events may arrive delayed and out of order.&lt;/p>
&lt;p>The following diagram shows the ideal situation (events are processed as they occur) vs. reality (there is often a time delay before processing).&lt;/p>
&lt;p>&lt;img src="/images/gaming-example-basic.png" alt="There is often a time delay before processing events.">&lt;/p>
&lt;p>&lt;em>Figure 1: The X-axis represents event time: the actual time a game event
occurred. The Y-axis represents processing time: the time at which a game event
was processed. Ideally, events should be processed as they occur, depicted by
the dotted line in the diagram. In reality, processing is delayed, as depicted
by the red squiggly line above the ideal line.&lt;/em>&lt;/p>
&lt;p>The data events might be received by the game server significantly later than users generate them. This time difference (called &lt;strong>skew&lt;/strong>) can have processing implications for pipelines whose calculations depend on when each score was generated. Such pipelines might, for example, track scores generated during each hour of a day or calculate the length of time that users continuously play the game. Both calculations depend on each data record&amp;rsquo;s event time.&lt;/p>
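&lt;p>Skew is the processing time minus the event time of each record. The short plain-Python sketch below computes it for three hypothetical events; the identifiers and times, in seconds, are made up. It also shows how a large skew reorders events: the event that happened first arrives last.&lt;/p>

```python
# Hypothetical events: (event_id, event_time, processing_time), in seconds.
events = [("a", 100, 102), ("b", 105, 106), ("c", 90, 140)]


def skew(event_time, processing_time):
    """How long after it happened an event was processed."""
    return processing_time - event_time


skews = {event_id: skew(et, pt) for event_id, et, pt in events}
print(skews)  # {'a': 2, 'b': 1, 'c': 50}

# Ordering by processing time (arrival) differs from ordering by event time.
arrival_order = [e[0] for e in sorted(events, key=lambda e: e[2])]
event_time_order = [e[0] for e in sorted(events, key=lambda e: e[1])]
print(arrival_order, event_time_order)  # ['a', 'b', 'c'] ['c', 'a', 'b']
```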
&lt;p>Because some of our example pipelines use data files (like logs from the game server) as input, the event timestamp for each game event might be embedded in the data&amp;ndash;that is, it&amp;rsquo;s a field in each data record. Those pipelines need to parse the event timestamp from each data record after reading it from the input file.&lt;/p>
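&lt;p>Here is a minimal sketch of that parsing step in plain Python. It assumes that each record is a comma-separated line whose first four fields are the user ID, team ID, score, and event timestamp in milliseconds since the Unix epoch; any further fields are ignored. The &lt;code>GameEvent&lt;/code> class, the &lt;code>parse_event&lt;/code> function, and the sample record are hypothetical, not the example's actual parsing code. The sketch skips malformed records instead of failing.&lt;/p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class GameEvent:
    user: str
    team: str
    score: int
    timestamp_ms: int  # event time, in milliseconds since the Unix epoch

    def event_time(self):
        """Event time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp_ms / 1000, tz=timezone.utc)


def parse_event(line) -> Optional[GameEvent]:
    """Parse one record in the assumed format; return None if it is malformed."""
    fields = [field.strip() for field in line.split(",")]
    if not len(fields) >= 4:
        return None
    try:
        return GameEvent(fields[0], fields[1], int(fields[2]), int(fields[3]))
    except ValueError:
        return None


event = parse_event("user7_BananaCat,BananaCat,12,1445230923951")
print(event.score, event.event_time().date())  # 12 2015-10-19
print(parse_event("not,a,valid,record"))       # None
```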
&lt;p>For pipelines that read game data from an unbounded source, the data source sets the intrinsic &lt;a href="/documentation/programming-guide/#element-timestamps">timestamp&lt;/a> for each PCollection element to the appropriate event time.&lt;/p>
&lt;p>The Mobile Gaming example pipelines vary in complexity, from simple batch analysis to more complex pipelines that can perform real-time analysis and abuse detection. This section walks you through each example and demonstrates how to use Beam features like windowing and triggers to expand your pipeline&amp;rsquo;s capabilities.&lt;/p>
&lt;h2 id="userscore-basic-score-processing-in-batch">UserScore: Basic Score Processing in Batch&lt;/h2>
&lt;p>The &lt;code>UserScore&lt;/code> pipeline is the simplest example for processing mobile game data. &lt;code>UserScore&lt;/code> determines the total score per user over a finite data set (for example, one day&amp;rsquo;s worth of scores stored on the game server). Pipelines like &lt;code>UserScore&lt;/code> are best run periodically after all relevant data has been gathered. For example, &lt;code>UserScore&lt;/code> could run as a nightly job over data gathered during that day.&lt;/p>
&lt;p class="language-java">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/game/UserScore.java">UserScore on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;p class="language-py">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/user_score.py">UserScore on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;h3 id="what-does-userscore-do">What Does UserScore Do?&lt;/h3>
&lt;p>In a day&amp;rsquo;s worth of scoring data, each user ID may have multiple records (if the user plays more than one instance of the game during the analysis window), each with its own score value and timestamp. If we want to determine the total score over all the instances a user plays during the day, our pipeline needs to group all the records together per individual user.&lt;/p>
&lt;p>As the pipeline processes each event, the event score gets added to the sum total for that particular user.&lt;/p>
&lt;p>&lt;code>UserScore&lt;/code> parses out only the data that it needs from each record, specifically the user ID and the score value. The pipeline doesn&amp;rsquo;t consider the event time for any record; it simply processes all data present in the input files that you specify when you run the pipeline.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> To use the &lt;code>UserScore&lt;/code> pipeline effectively, you&amp;rsquo;d need to ensure that you supply input data that has already been grouped by the desired event time period; that is, you should specify an input file that only contains data from the day you care about.&lt;/p>
&lt;/blockquote>
&lt;p>&lt;code>UserScore&lt;/code>&amp;rsquo;s basic pipeline flow does the following:&lt;/p>
&lt;ol>
&lt;li>Read the day&amp;rsquo;s score data from a text file.&lt;/li>
&lt;li>Sum the score values for each unique user by grouping each game event by user ID and combining the score values to get the total score for that particular user.&lt;/li>
&lt;li>Write the result data to a text file.&lt;/li>
&lt;/ol>
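&lt;p>The three steps can be modeled in a few lines of plain Python, which may help before reading the Beam code. This sketch assumes a comma-separated layout with the user ID in the first field and the score in the third. It skips malformed lines and uses an illustrative output format. The function names and sample data are hypothetical. In the Beam pipeline, the summing step is distributed across workers rather than done in one dictionary.&lt;/p>

```python
from collections import defaultdict


def sum_scores_per_user(lines):
    """Steps 1 and 2: parse each line and sum the scores for each user."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split(",")
        try:
            user, score = fields[0].strip(), int(fields[2])
        except (IndexError, ValueError):
            continue  # skip malformed records
        totals[user] += score
    return dict(totals)


def format_totals(totals):
    """Step 3: turn each (user, total) pair into one line of output text."""
    return [f"user: {user}, total_score: {total}" for user, total in sorted(totals.items())]


day_of_scores = ["alice,red,5,1000", "bob,blue,3,2000", "alice,red,7,3000", "garbage"]
for output_line in format_totals(sum_scores_per_user(day_of_scores)):
    print(output_line)
# user: alice, total_score: 12
# user: bob, total_score: 3
```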
&lt;p>The following diagram shows score data for several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair.&lt;/p>
&lt;img src="/images/gaming-example.gif" alt="A pipeline processes score data for three users." width="850px">
&lt;p>&lt;em>Figure 2: Score data for three users.&lt;/em>&lt;/p>
&lt;p>This example uses batch processing, and the diagram&amp;rsquo;s Y-axis represents processing time: the pipeline processes events lower on the Y-axis first, and events higher up the axis later. The diagram&amp;rsquo;s X-axis represents the event time for each game event, as denoted by that event&amp;rsquo;s timestamp. Note that the individual events in the diagram are not processed by the pipeline in the same order as they occurred (according to their timestamps).&lt;/p>
&lt;p>After reading the score events from the input file, the pipeline groups all of those user/score pairs together and sums the score values into one total value per unique user. &lt;code>UserScore&lt;/code> encapsulates the core logic for that step as the &lt;a href="/documentation/programming-guide/#composite-transforms">user-defined composite transform&lt;/a> &lt;code>ExtractAndSumScore&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ExtractAndSumScore&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">field&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">field&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">field&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">field&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">gameInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">gameInfo&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">kvs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">integers&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">gInfo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">field&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getScore&lt;/span>&lt;span class="o">())))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">integersPerKey&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ExtractAndSumScore&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;A transform to extract key/score information and sum the scores.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> The constructor argument `field` determines whether &amp;#39;team&amp;#39; or &amp;#39;user&amp;#39; info is
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> extracted.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">field&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO(BEAM-6158): Revert the workaround once we can pickle super() on py3.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># super().__init__()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">field&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">field&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">field&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;score&amp;#39;&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombinePerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>&lt;code>ExtractAndSumScore&lt;/code> is written to be more general, in that you can pass in the field by which you want to group the data (in the case of our game, by unique user or unique team). This means we can reuse &lt;code>ExtractAndSumScore&lt;/code> in other pipelines that group score data by team, for example.&lt;/p>
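&lt;p>Passing in the field lets one function produce both per-user and per-team totals. The plain-Python model below mirrors the logic of the Python &lt;code>ExtractAndSumScore&lt;/code> shown above: it extracts &lt;code>(elem[field], elem['score'])&lt;/code> pairs, then sums per key. It runs on hypothetical parsed events represented as dictionaries and is an illustration, not a Beam transform.&lt;/p>

```python
from collections import defaultdict


def extract_and_sum_score(events, field):
    """Sum the 'score' of each event, keyed by that event's value for field."""
    totals = defaultdict(int)
    for event in events:
        totals[event[field]] += event["score"]
    return dict(totals)


events = [
    {"user": "alice", "team": "red", "score": 5},
    {"user": "bob", "team": "red", "score": 3},
    {"user": "carol", "team": "blue", "score": 4},
]
print(extract_and_sum_score(events, "user"))  # {'alice': 5, 'bob': 3, 'carol': 4}
print(extract_and_sum_score(events, "team"))  # {'red': 8, 'blue': 4}
```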
&lt;p>Here&amp;rsquo;s the main method of &lt;code>UserScore&lt;/code>, showing how we apply all three steps of the pipeline:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">throws&lt;/span> &lt;span class="n">Exception&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Begin constructing a pipeline configured by commandline flags.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Options&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Read events from a text file and parse them.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInput&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ParseGameEvent&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">ParseEventFn&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract and sum username/score pairs from the event data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ExtractUserScore&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;user&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;WriteUserScoreSums&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">WriteToText&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getOutput&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">configureOutput&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="kc">false&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Run the batch pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">waitUntilFinish&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">argv&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">save_main_session&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Main entry point; defines and runs the user_score pipeline.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">argparse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ArgumentParser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The default maps to two large Google Cloud Storage files (each ~12GB)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># holding two subsequent day&amp;#39;s worth (roughly) of data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--input&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;gs://apache-beam-samples/game/small/gaming_data.csv&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;Path to the data file(s) containing game data.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--output&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">required&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;Path to the output file(s).&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pipeline_args&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parse_known_args&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">argv&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">pipeline_args&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># We use the save_main_session option because one or more DoFn&amp;#39;s in this&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># workflow rely on global context (e.g., a module imported at module level).&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view_as&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">SetupOptions&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">save_main_session&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">save_main_session&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">format_user_score_sums&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">user_score&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="n">user&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">score&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">user_score&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="s1">&amp;#39;user: &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">, total_score: &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">user&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">score&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span> &lt;span class="c1"># pylint: disable=expression-not-assigned&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ReadInputText&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;UserScore&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">UserScore&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FormatUserScoreSums&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">format_user_score_sums&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WriteUserScoreSums&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">output&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="limitations">Limitations&lt;/h3>
&lt;p>As written in the example, the &lt;code>UserScore&lt;/code> pipeline has a few limitations:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Because some score data may be generated by offline players and sent after the daily cutoff, the result data generated by the &lt;code>UserScore&lt;/code> pipeline &lt;strong>may be incomplete&lt;/strong>. &lt;code>UserScore&lt;/code> processes only the fixed set of data present in the input file(s) when the pipeline runs.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>UserScore&lt;/code> processes all data events present in the input file at processing time, and &lt;strong>does not examine or otherwise error-check events based on event time&lt;/strong>. Therefore, the results may include some values whose event times fall outside the relevant analysis period, such as late records from the previous day.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Because &lt;code>UserScore&lt;/code> runs only after all the data has been collected, it has &lt;strong>high latency&lt;/strong> between when users generate data events (the event time) and when results are computed (the processing time).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>UserScore&lt;/code> also reports only the total for the entire day, and doesn&amp;rsquo;t provide any finer-grained information about how the data accumulated during the day.&lt;/p>
&lt;/li>
&lt;/ul>
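&lt;p>A small worked example makes the first two limitations concrete. The sketch below is plain Python with made-up events. Hours are measured from the start of the day being analyzed, so &lt;code>-1&lt;/code> is the previous evening and &lt;code>24&lt;/code> is the cutoff. The &lt;code>arrival_hour&lt;/code> field is hypothetical and stands in for the time an event reached the input file.&lt;/p>

```python
# Hypothetical events for one user. Hours are relative to the start of the
# analyzed day: 0 is midnight at the start, 24 is the daily cutoff.
events = [
    {"score": 4, "event_hour": 10, "arrival_hour": 10},
    {"score": 6, "event_hour": -1, "arrival_hour": 1},   # played yesterday, sent today
    {"score": 5, "event_hour": 23, "arrival_hour": 25},  # played today, sent after the cutoff
]

CUTOFF_HOUR = 24


def user_score_style_total(events, run_hour):
    """Sum every event that has arrived by the time the batch job runs.

    Like UserScore, this never looks at event time.
    """
    return sum(e["score"] for e in events if run_hour >= e["arrival_hour"])


def true_total_for_day(events):
    """The total we actually want: every event whose event time is in the day."""
    return sum(e["score"] for e in events if CUTOFF_HOUR > e["event_hour"] >= 0)


print(user_score_style_total(events, CUTOFF_HOUR))  # 10
print(true_total_for_day(events))                   # 9
```

&lt;p>The batch-style total counts the event from the previous day and misses the one sent after the cutoff. As a result, it differs from the true total for the day.&lt;/p>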
&lt;p>Starting with the next pipeline example, we&amp;rsquo;ll discuss how you can use Beam&amp;rsquo;s features to address these limitations.&lt;/p>
&lt;h2 id="hourlyteamscore-advanced-processing-in-batch-with-windowing">HourlyTeamScore: Advanced Processing in Batch with Windowing&lt;/h2>
&lt;p>The &lt;code>HourlyTeamScore&lt;/code> pipeline expands on the basic batch analysis principles used in the &lt;code>UserScore&lt;/code> pipeline and addresses some of its limitations. &lt;code>HourlyTeamScore&lt;/code> performs finer-grained analysis in two ways: it uses additional features in the Beam SDKs, and it takes into account more aspects of the game data. For example, &lt;code>HourlyTeamScore&lt;/code> can filter out data that isn&amp;rsquo;t part of the relevant analysis period.&lt;/p>
&lt;p>Like &lt;code>UserScore&lt;/code>, &lt;code>HourlyTeamScore&lt;/code> is best thought of as a job to be run periodically after all the relevant data has been gathered (such as once per day). The pipeline reads a fixed data set from a file, and writes the results &lt;span class="language-java">back to a text file&lt;/span>&lt;span class="language-py">to a Google Cloud BigQuery table&lt;/span>.&lt;/p>
&lt;p class="language-java">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/game/HourlyTeamScore.java">HourlyTeamScore on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;p class="language-py">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py">HourlyTeamScore on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;h3 id="what-does-hourlyteamscore-do">What Does HourlyTeamScore Do?&lt;/h3>
&lt;p>&lt;code>HourlyTeamScore&lt;/code> calculates the total score per team, per hour, in a fixed data set (such as one day&amp;rsquo;s worth of data).&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Rather than operating on the entire data set at once, &lt;code>HourlyTeamScore&lt;/code> divides the input data into logical windows and performs calculations on those windows. This allows &lt;code>HourlyTeamScore&lt;/code> to provide scoring information per window, where each window represents the game score progress over a fixed interval of time (such as one hour).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>HourlyTeamScore&lt;/code> filters data events based on whether their event time (as indicated by the embedded timestamp) falls within the relevant analysis period. The pipeline checks each game event&amp;rsquo;s timestamp and keeps the event only if the timestamp falls within the range we want to analyze (in this case, the day in question). Data events from previous days are discarded and not included in the score totals. This makes &lt;code>HourlyTeamScore&lt;/code> more robust and less prone to erroneous result data than &lt;code>UserScore&lt;/code>. It also allows the pipeline to account for late-arriving data that has a timestamp within the relevant analysis period.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Below, we&amp;rsquo;ll look at each of these enhancements in &lt;code>HourlyTeamScore&lt;/code> in detail:&lt;/p>
&lt;h4 id="fixed-time-windowing">Fixed-Time Windowing&lt;/h4>
&lt;p>Using fixed-time windowing lets the pipeline provide better information on how events accumulated in the data set over the course of the analysis period. In our case, it tells us when in the day each team was active and how much the team scored at those times.&lt;/p>
&lt;p>The following diagram shows how the pipeline processes a day&amp;rsquo;s worth of scoring data for two teams after applying fixed-time windowing:&lt;/p>
&lt;img src="/images/gaming-example-team-scores-narrow.gif" alt="A pipeline processes score data for two teams." width="800px">
&lt;p>&lt;em>Figure 3: Score data for two teams. Each team&amp;rsquo;s scores are divided into
logical windows based on when those scores occurred in event time.&lt;/em>&lt;/p>
&lt;p>Notice that as processing time advances, the sums are now &lt;em>per window&lt;/em>; each window represents an hour of &lt;em>event time&lt;/em> during the day in which the scores occurred.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> As the diagram above shows, using windowing produces an &lt;em>independent total for every interval&lt;/em> (in this case, each hour). &lt;code>HourlyTeamScore&lt;/code> doesn&amp;rsquo;t provide a running total for the entire data set at each hour. Instead, it provides the total score for all the events that occurred &lt;em>only within that hour&lt;/em>.&lt;/p>
&lt;/blockquote>
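&lt;p>A short plain-Python sketch shows the difference between independent per-window totals and a running total. The &lt;code>(hour, score)&lt;/code> pairs and the function names are hypothetical. Each hour stands in for the window that its score falls into.&lt;/p>

```python
from collections import defaultdict

# Hypothetical (hour_of_day, score) pairs for a single team.
scores = [(0, 5), (0, 3), (1, 4), (3, 2), (3, 6)]


def per_window_totals(scores):
    """Independent total per hourly window, as windowed aggregation produces."""
    totals = defaultdict(int)
    for hour, score in scores:
        totals[hour] += score
    return dict(sorted(totals.items()))


def running_totals(window_totals):
    """Cumulative totals; this is NOT what HourlyTeamScore emits."""
    result, acc = {}, 0
    for hour, total in sorted(window_totals.items()):
        acc += total
        result[hour] = acc
    return result


hourly = per_window_totals(scores)
print(hourly)                  # {0: 8, 1: 4, 3: 8}
print(running_totals(hourly))  # {0: 8, 1: 12, 3: 20}
```

&lt;p>Windows that contain no events (hour 2 in this example) produce no output at all. This matches how windowed per-key aggregations behave, since a window only appears in the output if some element was assigned to it.&lt;/p>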
&lt;p>Beam&amp;rsquo;s windowing feature uses the &lt;a href="/documentation/programming-guide/#element-timestamps">intrinsic timestamp information&lt;/a> attached to each element of a &lt;code>PCollection&lt;/code>. Because we want our pipeline to window based on &lt;em>event time&lt;/em>, we &lt;strong>must first extract the timestamp&lt;/strong> that&amp;rsquo;s embedded in each data record and apply it to the corresponding element in the &lt;code>PCollection&lt;/code> of score data. Then, the pipeline can &lt;strong>apply the windowing function&lt;/strong> to divide the &lt;code>PCollection&lt;/code> into logical windows.&lt;/p>
&lt;p class="language-java">&lt;code>HourlyTeamScore&lt;/code> uses the &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithTimestamps.java">WithTimestamps&lt;/a> and &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java">Window&lt;/a> transforms to perform these operations.&lt;/p>
&lt;p class="language-py">&lt;code>HourlyTeamScore&lt;/code> uses the &lt;code>FixedWindows&lt;/code> transform, found in &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/window.py">window.py&lt;/a>, to perform these operations.&lt;/p>
&lt;p>The following code shows this:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Add an element timestamp based on the event log, and apply fixed windowing.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;AddEventTimestamps&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">WithTimestamps&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTimestamp&lt;/span>&lt;span class="o">())))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FixedWindowsTeam&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getWindowDuration&lt;/span>&lt;span class="o">()))))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Add an element timestamp based on the event log, and apply fixed&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># windowing.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;AddEventTimestamps&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampedValue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">elem&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FixedWindowsTeam&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window_duration_in_seconds&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
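&lt;p>The sketch below models the arithmetic of fixed windows in plain Python to make the window boundaries concrete. A timestamp belongs to the window that starts at the timestamp rounded down to a multiple of the window size. Each window includes its start but not its end. The function name &lt;code>assign_fixed_window&lt;/code> is hypothetical, and the sketch ignores the optional offset that Beam&amp;rsquo;s &lt;code>FixedWindows&lt;/code> also accepts.&lt;/p>

```python
WINDOW_SECONDS = 60 * 60  # one-hour windows, as in the hourly example


def assign_fixed_window(timestamp, size=WINDOW_SECONDS):
    """Return the (start, end) of the fixed window containing `timestamp`.

    Timestamps are in seconds. Windows are aligned to the epoch, include
    their start, and exclude their end.
    """
    start = timestamp - timestamp % size
    return (start, start + size)


print(assign_fixed_window(7200))   # (7200, 10800): exactly on a boundary
print(assign_fixed_window(10799))  # (7200, 10800): last second of that hour
print(assign_fixed_window(10800))  # (10800, 14400): the next window
```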
&lt;p>Notice that the transforms the pipeline uses to specify the windowing are distinct from the actual data processing transforms (such as &lt;code>ExtractAndSumScore&lt;/code>). This separation gives you flexibility in designing your Beam pipeline: you can run existing transforms over datasets with different windowing characteristics.&lt;/p>
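&lt;p>This separation can be sketched in plain Python by passing the window function into a generic summing function. Everything here is hypothetical: the helper names, the event dicts with &lt;code>team&lt;/code>, &lt;code>score&lt;/code>, and &lt;code>timestamp&lt;/code> keys, and the timestamps in seconds. The sketch models the behavior, not Beam&amp;rsquo;s implementation.&lt;/p>

```python
from collections import defaultdict


def fixed_windows(size):
    """Return a window function mapping a timestamp to its window start."""
    return lambda timestamp: timestamp - timestamp % size


def sum_per_key_and_window(events, key_field, window_fn):
    """Sum scores grouped by (key, window); knows nothing about window sizes."""
    totals = defaultdict(int)
    for e in events:
        totals[(e[key_field], window_fn(e["timestamp"]))] += e["score"]
    return dict(totals)


events = [
    {"team": "red", "score": 5, "timestamp": 100},
    {"team": "red", "score": 3, "timestamp": 2000},
    {"team": "blue", "score": 7, "timestamp": 3700},
]

# The same aggregation, run with two different windowing choices.
print(sum_per_key_and_window(events, "team", fixed_windows(3600)))
# {('red', 0): 8, ('blue', 3600): 7}
print(sum_per_key_and_window(events, "team", fixed_windows(1800)))
# {('red', 0): 5, ('red', 1800): 3, ('blue', 3600): 7}
```

&lt;p>Grouping by the pair (key, window) mirrors what happens when a per-key aggregation runs on windowed data in Beam: elements are grouped by key within each window.&lt;/p>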
&lt;h4 id="filtering-based-on-event-time">Filtering Based On Event Time&lt;/h4>
&lt;p>&lt;code>HourlyTeamScore&lt;/code> uses &lt;strong>filtering&lt;/strong> to remove any events from our dataset whose timestamps don&amp;rsquo;t fall within the relevant analysis period (i.e. they weren&amp;rsquo;t generated during the day that we&amp;rsquo;re interested in). This keeps the pipeline from erroneously including any data that was, for example, generated offline during the previous day but sent to the game server during the current day.&lt;/p>
&lt;p>It also lets the pipeline include relevant &lt;strong>late data&lt;/strong>: data events with valid timestamps that arrived after our analysis period ended. If our pipeline cutoff time is 12:00 am, for example, we might run the pipeline at 2:00 am but filter out any events whose timestamps indicate that they occurred after the 12:00 am cutoff. Data events that were delayed and arrived between 12:01 am and 2:00 am, but whose timestamps indicate that they occurred before the 12:00 am cutoff, would be included in the pipeline processing.&lt;/p>
&lt;p>&lt;code>HourlyTeamScore&lt;/code> uses the &lt;code>Filter&lt;/code> transform to perform this operation. When you apply &lt;code>Filter&lt;/code>, you specify a predicate: a function that returns true or false for each data record. Records for which the predicate returns true are kept, and records for which it returns false are discarded. In our case, the predicates check just one field of each record, the timestamp, against the start and end times we specify.&lt;/p>
&lt;p>The following code shows how &lt;code>HourlyTeamScore&lt;/code> uses the &lt;code>Filter&lt;/code> transform to filter events that occur either before or after the relevant analysis period:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FilterStartTime&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">by&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTimestamp&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">startMinTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FilterEndTime&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">by&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTimestamp&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">stopMinTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FilterStartTime&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start_timestamp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FilterEndTime&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stop_timestamp&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
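&lt;p>The following plain-Python sketch replays the late-data scenario described above. The analysis period, the event values, and the &lt;code>arrival_time&lt;/code> field are hypothetical, with times given in seconds from the start of the analyzed day. The &lt;code>in_analysis_period&lt;/code> function mirrors the two strict comparisons in the snippets above; it is not Beam code.&lt;/p>

```python
START_TIMESTAMP = 0          # 12:00 am at the start of the analyzed day
STOP_TIMESTAMP = 24 * 3600   # the 12:00 am cutoff at the end of the day
RUN_TIME = 26 * 3600         # the batch job runs at 2:00 am

events = [
    {"score": 4, "timestamp": -600, "arrival_time": 300},     # previous day: dropped
    {"score": 6, "timestamp": 3600, "arrival_time": 3600},    # on time: kept
    {"score": 5, "timestamp": 86000, "arrival_time": 90000},  # late data (11:53 pm, arrived 1:00 am): kept
    {"score": 2, "timestamp": 87000, "arrival_time": 87000},  # after the cutoff: dropped
]


def in_analysis_period(event):
    """Keep events strictly between the start and stop times (event time only)."""
    return STOP_TIMESTAMP > event["timestamp"] > START_TIMESTAMP


# The input file holds everything that arrived before the job ran...
input_events = [e for e in events if RUN_TIME >= e["arrival_time"]]
# ...and filtering looks only at event time, never at arrival time.
kept = [e for e in input_events if in_analysis_period(e)]

print([e["score"] for e in kept])     # [6, 5]
print(sum(e["score"] for e in kept))  # 11
```

&lt;p>Because both comparisons are strict, an event stamped exactly at the start or stop time would also be dropped.&lt;/p>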
&lt;h4 id="calculating-score-per-team-per-window">Calculating Score Per Team, Per Window&lt;/h4>
&lt;p>&lt;code>HourlyTeamScore&lt;/code> uses the same &lt;code>ExtractAndSumScore&lt;/code> transform as the &lt;code>UserScore&lt;/code> pipeline, but passes a different key (team, as opposed to user). Also, because the pipeline applies &lt;code>ExtractAndSumScore&lt;/code> &lt;em>after&lt;/em> applying fixed-time windowing (one-hour windows by default) to the input data, the data gets grouped by both team &lt;em>and&lt;/em> window. You can see the full sequence of transforms in &lt;code>HourlyTeamScore&lt;/code>&amp;rsquo;s main method:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">throws&lt;/span> &lt;span class="n">Exception&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Begin constructing a pipeline configured by commandline flags.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Options&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">final&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">stopMinTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">minFmt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parseMillis&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getStopMin&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">final&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">startMinTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">minFmt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parseMillis&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getStartMin&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Read &amp;#39;gaming&amp;#39; events from a text file.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInput&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Parse the incoming data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ParseGameEvent&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">ParseEventFn&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Filter out data before and after the given times so that it is not included
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// in the calculations. As we collect data in batches (say, by day), the batch for the day
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// that we want to analyze could potentially include some late-arriving data from the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// previous day.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// If so, we want to weed it out. Similarly, if we include data from the following day
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// (to scoop up late-arriving events from the day we&amp;#39;re analyzing), we need to weed out
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// events that fall after the time period we want to analyze.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// [START DocInclude_HTSFilters]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FilterStartTime&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">by&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTimestamp&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">startMinTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FilterEndTime&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">by&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">gInfo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTimestamp&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">stopMinTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// [END DocInclude_HTSFilters]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// [START DocInclude_HTSAddTsAndWindow]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Add an element timestamp based on the event log, and apply fixed windowing.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;AddEventTimestamps&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">WithTimestamps&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">GameActionInfo&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTimestamp&lt;/span>&lt;span class="o">())))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FixedWindowsTeam&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getWindowDuration&lt;/span>&lt;span class="o">()))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// [END DocInclude_HTSAddTsAndWindow]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract and sum teamname/score pairs from the event data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ExtractTeamScore&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;team&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;WriteTeamScoreSums&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">WriteToText&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getOutput&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">configureOutput&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">waitUntilFinish&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
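&lt;p>To make grouping by team &lt;em>and&lt;/em> window concrete, here is a plain-Python sketch (not the Beam implementation) of per-team, per-window sums. It assumes epoch-aligned fixed windows with no offset, events shaped as dicts with &lt;code>team&lt;/code>, &lt;code>score&lt;/code>, and &lt;code>timestamp&lt;/code> (in seconds), and a one-hour window matching the default duration. The &lt;code>window_start&lt;/code> and &lt;code>team_scores_per_window&lt;/code> helpers are hypothetical.&lt;/p>

```python
from collections import defaultdict

# Assumed: 60-minute windows, matching the pipeline's default window duration.
WINDOW_SECONDS = 60 * 60


def window_start(ts, size=WINDOW_SECONDS):
    # Start of the epoch-aligned fixed window (offset 0) that contains ts;
    # that window covers [start, start + size).
    return ts - (ts % size)


def team_scores_per_window(events, size=WINDOW_SECONDS):
    # Key each score by (team, window) and sum: because the window is part
    # of the grouping key, each team gets a separate total per window.
    totals = defaultdict(int)
    for e in events:
        totals[(e["team"], window_start(e["timestamp"], size))] += e["score"]
    return dict(totals)


events = [
    {"team": "red", "score": 5, "timestamp": 0},
    {"team": "red", "score": 3, "timestamp": 3599},
    {"team": "red", "score": 4, "timestamp": 3600},
    {"team": "blue", "score": 7, "timestamp": 1800},
]
print(team_scores_per_window(events))
# {('red', 0): 8, ('red', 3600): 4, ('blue', 0): 7}
```

&lt;p>Team &lt;code>red&lt;/code> gets one total per hour rather than a single running total. In the pipeline, the windowing step assigns each element to its window, and the per-key sum inside &lt;code>ExtractAndSumScore&lt;/code> is then computed separately within each window.&lt;/p>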
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">HourlyTeamScore&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">start_min&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">stop_min&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">window_duration&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO(BEAM-6158): Revert the workaround once we can pickle super() on py3.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># super().__init__()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">str2timestamp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">start_min&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stop_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">str2timestamp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">stop_min&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window_duration_in_seconds&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">window_duration&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ParseGameEventFn&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ParseGameEventFn&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Filter out data before and after the given times so that it is not&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># included in the calculations. As we collect data in batches (say, by&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># day), the batch for the day that we want to analyze could potentially&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># include some late-arriving data from the previous day. If so, we want&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># to weed it out. Similarly, if we include data from the following day&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># (to scoop up late-arriving events from the day we&amp;#39;re analyzing), we&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># need to weed out events that fall after the time period we want to&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># analyze.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># [START filter_by_time_range]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FilterStartTime&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start_timestamp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FilterEndTime&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stop_timestamp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># [END filter_by_time_range]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># [START add_timestamp_and_window]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Add an element timestamp based on the event log, and apply fixed&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># windowing.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;AddEventTimestamps&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampedValue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">elem&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FixedWindowsTeam&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window_duration_in_seconds&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># [END add_timestamp_and_window]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Extract and sum teamname/score pairs from the event data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractAndSumScore&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;team&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">argv&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">save_main_session&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Main entry point; defines and runs the hourly_team_score pipeline.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">argparse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ArgumentParser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The default maps to two large Google Cloud Storage files (each ~12GB)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># holding two subsequent day&amp;#39;s worth (roughly) of data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--input&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;gs://apache-beam-samples/game/gaming_data*.csv&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;Path to the data file(s) containing game data.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--dataset&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">required&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;BigQuery Dataset to write tables to. &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;Must already exist.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--table_name&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;leader_board&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The BigQuery table name. Should not already exist.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--window_duration&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">int&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;Numeric value of fixed window duration, in minutes&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--start_min&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;1970-01-01-00-00&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;String representation of the first minute after &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;which to generate results in the format: &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;yyyy-MM-dd-HH-mm. Any input data timestamped &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;prior to that minute won&lt;/span>&lt;span class="se">\&amp;#39;&lt;/span>&lt;span class="s1">t be included in the &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;sums.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--stop_min&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;2100-01-01-00-00&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;String representation of the first minute for &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;which to generate results in the format: &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;yyyy-MM-dd-HH-mm. Any input data timestamped &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;after to that minute won&lt;/span>&lt;span class="se">\&amp;#39;&lt;/span>&lt;span class="s1">t be included in the &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;sums.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pipeline_args&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parse_known_args&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">argv&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">pipeline_args&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># We also require the --project option to access --dataset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view_as&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">GoogleCloudOptions&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">project&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">print_usage&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sys&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">argv&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s1">&amp;#39;: error: argument --project is required&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sys&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">exit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># We use the save_main_session option because one or more DoFn&amp;#39;s in this&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># workflow rely on global context (e.g., a module imported at module level).&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view_as&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">SetupOptions&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">save_main_session&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">save_main_session&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span> &lt;span class="c1"># pylint: disable=expression-not-assigned&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ReadInputText&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;HourlyTeamScore&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">HourlyTeamScore&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start_min&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stop_min&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window_duration&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;TeamScoresDict&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TeamScoresDict&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WriteTeamScoreSums&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">WriteToBigQuery&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">table_name&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">dataset&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;team&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;STRING&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;total_score&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;INTEGER&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;window_start&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;STRING&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view_as&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">GoogleCloudOptions&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">project&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
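&lt;p>The &lt;code>--start_min&lt;/code> and &lt;code>--stop_min&lt;/code> arguments are minute strings in the &lt;code>yyyy-MM-dd-HH-mm&lt;/code> format. As a rough illustration, the following plain-Python sketch converts such a string to epoch seconds and keeps only events that fall strictly inside the range. It has no Beam dependency. The helper names &lt;code>str2timestamp&lt;/code> and &lt;code>in_range&lt;/code> are illustrative, and the sketch assumes the strings are interpreted as UTC.&lt;/p>

```python
from datetime import datetime, timezone

# Hypothetical helpers for illustration; minute strings are assumed to be UTC.

def str2timestamp(s, fmt="%Y-%m-%d-%H-%M"):
    """Convert a 'yyyy-MM-dd-HH-mm' string, read as UTC, to epoch seconds."""
    dt = datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)
    return dt.timestamp()

def in_range(event_ts, start_min, stop_min):
    """True if event_ts (epoch seconds) is strictly after start_min and strictly before stop_min."""
    start = str2timestamp(start_min)
    stop = str2timestamp(stop_min)
    return event_ts > start and stop > event_ts
```

&lt;p>For example, an event at 120 seconds after the epoch is kept for the range &lt;code>1970-01-01-00-01&lt;/code> to &lt;code>1970-01-01-00-03&lt;/code>, while an event at exactly 180 seconds is excluded.&lt;/p>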
&lt;h3 id="limitations-1">Limitations&lt;/h3>
&lt;p>As written, &lt;code>HourlyTeamScore&lt;/code> still has a limitation:&lt;/p>
&lt;ul>
&lt;li>&lt;code>HourlyTeamScore&lt;/code> still has &lt;strong>high latency&lt;/strong> between when data events occur (the event time) and when results are generated (the processing time), because, as a batch pipeline, it must wait until all data events are present before it can begin processing.&lt;/li>
&lt;/ul>
&lt;h2 id="leaderboard-streaming-processing-with-real-time-game-data">LeaderBoard: Streaming Processing with Real-Time Game Data&lt;/h2>
&lt;p>One way to address the latency issue in the &lt;code>UserScore&lt;/code> and &lt;code>HourlyTeamScore&lt;/code> pipelines is to read the score data from an unbounded source. The &lt;code>LeaderBoard&lt;/code> pipeline introduces streaming processing: it reads the game score data from an unbounded source that produces an infinite amount of data, rather than from a file on the game server.&lt;/p>
&lt;p>The &lt;code>LeaderBoard&lt;/code> pipeline also demonstrates how to process game score data with respect to both &lt;em>processing time&lt;/em> and &lt;em>event time&lt;/em>. &lt;code>LeaderBoard&lt;/code> outputs data about both individual user scores and about team scores, each with respect to a different time frame.&lt;/p>
&lt;p>Because the &lt;code>LeaderBoard&lt;/code> pipeline reads the game data from an unbounded source as that data is generated, you can think of the pipeline as an ongoing job running concurrently with the game process. &lt;code>LeaderBoard&lt;/code> can thus provide low-latency insights into how users are playing the game at any given moment — useful if, for example, we want to provide a live web-based scoreboard so that users can track their progress against other users as they play.&lt;/p>
&lt;p class="language-java">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/game/LeaderBoard.java">LeaderBoard on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;p class="language-py">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/leader_board.py">LeaderBoard on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;h3 id="what-does-leaderboard-do">What Does LeaderBoard Do?&lt;/h3>
&lt;p>The &lt;code>LeaderBoard&lt;/code> pipeline reads game data published to an unbounded source that produces an infinite amount of data in near real-time, and uses that data to perform two separate processing tasks:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>LeaderBoard&lt;/code> calculates the total score for every unique user and publishes speculative results for every ten minutes of &lt;em>processing time&lt;/em>. That is, ten minutes after data is received, the pipeline outputs the total score per user that the pipeline has processed to date. This calculation provides a running &amp;ldquo;leader board&amp;rdquo; in close to real time, regardless of when the actual game events were generated.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>LeaderBoard&lt;/code> calculates the team scores for each hour that the pipeline runs. This is useful if we want to, for example, reward the top-scoring team for each hour of play. The team score calculation uses fixed-time windowing to divide the input data into hour-long finite windows based on the &lt;em>event time&lt;/em> (indicated by the timestamp) as data arrives in the pipeline.&lt;/p>
&lt;p>In addition, the team score calculation uses Beam&amp;rsquo;s trigger mechanisms to provide speculative results for each hour (which update every five minutes until the hour is up), and to also capture any late data and add it to the specific hour-long window to which it belongs.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Below, we&amp;rsquo;ll look at both of these tasks in detail.&lt;/p>
&lt;h4 id="calculating-user-score-based-on-processing-time">Calculating User Score based on Processing Time&lt;/h4>
&lt;p>We want our pipeline to output a running total score for each user for every ten minutes of processing time. This calculation doesn&amp;rsquo;t consider &lt;em>when&lt;/em> the actual score was generated by the user&amp;rsquo;s play instance; it simply outputs the sum of all the scores for that user that have arrived in the pipeline to date. Late data gets included in the calculation whenever it happens to arrive in the pipeline as it&amp;rsquo;s running.&lt;/p>
&lt;p>Because we want all the data that has arrived in the pipeline every time we update our calculation, we have the pipeline consider all of the user score data in a &lt;strong>single global window&lt;/strong>. The single global window is unbounded, but we can specify a kind of temporary cut-off point for each ten-minute calculation by using a processing time &lt;a href="/documentation/programming-guide/#triggers">trigger&lt;/a>.&lt;/p>
&lt;p>When we specify a ten-minute processing time trigger for the single global window, the pipeline effectively takes a &amp;ldquo;snapshot&amp;rdquo; of the contents of the window every time the trigger fires. The trigger fires ten minutes after the first element of a new pane arrives. If no new data arrives after a firing, the pipeline waits for the next element and takes its next &amp;ldquo;snapshot&amp;rdquo; ten minutes after that element arrives. Since we&amp;rsquo;re using a single global window, each snapshot contains all the data collected &lt;em>to that point in time&lt;/em>. The following diagram shows the effects of using a processing time trigger on the single global window:&lt;/p>
&lt;img src="/images/gaming-example-proc-time-narrow.gif" alt="A pipeline processes score data for three users." width="850px">
&lt;p>&lt;em>Figure 4: Score data for three users. Each user&amp;rsquo;s scores are grouped together
in a single global window, with a trigger that generates a snapshot for output
ten minutes after data is received.&lt;/em>&lt;/p>
&lt;p>As processing time advances and more scores are processed, the trigger outputs the updated sum for each user.&lt;/p>
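&lt;p>The following plain-Python simulation (not Beam code) makes the firing schedule concrete. It models a repeated ten-minute processing-time trigger on a single global window in accumulating mode. It assumes arrival times are given in seconds of processing time and are already sorted. The function name &lt;code>processing_time_firings&lt;/code> is hypothetical.&lt;/p>

```python
def processing_time_firings(arrivals, delay=600):
    """Simulate a repeated "delay seconds after the first element in the pane"
    processing-time trigger on one global window, with accumulating panes.

    arrivals: list of (processing_time_seconds, user, score), sorted by time.
    Returns a list of (firing_time, {user: running_total}) snapshots.
    """
    totals = {}
    firings = []
    fire_at = None  # pending firing time; None means the current pane is empty
    for t, user, score in arrivals:
        # Fire the pending pane if its timer expired before this element arrived.
        if fire_at is not None and t >= fire_at:
            firings.append((fire_at, dict(totals)))
            fire_at = None
        # The first element of a new pane starts a fresh timer.
        if fire_at is None:
            fire_at = t + delay
        # Accumulating mode: totals keep everything seen so far.
        totals[user] = totals.get(user, 0) + score
    # In a running pipeline, the last pending timer fires when processing time reaches it.
    if fire_at is not None:
        firings.append((fire_at, dict(totals)))
    return firings
```

&lt;p>Consider arrivals at 0, 120, and 300 seconds. The first pane fires at 600 seconds. An element arriving at 900 seconds then starts a new ten-minute timer, so the next snapshot is taken at 1500 seconds. That snapshot still includes the earlier scores.&lt;/p>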
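&lt;p>The following code example shows how &lt;code>LeaderBoard&lt;/code> sets the trigger that outputs the data for user scores. &lt;span class="language-py">The Python version does not exactly match the Java example: it fires after every ten elements (&lt;code>AfterCount(10)&lt;/code>) instead of after ten minutes of processing time.&lt;/span>&lt;/p>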
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="cm">/**
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * Extract user/score pairs from the event stream using processing time, via global windowing. Get
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * periodic updates on all users&amp;#39; running scores.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@VisibleForTesting&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">CalculateUserScores&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Duration&lt;/span> &lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CalculateUserScores&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span> &lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">allowedLateness&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">input&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;LeaderboardUserGlobalWindow&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">GlobalWindows&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Get periodic results every ten minutes.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">triggering&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Repeatedly&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">forever&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TEN_MINUTES&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">accumulatingFiredPanes&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withAllowedLateness&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract and sum username/score pairs from the event data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ExtractUserScore&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;user&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">CalculateUserScores&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Extract user/score pairs from the event stream using processing time, via
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> global windowing. Get periodic updates on all users&amp;#39; running scores.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">allowed_lateness&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO(BEAM-6158): Revert the workaround once we can pickle super() on py3.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># super().__init__()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">allowed_lateness_seconds&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">allowed_lateness&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># NOTE: the behavior does not exactly match the Java example&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO: allowed_lateness not implemented yet in FixedWindows&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO: AfterProcessingTime not implemented yet, replace AfterCount&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Get periodic results every ten events.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;LeaderboardUserGlobalWindows&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GlobalWindows&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Repeatedly&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AfterCount&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accumulation_mode&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AccumulationMode&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ACCUMULATING&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Extract and sum username/score pairs from the event data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractAndSumScore&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;user&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
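&lt;p>The Python snippet uses &lt;code>trigger.Repeatedly(trigger.AfterCount(10))&lt;/code>, which fires based on element count rather than processing time. A minimal plain-Python sketch of that count-based behavior for a single key, assuming accumulating panes, might look like this. The function name &lt;code>count_firings&lt;/code> is hypothetical.&lt;/p>

```python
def count_firings(scores, count=10):
    """Simulate a repeated count-based trigger with accumulating panes for one key.

    Emits the running total each time another `count` elements have arrived.
    In this sketch, elements after the last complete batch are not emitted.
    """
    outputs = []
    running = 0
    for i, score in enumerate(scores, start=1):
        running += score          # accumulate every score seen so far
        if i % count == 0:        # every count-th element completes a pane
            outputs.append(running)
    return outputs
```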
&lt;p>&lt;code>LeaderBoard&lt;/code> sets the &lt;a href="/documentation/programming-guide/#window-accumulation-modes">window accumulation mode&lt;/a> to accumulate window panes as the trigger fires. This mode is set by &lt;span class="language-java">invoking &lt;code>.accumulatingFiredPanes()&lt;/code>&lt;/span> &lt;span class="language-py">passing &lt;code>accumulation_mode=trigger.AccumulationMode.ACCUMULATING&lt;/code>&lt;/span> when setting the trigger. In this mode, each firing combines the previously emitted data with any new data that has arrived since the last firing. As a result, each output from &lt;code>LeaderBoard&lt;/code> is a running sum of a user&amp;rsquo;s scores, rather than a sum of only the scores that arrived since the previous firing.&lt;/p>
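&lt;p>The difference between accumulating and discarding modes can be sketched in plain Python (this is not Beam code). Each inner list stands for the scores one user contributed between two consecutive trigger firings. The function name &lt;code>pane_outputs&lt;/code> is hypothetical.&lt;/p>

```python
def pane_outputs(panes, accumulating=True):
    """Return the value emitted at each trigger firing for one key.

    panes: list of lists; each inner list holds the scores that arrived
    between two consecutive firings.
    accumulating=True models accumulating mode; False models discarding mode.
    """
    outputs = []
    running = 0
    for pane in panes:
        pane_sum = sum(pane)
        running += pane_sum
        # Accumulating emits everything so far; discarding emits only the new pane.
        outputs.append(running if accumulating else pane_sum)
    return outputs
```

&lt;p>For panes &lt;code>[[5, 2], [3], [4, 1]]&lt;/code>, accumulating mode emits 7, 10, and 15. Discarding mode emits 7, 3, and 5.&lt;/p>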
&lt;h4 id="calculating-team-score-based-on-event-time">Calculating Team Score based on Event Time&lt;/h4>
&lt;p>We want our pipeline to also output the total score for each team during each hour of play. Unlike the user score calculation, for team scores, we care about when in &lt;em>event&lt;/em> time each score actually occurred, because we want to consider each hour of play individually. We also want to provide speculative updates as each individual hour progresses, and to allow any instances of late data — data that arrives after a given hour&amp;rsquo;s data is considered complete — to be included in our calculation.&lt;/p>
&lt;p>Because we consider each hour individually, we can apply fixed-time windowing to our input data, just like in &lt;code>HourlyTeamScore&lt;/code>. To provide the speculative updates and updates on late data, we&amp;rsquo;ll specify additional trigger parameters. The trigger will cause each window to calculate and emit results at an interval we specify (in this case, every five minutes), and also to keep triggering after the window is considered &amp;ldquo;complete&amp;rdquo; to account for late data. Just like the user score calculation, we&amp;rsquo;ll set the trigger to accumulating mode to ensure that we get a running sum for each hour-long window.&lt;/p>
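&lt;p>Fixed windowing assigns each score to a window by its event-time timestamp, regardless of when the score arrives. A plain-Python sketch of that assignment could look like the following. It assumes timestamps in epoch seconds and hour-long windows aligned to the epoch; Beam fixed windows use a zero offset by default. &lt;code>window_start&lt;/code> and &lt;code>team_sums&lt;/code> are hypothetical names.&lt;/p>

```python
from collections import defaultdict

HOUR = 3600  # window size in seconds; LeaderBoard's team windows are hour-long by default

def window_start(event_ts, size=HOUR):
    """Start of the epoch-aligned fixed window that contains event_ts."""
    return event_ts - (event_ts % size)

def team_sums(events, size=HOUR):
    """Sum scores per (team, window_start) using each event's event-time timestamp.

    events: iterable of (event_ts_seconds, team, score), in any arrival order.
    """
    sums = defaultdict(int)
    for event_ts, team, score in events:
        sums[(team, window_start(event_ts, size))] += score
    return dict(sums)
```

&lt;p>Because assignment depends only on the event timestamp, a score that arrives out of order still lands in the hour in which it was generated.&lt;/p>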
&lt;p>The triggers for speculative updates and late data help with the problem of &lt;a href="/documentation/programming-guide/#windowing">time skew&lt;/a>. Events in the pipeline aren&amp;rsquo;t necessarily processed in the order in which they actually occurred according to their timestamps; they may arrive in the pipeline out of order, or late (in our case, because they were generated while the user&amp;rsquo;s phone was out of contact with a network). Beam needs a way to determine when it can reasonably assume that it has &amp;ldquo;all&amp;rdquo; of the data in a given window: this is called the &lt;em>watermark&lt;/em>.&lt;/p>
&lt;p>In an ideal world, all data would be processed immediately when it occurs, so the processing time would be equal to (or at least have a linear relationship to) the event time. However, because distributed systems contain some inherent inaccuracy (like our late-reporting phones), Beam often relies on a heuristic watermark. Such a watermark is an estimate of event-time completeness rather than a guarantee, so some data can still arrive after the watermark has passed.&lt;/p>
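&lt;p>How a source estimates its watermark depends on the source. Purely as an illustration, the following sketch implements one simple, hypothetical heuristic: it assumes that no event arrives more than a fixed delay behind the latest event time seen so far. This is not how any particular Beam I/O connector computes its watermark, and the class name &lt;code>BoundedDelayWatermark&lt;/code> is made up for this example.&lt;/p>

```python
class BoundedDelayWatermark:
    """Hypothetical heuristic watermark: assume no event arrives more than
    max_delay seconds behind the latest event time seen so far.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.watermark = float("-inf")

    def observe(self, event_ts):
        """Update the estimate with one event time and return the watermark."""
        candidate = event_ts - self.max_delay
        # A watermark never moves backwards, so only advance it.
        if candidate > self.watermark:
            self.watermark = candidate
        return self.watermark
```

&lt;p>An event older than the current estimate does not move the watermark. With this heuristic, such an event is exactly the kind of late data discussed below.&lt;/p>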
&lt;p>The following diagram shows the relationship between ongoing processing time and each score&amp;rsquo;s event time for two teams:&lt;/p>
&lt;img src="/images/gaming-example-event-time-narrow.gif" alt="A pipeline processes score data by team, windowed by event time." width="800px">
&lt;p>&lt;em>Figure 5: Score data by team, windowed by event time. A trigger based on
processing time causes the window to emit speculative early results and include
late results.&lt;/em>&lt;/p>
&lt;p>The dotted line in the diagram is the &amp;ldquo;ideal&amp;rdquo; &lt;strong>watermark&lt;/strong>: Beam&amp;rsquo;s notion of when all data in a given window can reasonably be considered to have arrived. The irregular solid line represents the actual watermark, as determined by the data source.&lt;/p>
&lt;p>Data arriving above the solid watermark line is &lt;em>late data&lt;/em>: a score event that was delayed (perhaps generated offline) and arrived after the watermark had passed the end of the window to which it belongs. Our pipeline&amp;rsquo;s late-firing trigger ensures that this late data is still included in the sum, as long as it arrives within the configured allowed lateness.&lt;/p>
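&lt;p>The following plain-Python sketch classifies an element against a fixed window, the current watermark, and an allowed-lateness bound. It assumes epoch-aligned windows and times in seconds. It also simplifies Beam boundary handling; for example, in Beam the window maximum timestamp is one millisecond before the window end. The name &lt;code>classify&lt;/code> is hypothetical.&lt;/p>

```python
def classify(event_ts, watermark, window_size, allowed_lateness):
    """Return "on_time", "late", or "dropped" for one element (times in seconds)."""
    window_end = event_ts - (event_ts % window_size) + window_size
    if window_end > watermark:
        # The watermark has not yet passed the end of the element's window.
        return "on_time"
    if window_end + allowed_lateness > watermark:
        # Past the end of the window but within allowed lateness;
        # a late firing can still include this element.
        return "late"
    # Beyond allowed lateness: the window has expired and the element is dropped.
    return "dropped"
```

&lt;p>For a score at 100 seconds in an hour-long window with 30 minutes of allowed lateness, the result is &lt;code>on_time&lt;/code> at a watermark of 3000 seconds. At 4000 seconds the same score is &lt;code>late&lt;/code>, and at 6000 seconds it is &lt;code>dropped&lt;/code>.&lt;/p>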
&lt;p>The following code example shows how &lt;code>LeaderBoard&lt;/code> applies fixed-time windowing with the appropriate triggers to have our pipeline perform the calculations we want:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Extract team/score pairs from the event stream, using hour-long windows by default.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nd">@VisibleForTesting&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">CalculateTeamScores&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Duration&lt;/span> &lt;span class="n">teamWindowDuration&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Duration&lt;/span> &lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CalculateTeamScores&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span> &lt;span class="n">teamWindowDuration&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Duration&lt;/span> &lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">teamWindowDuration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">teamWindowDuration&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">allowedLateness&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">infos&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">infos&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;LeaderboardTeamFixedWindows&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">teamWindowDuration&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// We will get early (speculative) results as well as cumulative
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// processing of late data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">triggering&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterWatermark&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastEndOfWindow&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withEarlyFirings&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FIVE_MINUTES&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withLateFirings&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TEN_MINUTES&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withAllowedLateness&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">allowedLateness&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">accumulatingFiredPanes&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract and sum teamname/score pairs from the event data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ExtractTeamScore&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;team&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">CalculateTeamScores&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Calculates scores for each team within the configured window duration.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Extract team/score pairs from the event stream, using hour-long windows by
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> default.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">team_window_duration&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">allowed_lateness&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO(BEAM-6158): Revert the workaround once we can pickle super() on py3.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># super().__init__()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">team_window_duration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">team_window_duration&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">allowed_lateness_seconds&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">allowed_lateness&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># NOTE: the behavior does not exactly match the Java example&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO: allowed_lateness not implemented yet in FixedWindows&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TODO: AfterProcessingTime not implemented yet, replace AfterCount&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># We will get early (speculative) results as well as cumulative&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># processing of late data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;LeaderboardTeamFixedWindows&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">team_window_duration&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AfterWatermark&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AfterCount&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AfterCount&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accumulation_mode&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">trigger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AccumulationMode&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ACCUMULATING&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Extract and sum teamname/score pairs from the event data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractAndSumScore&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;team&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Taken together, these processing strategies let us address the latency and completeness issues present in the &lt;code>UserScore&lt;/code> and &lt;code>HourlyTeamScore&lt;/code> pipelines while still using the same basic transforms to process the data. In fact, both calculations reuse the same &lt;code>ExtractAndSumScore&lt;/code> transform that we used in the &lt;code>UserScore&lt;/code> and &lt;code>HourlyTeamScore&lt;/code> pipelines.&lt;/p>
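&lt;p>The difference between accumulating and discarding fired panes can be seen in a small plain-Python model of a single team window. The &lt;code>fired_panes&lt;/code> helper and the sample score batches below are hypothetical illustrations, not part of the Beam API or the example pipelines. Each inner list stands for the scores that arrived before one trigger firing (early, on-time, or late).&lt;/p>

```python
# Toy model of what one window emits on each trigger firing under the two
# accumulation modes. Each inner list holds the scores that arrived between
# two consecutive firings of the same window.
# Illustration only: this does not use the Beam API.


def fired_panes(batches, accumulating=True):
    """Return the value emitted for each firing of a single window."""
    emitted = []
    running_total = 0
    for batch in batches:
        pane_sum = sum(batch)
        if accumulating:
            # ACCUMULATING: each pane includes everything seen so far.
            running_total += pane_sum
            emitted.append(running_total)
        else:
            # DISCARDING: each pane includes only the newly arrived scores.
            emitted.append(pane_sum)
    return emitted


# An early firing, an on-time firing, then one late firing.
batches = [[5, 3], [2], [4]]
print(fired_panes(batches, accumulating=True))   # [8, 10, 14]
print(fired_panes(batches, accumulating=False))  # [8, 2, 4]
```

&lt;p>With &lt;code>accumulatingFiredPanes()&lt;/code> (&lt;code>AccumulationMode.ACCUMULATING&lt;/code> in Python), each early or late firing carries the team&amp;rsquo;s updated total, so a leaderboard can simply overwrite its previous value. With discarding panes, downstream code would have to add up the partial sums itself.&lt;/p>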
&lt;h2 id="gamestats-abuse-detection-and-usage-analysis">GameStats: Abuse Detection and Usage Analysis&lt;/h2>
&lt;p>While &lt;code>LeaderBoard&lt;/code> demonstrates how to use basic windowing and triggers to perform low-latency, flexible data analysis, we can use more advanced windowing techniques to perform more comprehensive analysis, such as calculations designed to detect system abuse (like spam) or to gain insight into user behavior. The &lt;code>GameStats&lt;/code> pipeline builds on the low-latency functionality in &lt;code>LeaderBoard&lt;/code> to demonstrate how you can use Beam to perform this kind of advanced analysis.&lt;/p>
&lt;p>Like &lt;code>LeaderBoard&lt;/code>, &lt;code>GameStats&lt;/code> reads data from an unbounded source. It is best thought of as an ongoing job that provides insight into the game as users play.&lt;/p>
&lt;p class="language-java">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/game/GameStats.java">GameStats on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;p class="language-py">&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> See &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/game_stats.py">GameStats on GitHub&lt;/a> for the complete example pipeline program.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;h3 id="what-does-gamestats-do">What Does GameStats Do?&lt;/h3>
&lt;p>Like &lt;code>LeaderBoard&lt;/code>, &lt;code>GameStats&lt;/code> calculates the total score per team, per hour. However, the pipeline also performs two kinds of more complex analysis:&lt;/p>
&lt;ul>
&lt;li>&lt;code>GameStats&lt;/code> performs &lt;strong>abuse detection&lt;/strong>: it runs some simple statistical analysis on the score data to determine which users, if any, might be spammers or bots. It then uses the list of suspected spam/bot users to filter the bots out of the hourly team score calculation.&lt;/li>
&lt;li>&lt;code>GameStats&lt;/code> &lt;strong>analyzes usage patterns&lt;/strong> by using session windowing to group together game data whose event times are close to one another. This gives us insight into how long users tend to play, and how game length changes over time.&lt;/li>
&lt;/ul>
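&lt;p>The session windowing mentioned above groups a user&amp;rsquo;s events that are separated by less than a gap duration. The following plain-Python sketch models that merging rule for one user; the &lt;code>sessions&lt;/code> helper, the minute-based timestamps, and the 5-minute gap are hypothetical. In a real pipeline you would use Beam&amp;rsquo;s &lt;code>Sessions&lt;/code> window function (&lt;code>Sessions.withGapDuration&lt;/code> in Java, &lt;code>beam.window.Sessions&lt;/code> in Python) rather than code like this.&lt;/p>

```python
# Plain-Python model of session windowing for a single user's events.
# Each event at time t gets the window [t, t + gap); overlapping windows are
# merged, so a session closes once the user has been idle for at least `gap`.
# Illustration only: this is not Beam's Sessions implementation.


def sessions(timestamps, gap):
    """Group event timestamps into merged (start, end) session intervals."""
    result = []
    for t in sorted(timestamps):
        if result and result[-1][1] > t:
            # The event falls inside the current session, so extend its end.
            start, _ = result[-1]
            result[-1] = (start, t + gap)
        else:
            # The user was idle for at least `gap`, so start a new session.
            result.append((t, t + gap))
    return result


# Event times in minutes for one user, with a hypothetical 5-minute gap.
user_sessions = sessions([0, 3, 7, 30, 33], gap=5)
print(user_sessions)                                  # [(0, 12), (30, 38)]
print([end - start for start, end in user_sessions])  # [12, 8]
```

&lt;p>The resulting session lengths (12 and 8 minutes here) are the kind of per-user play-time data that this usage analysis relies on.&lt;/p>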
&lt;p>Below, we&amp;rsquo;ll look at these features in more detail.&lt;/p>
&lt;h4 id="abuse-detection">Abuse Detection&lt;/h4>
&lt;p>Let&amp;rsquo;s suppose scoring in our game depends on the speed at which a user can &amp;ldquo;click&amp;rdquo; on their phone. &lt;code>GameStats&lt;/code>&amp;rsquo;s abuse detection analyzes each user&amp;rsquo;s score data to detect if a user has an abnormally high &amp;ldquo;click rate&amp;rdquo; and thus an abnormally high score. This might indicate that the game is being played by a bot that operates significantly faster than a human could play.&lt;/p>
&lt;p>To determine whether a score is &amp;ldquo;abnormally&amp;rdquo; high, &lt;code>GameStats&lt;/code> sums each user&amp;rsquo;s scores within a fixed-time window, calculates the average of those per-user totals, and then checks each user&amp;rsquo;s total against that average multiplied by an arbitrary weight factor (in our case, 2.5). Thus, any user whose total is more than 2.5 times the average is deemed to be a spammer. The &lt;code>GameStats&lt;/code> pipeline tracks a list of these &amp;ldquo;spam&amp;rdquo; users and filters them out of the team score calculations for the team leaderboard.&lt;/p>
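&lt;p>The threshold arithmetic can be checked with a small plain-Python model of one fixed window. The &lt;code>spammy_users&lt;/code> helper and the sample events are hypothetical; only &lt;code>SCORE_WEIGHT = 2.5&lt;/code> comes from the pipeline. With per-user totals of 10, 12, 8, 10, and 100, the mean is 28, so the cutoff is 28 * 2.5 = 70 and only the 100-point user is flagged.&lt;/p>

```python
# Plain-Python model of the GameStats spam check for one fixed window:
# sum each user's scores, average those per-user totals, and flag every
# user whose total exceeds SCORE_WEIGHT times that average.
from collections import defaultdict
from statistics import mean

SCORE_WEIGHT = 2.5  # Same weight factor as the GameStats pipeline.


def spammy_users(events):
    """Return {user: total} for users above the weighted mean total.

    `events` is an iterable of (user, score) pairs from one window.
    """
    totals = defaultdict(int)
    for user, score in events:
        totals[user] += score
    if not totals:
        return {}
    threshold = mean(totals.values()) * SCORE_WEIGHT
    return {user: total for user, total in totals.items() if total > threshold}


# Hypothetical events: per-user totals are 10, 12, 8, 10, and 100.
events = [("alice", 4), ("bob", 12), ("alice", 6),
          ("carol", 8), ("dave", 10), ("bot1", 100)]
print(spammy_users(events))  # {'bot1': 100}
```

&lt;p>Because each user&amp;rsquo;s own total also raises the mean, the check only becomes meaningful with enough users: with just two non-negative totals, neither can ever exceed 2.5 times their mean.&lt;/p>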
&lt;p>Since the average depends on the pipeline data, we need to calculate it first, and then use the calculated value in a subsequent &lt;code>ParDo&lt;/code> transform that keeps only the scores that exceed the weighted value. To do this, we can pass the calculated average as a &lt;a href="/documentation/programming-guide/#side-inputs">side input&lt;/a> to the filtering &lt;code>ParDo&lt;/code>.&lt;/p>
&lt;p>The following code example shows the composite transform that handles abuse detection. The transform uses the &lt;code>Sum.integersPerKey&lt;/code> transform to sum all scores per user, and then the &lt;code>Mean.globally&lt;/code> transform to determine the average of those per-user totals. Once that&amp;rsquo;s been calculated (as a &lt;code>PCollectionView&lt;/code> singleton), we can pass it to the filtering &lt;code>ParDo&lt;/code> using &lt;code>.withSideInputs&lt;/code>. The Python version performs the same two steps with &lt;code>beam.CombinePerKey(sum)&lt;/code> and &lt;code>beam.CombineGlobally(beam.combiners.MeanCombineFn())&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">CalculateSpammyUsers&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Logger&lt;/span> &lt;span class="n">LOG&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">LoggerFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getLogger&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">CalculateSpammyUsers&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="n">SCORE_WEIGHT&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">5&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">userScores&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Get the sum of scores for each user.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">sumScores&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">userScores&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;UserSum&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">integersPerKey&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract the score from each element, and use it to find the global mean.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">PCollectionView&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">globalMeanScore&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sumScores&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Values&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Mean&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">globally&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">asSingletonView&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Filter the user sums using the global mean.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">filtered&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sumScores&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;ProcessAndFilter&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// use the derived mean total score as a side input
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Counter&lt;/span> &lt;span class="n">numSpammerUsers&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">counter&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;main&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;SpammerUsers&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Integer&lt;/span> &lt;span class="n">score&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Double&lt;/span> &lt;span class="n">gmc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sideInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">globalMeanScore&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">score&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">gmc&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">SCORE_WEIGHT&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">LOG&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">info&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;user &amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">+&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34; spammer score &amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">+&lt;/span> &lt;span class="n">score&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34; with mean &amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">+&lt;/span> &lt;span class="n">gmc&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">numSpammerUsers&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">inc&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withSideInputs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">globalMeanScore&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">filtered&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">CalculateSpammyUsers&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;Filter out all but those users with a high clickrate, which we will
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> consider as &amp;#39;spammy&amp;#39; uesrs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> We do this by finding the mean total score per user, then using that
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> information as a side input to filter out all but those user scores that are
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> larger than (mean * SCORE_WEIGHT).
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">SCORE_WEIGHT&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">2.5&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">user_scores&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Get the sum of scores for each user.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sum_scores&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">user_scores&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;SumUsersScores&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombinePerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Extract the score from each element, and use it to find the global mean.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">global_mean_score&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sum_scores&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Values&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MeanCombineFn&lt;/span>&lt;span class="p">())&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">as_singleton_view&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Filter the user sums using the global mean.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">filtered&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sum_scores&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Use the derived mean total score (global_mean_score) as a side input.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ProcessAndFilter&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">lambda&lt;/span> &lt;span class="n">key_score&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">global_mean&lt;/span>&lt;span class="p">:&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">key_score&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">global_mean&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SCORE_WEIGHT&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">global_mean_score&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">filtered&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
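&lt;p>To make the filtering rule concrete, here is a minimal plain-Python sketch (no Beam required) of what &lt;code>CalculateSpammyUsers&lt;/code> computes. It sums each user&amp;rsquo;s scores, takes the mean of those per-user totals, and keeps only the users whose total exceeds that mean multiplied by &lt;code>SCORE_WEIGHT&lt;/code>. The helper &lt;code>spammy_users&lt;/code> and the sample data are hypothetical. In the pipeline, the same three steps are performed by &lt;code>CombinePerKey&lt;/code>, by &lt;code>CombineGlobally&lt;/code> with a singleton view, and by a side-input &lt;code>Filter&lt;/code>.&lt;/p>

```python
from collections import defaultdict
from statistics import mean

SCORE_WEIGHT = 2.5  # Same weight as CalculateSpammyUsers.SCORE_WEIGHT.


def spammy_users(user_scores):
    """Return {user: total} for users whose total exceeds mean * SCORE_WEIGHT.

    Hypothetical helper: user_scores is an iterable of (user, score) pairs,
    standing in for the keyed PCollection passed to CalculateSpammyUsers.
    """
    # Step 1: sum the scores per user (like beam.CombinePerKey(sum)).
    sums = defaultdict(int)
    for user, score in user_scores:
        sums[user] += score

    # Step 2: mean of the per-user totals (like CombineGlobally(MeanCombineFn())).
    global_mean = mean(sums.values())

    # Step 3: keep users above the weighted mean (like the side-input Filter).
    return {user: total for user, total in sums.items()
            if total > global_mean * SCORE_WEIGHT}


# Usage with made-up data: four ordinary users and one outlier.
events = [("alice", 10), ("bob", 12), ("carol", 8), ("dave", 10),
          ("bot", 300)]
print(spammy_users(events))
# {'bot': 300}
```

&lt;p>Because each user&amp;rsquo;s own total is included in the mean, nobody can be flagged when there are only one or two users. With &lt;em>n&lt;/em> users, a total can exceed 2.5 times the mean only if &lt;em>n&lt;/em> is greater than 2.5.&lt;/p>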
&lt;p>The abuse-detection transform generates a view of users suspected to be spambots. Later in the pipeline, we use that view to filter out any such users when we calculate the team score per hour, again by using the side input mechanism. The following code example shows where we insert the spam filter, between windowing the scores into fixed windows and extracting the team scores:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Calculate the total score per team over fixed windows,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and emit cumulative updates for late data. Uses the side input derived above-- the set of
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// suspected robots-- to filter out scores from those users from the sum.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Write the results to BigQuery.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">rawEvents&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;WindowIntoFixedWindows&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getFixedWindowDuration&lt;/span>&lt;span class="o">()))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Filter out the detected spammer users, using the side input derived above.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;FilterOutSpammers&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">GameActionInfo&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// If the user is not in the spammers Map, output the data element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sideInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">spammersView&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getUser&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">trim&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="kc">null&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withSideInputs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">spammersView&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract and sum teamname/score pairs from the event data.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ExtractTeamScore&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;team&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Calculate the total score per team over fixed windows, and emit cumulative&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># updates for late data. Uses the side input derived above --the set of&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># suspected robots-- to filter out scores from those users from the sum.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Write the results to BigQuery.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">(&lt;/span> &lt;span class="c1"># pylint: disable=expression-not-assigned&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">raw_events&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WindowIntoFixedWindows&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">fixed_window_duration&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Filter out the detected spammer users, using the side input derived&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># above&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FilterOutSpammers&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">spammers&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;user&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">spammers&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">spammers_view&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Extract and sum teamname/score pairs from the event data.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractAndSumScore&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ExtractAndSumScore&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;team&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
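&lt;p>The effect of the side input can also be sketched in plain Python. This is an illustration, not Beam code. In the sketch, a dictionary of suspected spammers plays the role of the spammers view. Events from those users are dropped before the per-team sums are computed, which is what the &lt;code>FilterOutSpammers&lt;/code> and &lt;code>ExtractAndSumScore&lt;/code> steps do within each fixed window. The event fields &lt;code>user&lt;/code>, &lt;code>team&lt;/code>, &lt;code>score&lt;/code> and &lt;code>timestamp&lt;/code> are assumed to match the event dictionaries used by the Python snippet. The helper &lt;code>team_scores_per_window&lt;/code> and the data are hypothetical.&lt;/p>

```python
from collections import defaultdict


def team_scores_per_window(events, spammers, window_size_secs):
    """Sum scores per (window start, team), skipping suspected spammers.

    Hypothetical helper: events are dicts with 'user', 'team', 'score' and
    'timestamp' (seconds); spammers is a dict keyed by user name, standing in
    for the spammers_view side input.
    """
    totals = defaultdict(int)
    for elem in events:
        # FilterOutSpammers: drop events whose user is a suspected spammer.
        if elem["user"] in spammers:
            continue
        # WindowIntoFixedWindows: a fixed window starts at a multiple of its size.
        window_start = elem["timestamp"] - elem["timestamp"] % window_size_secs
        # ExtractAndSumScore('team'): sum the scores per team within each window.
        totals[(window_start, elem["team"])] += elem["score"]
    return dict(totals)


spammers = {"bot": 300}
events = [
    {"user": "alice", "team": "red", "score": 10, "timestamp": 30},
    {"user": "bot", "team": "red", "score": 300, "timestamp": 45},
    {"user": "bob", "team": "blue", "score": 12, "timestamp": 4000},
]
print(team_scores_per_window(events, spammers, window_size_secs=3600))
# {(0, 'red'): 10, (3600, 'blue'): 12}
```

&lt;p>Without the filter, the red team&amp;rsquo;s first-hour total would be 310 instead of 10, because the suspected bot&amp;rsquo;s 300 points would be included.&lt;/p>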
&lt;h4 id="analyzing-usage-patterns">Analyzing Usage Patterns&lt;/h4>
&lt;p>We can gain some insight into when users are playing our game, and for how long, by examining the event times for each game score and grouping scores with similar event times into &lt;em>sessions&lt;/em>. &lt;code>GameStats&lt;/code> uses Beam&amp;rsquo;s built-in &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Sessions.java">session windowing&lt;/a> function to group user scores into sessions based on the time they occurred.&lt;/p>
&lt;p>When you set session windowing, you specify a &lt;em>minimum gap duration&lt;/em> between events. Events whose event times (timestamps) are closer together than the minimum gap duration are grouped into the same window. Whenever the gap between consecutive events is at least the minimum gap duration, a new window begins. If we choose the minimum gap duration sensibly, we can reasonably assume that scores in the same session window are part of the same (relatively) uninterrupted stretch of play. Scores in a different window indicate that the user stopped playing the game for at least the minimum gap time before returning to it later.&lt;/p>
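&lt;p>The grouping rule can be illustrated with a small plain-Python sketch. It is a simplified model, not Beam&amp;rsquo;s implementation. It follows the way Beam forms session windows: each event at time &lt;code>t&lt;/code> initially gets its own window &lt;code>[t, t + gap)&lt;/code>, and overlapping windows are merged. A session therefore runs from its first event until one gap duration after its last event. The helper &lt;code>sessions&lt;/code> and the timestamps (in minutes, with a 10-minute gap) are hypothetical.&lt;/p>

```python
def sessions(timestamps, gap):
    """Group event timestamps into session windows.

    Hypothetical helper: returns a list of (start, end) pairs. Each event at
    time t starts as the half-open window [t, t + gap); windows that overlap
    are merged, so a session ends one gap after its last event.
    """
    merged = []
    for t in sorted(timestamps):
        window_end = t + gap
        # Overlaps the current session if the event starts before it ends.
        if merged and merged[-1][1] > t:
            start, end = merged[-1]
            merged[-1] = (start, max(end, window_end))
        else:
            # Gap of at least one gap duration (or first event): new session.
            merged.append((t, window_end))
    return merged


# Event times in minutes for one user, with a 10-minute gap duration.
print(sessions([0, 4, 9, 30, 35, 70], gap=10))
# [(0, 19), (30, 45), (70, 80)]
```

&lt;p>Two events exactly one gap apart, such as minutes 0 and 10 with a 10-minute gap, end up in separate sessions, because the half-open windows &lt;code>[0, 10)&lt;/code> and &lt;code>[10, 20)&lt;/code> do not overlap.&lt;/p>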
&lt;p>The following diagram shows how data might look when grouped into session windows. Unlike fixed windows, session windows are &lt;em>different for each user&lt;/em> and depend on each individual user&amp;rsquo;s play pattern:&lt;/p>
&lt;p>&lt;img src="/images/gaming-example-session-windows.png" alt="User sessions with a minimum gap duration.">&lt;/p>
&lt;p>&lt;em>Figure 6: User sessions with a minimum gap duration. Each user has different
sessions, according to how many instances they play and how long their breaks
between instances are.&lt;/em>&lt;/p>
&lt;p>We can use the session-windowed data to determine the average length of uninterrupted play time for all of our users, as well as the total score they achieve during each session. To find the session lengths, the code first applies session windows, then combines each user&amp;rsquo;s events within a session into a single element (the score values are discarded, because only the existence of the session matters here), and finally uses a transform to calculate the length of each individual session:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Detect user sessions-- that is, a burst of activity separated by a gap from further
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// activity. Find and record the mean session lengths.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This information could help the game designers track the changing user engagement
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// as their set of games changes.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">userEvents&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;WindowIntoSessions&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Sessions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">withGapDuration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getSessionGap&lt;/span>&lt;span class="o">())))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withTimestampCombiner&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimestampCombiner&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">END_OF_WINDOW&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// For this use, we care only about the existence of the session, not any particular
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// information aggregated over it, so the following is an efficient way to do that.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Combine&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">perKey&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">0&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Get the duration per session.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;UserSessionActivity&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">UserSessionInfoFn&lt;/span>&lt;span class="o">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Detect user sessions-- that is, a burst of activity separated by a gap&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># from further activity. Find and record the mean session lengths.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># This information could help the game designers track the changing user&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># engagement as their set of game changes.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">(&lt;/span> &lt;span class="c1"># pylint: disable=expression-not-assigned&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">user_events&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WindowIntoSessions&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sessions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">session_gap&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timestamp_combiner&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampCombiner&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">OUTPUT_AT_EOW&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># For this use, we care only about the existence of the session, not any&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># particular information aggregated over it, so we can just group by key&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># and assign a &amp;#34;dummy value&amp;#34; of None.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombinePerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">_&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Get the duration of the session&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;UserSessionActivity&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">UserSessionActivity&lt;/span>&lt;span class="p">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
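&lt;p>The per-session duration can also be sketched in plain Python. A Beam session window starts at the session&amp;rsquo;s first event and ends one gap duration after its last event. A duration computed from the window bounds (end minus start) therefore includes that trailing gap. The hypothetical helper &lt;code>session_durations&lt;/code> groups made-up &lt;code>(user, timestamp)&lt;/code> pairs into sessions per user and emits one duration per session. This mirrors the one-element-per-session output of the &lt;code>CombinePerKey&lt;/code> step followed by a duration calculation. It is not the code of &lt;code>UserSessionActivity&lt;/code> or &lt;code>UserSessionInfoFn&lt;/code>.&lt;/p>

```python
from collections import defaultdict


def session_durations(user_events, gap):
    """Return {user: [session duration, ...]} from (user, timestamp) pairs.

    Hypothetical helper. Each session window is [first event, last event + gap),
    and the duration is window end minus window start, so it includes the gap.
    """
    times_by_user = defaultdict(list)
    for user, t in user_events:
        times_by_user[user].append(t)

    durations = {}
    for user, times in times_by_user.items():
        windows = []
        for t in sorted(times):
            if windows and windows[-1][1] > t:
                # Event falls inside the current session: extend it.
                windows[-1][1] = max(windows[-1][1], t + gap)
            else:
                # Gap reached (or first event): start a new session.
                windows.append([t, t + gap])
        # One value per session, like the output of the session-duration step.
        durations[user] = [end - start for start, end in windows]
    return durations


# Timestamps in minutes, 10-minute gap duration.
events = [("alice", 0), ("alice", 5), ("alice", 40), ("bob", 2), ("bob", 11)]
print(session_durations(events, gap=10))
# {'alice': [15, 10], 'bob': [19]}
```

&lt;p>A session that contains a single event, such as alice&amp;rsquo;s event at minute 40, still reports a duration equal to the gap.&lt;/p>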
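&lt;p>This gives us a set of user sessions, each with an attached duration. Because the session windowing above uses an end-of-window timestamp combiner, each session&amp;rsquo;s result is timestamped at the end of its session. We can then calculate the &lt;em>average&lt;/em> session length by re-windowing the data into fixed time windows, whose length is set by the user activity window duration option. We then calculate the average for all sessions that end in each window:&lt;/p>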
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Re-window to process groups of session sums according to when the sessions complete.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;WindowToExtractSessionMean&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getUserActivityWindowDuration&lt;/span>&lt;span class="o">()))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Find the mean session duration in each window.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Mean&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">globally&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">withoutDefaults&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Write this info to a BigQuery table.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;WriteAvgSessionLength&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">WriteWindowedToBigQuery&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">GcpOptions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">getProject&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getDataset&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getGameStatsTablePrefix&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34;_sessions&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">configureSessionWindowWrite&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Re-window to process groups of session sums according to when the&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># sessions complete&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WindowToExtractSessionMean&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">user_activity_window_duration&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Find the mean session duration in each window&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MeanCombineFn&lt;/span>&lt;span class="p">())&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">without_defaults&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FormatAvgSessionLength&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">elem&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;mean_duration&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">elem&lt;/span>&lt;span class="p">)})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WriteAvgSessionLength&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">WriteToBigQuery&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">table_name&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s1">&amp;#39;_sessions&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">dataset&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mean_duration&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;FLOAT&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view_as&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">GoogleCloudOptions&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">project&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>We can use the resulting information to find, for example, what times of day our users are playing the longest, or which stretches of the day are more likely to see shorter play sessions.&lt;/p>
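&lt;p>The mean in the snippet above comes from &lt;code>MeanCombineFn&lt;/code>. That combiner keeps a running (sum, count) accumulator for each window, so partial results computed on different workers can be merged before the final division. The following Go sketch models that accumulator pattern using only the standard library. The &lt;code>meanAcc&lt;/code> type and its method names are hypothetical and are not part of the Beam API.&lt;/p>

```go
package main

import "fmt"

// meanAcc is a hypothetical accumulator modeled on a mean combiner:
// it stores a running sum and count instead of the mean itself, so
// partial accumulators can be merged exactly.
type meanAcc struct {
	Sum   float64
	Count int
}

// addInput folds one element into the accumulator.
func (a meanAcc) addInput(x float64) meanAcc {
	return meanAcc{Sum: a.Sum + x, Count: a.Count + 1}
}

// mergeAccumulators combines partial results, for example from
// different workers that each processed part of one window.
func mergeAccumulators(accs ...meanAcc) meanAcc {
	var out meanAcc
	for _, a := range accs {
		out.Sum += a.Sum
		out.Count += a.Count
	}
	return out
}

// extractOutput computes the final mean. It reports false for an empty
// accumulator rather than dividing by zero.
func (a meanAcc) extractOutput() (float64, bool) {
	if a.Count == 0 {
		return 0, false
	}
	return a.Sum / float64(a.Count), true
}

func main() {
	// Two "workers" each see part of one window's session durations.
	var w1, w2 meanAcc
	for _, d := range []float64{10, 20} {
		w1 = w1.addInput(d)
	}
	for _, d := range []float64{30, 40, 50} {
		w2 = w2.addInput(d)
	}
	mean, ok := mergeAccumulators(w1, w2).extractOutput()
	fmt.Println(mean, ok) // 30 true
}
```

&lt;p>In the sketch, an empty accumulator reports &lt;code>false&lt;/code> instead of returning a mean. This loosely mirrors &lt;code>.without_defaults()&lt;/code>, which makes &lt;code>CombineGlobally&lt;/code> produce no output for an empty window instead of a default value.&lt;/p>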
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite &lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item><item><title>Get-Started: Beam Overview</title><link>/get-started/beam-overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/beam-overview/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-overview">Apache Beam Overview&lt;/h1>
&lt;p>Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam&amp;rsquo;s supported &lt;strong>distributed processing back-ends&lt;/strong>, which include &lt;a href="https://flink.apache.org">Apache Flink&lt;/a>, &lt;a href="https://spark.apache.org">Apache Spark&lt;/a>, and &lt;a href="https://cloud.google.com/dataflow">Google Cloud Dataflow&lt;/a>.&lt;/p>
&lt;p>Beam is particularly useful for &lt;a href="https://en.wikipedia.org/wiki/Embarassingly_parallel">embarrassingly parallel&lt;/a> data processing tasks, in which the problem can be decomposed into many smaller bundles of data that can be processed independently and in parallel. You can also use Beam for Extract, Transform, and Load (ETL) tasks and pure data integration. Such tasks include moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.&lt;/p>
&lt;div style="display: flex; justify-content: center">
&lt;img src="/images/learner_graph.png" width="800px" alt="Learner Graph">
&lt;/div>
&lt;h2 id="apache-beam-sdks">Apache Beam SDKs&lt;/h2>
&lt;p>The Beam SDKs provide a unified programming model that can represent and transform data sets of any size, whether the input is a finite data set from a batch data source, or an infinite data set from a streaming data source. The Beam SDKs use the same classes to represent both bounded and unbounded data, and the same transforms to operate on that data. You use the Beam SDK of your choice to build a program that defines your data processing pipeline.&lt;/p>
&lt;p>Beam currently supports the following language-specific SDKs:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/sdks/java">Apache Beam Java SDK&lt;/a> &lt;img src="/images/logos/sdks/java.png" alt="Java logo">&lt;/li>
&lt;li>&lt;a href="/documentation/sdks/python">Apache Beam Python SDK&lt;/a> &lt;img src="/images/logos/sdks/python.png" alt="Python logo">&lt;/li>
&lt;li>&lt;a href="/documentation/sdks/go">Apache Beam Go SDK&lt;/a> &lt;img src="/images/logos/sdks/go.png" height="45px" alt="Go logo">&lt;/li>
&lt;/ul>
&lt;p>A Scala &lt;img src="/images/logos/sdks/scala.png" height="45px" alt="Scala logo"> interface is also available as &lt;a href="https://github.com/spotify/scio">Scio&lt;/a>.&lt;/p>
&lt;h2 id="apache-beam-pipeline-runners">Apache Beam Pipeline Runners&lt;/h2>
&lt;p>The Beam Pipeline Runners translate the data processing pipeline you define with your Beam program into the API compatible with the distributed processing back-end of your choice. When you run your Beam program, you&amp;rsquo;ll need to specify an &lt;a href="/documentation/runners/capability-matrix">appropriate runner&lt;/a> for the back-end where you want to execute your pipeline.&lt;/p>
&lt;p>Beam currently supports the following runners:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/direct">Direct Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/flink">Apache Flink Runner&lt;/a> &lt;img src="/images/logos/runners/flink.png" height="50px" alt="Apache Flink logo">&lt;/li>
&lt;li>&lt;a href="/documentation/runners/nemo">Apache Nemo Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/samza">Apache Samza Runner&lt;/a> &lt;img src="/images/logos/runners/samza.png" height="40px" alt="Apache Samza logo">&lt;/li>
&lt;li>&lt;a href="/documentation/runners/spark">Apache Spark Runner&lt;/a> &lt;img src="/images/logos/runners/spark.png" height="50px" alt="Apache Spark logo">&lt;/li>
&lt;li>&lt;a href="/documentation/runners/dataflow">Google Cloud Dataflow Runner&lt;/a> &lt;img src="/images/logos/runners/dataflow.png" height="50px" alt="Google Cloud Dataflow logo">&lt;/li>
&lt;li>&lt;a href="/documentation/runners/jet">Hazelcast Jet Runner&lt;/a> &lt;img src="/images/logos/runners/jet.png" height="40px" alt="Hazelcast Jet logo">&lt;/li>
&lt;li>&lt;a href="/documentation/runners/twister2">Twister2 Runner&lt;/a> &lt;img src="/images/logos/runners/twister2.png" height="50px" alt="Twister2 logo">&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note:&lt;/strong> You can always execute your pipeline locally for testing and debugging purposes.&lt;/p>
&lt;h2 id="get-started">Get Started&lt;/h2>
&lt;p>Get started using Beam for your data processing tasks.&lt;/p>
&lt;blockquote>
&lt;p>If you already know &lt;a href="https://spark.apache.org/">Apache Spark&lt;/a>,
check our &lt;a href="/get-started/from-spark">Getting started from Apache Spark&lt;/a> page.&lt;/p>
&lt;/blockquote>
&lt;ol>
&lt;li>
&lt;p>Take the &lt;a href="https://tour.beam.apache.org/">Tour of Beam&lt;/a> as an online interactive learning experience.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Follow the Quickstart for the &lt;a href="/get-started/quickstart-java">Java SDK&lt;/a>, the &lt;a href="/get-started/quickstart-py">Python SDK&lt;/a>, or the &lt;a href="/get-started/quickstart-go">Go SDK&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>See the &lt;a href="/get-started/wordcount-example">WordCount Examples Walkthrough&lt;/a> for examples that introduce various features of the SDKs.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Dive into the &lt;a href="/documentation/">Documentation&lt;/a> section for in-depth concepts and reference materials for the Beam model, SDKs, and runners.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Dive into the &lt;a href="https://github.com/GoogleCloudPlatform/dataflow-cookbook">cookbook examples&lt;/a> for learning how to run Beam on Dataflow.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="contribute">Contribute&lt;/h2>
&lt;p>Beam is an &lt;a href="https://www.apache.org" target="_blank">Apache Software Foundation&lt;/a> project, available under the Apache v2 license. Beam is an open source community and contributions are greatly appreciated! If you&amp;rsquo;d like to contribute, please see the &lt;a href="/contribute/">Contribute&lt;/a> section.&lt;/p></description></item><item><title>Get-Started: Beam Quickstart for Go</title><link>/get-started/quickstart-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart-go/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="wordcount-quickstart-for-go">WordCount quickstart for Go&lt;/h1>
&lt;p>This Quickstart will walk you through executing your first Beam pipeline to run &lt;a href="/get-started/wordcount-example">WordCount&lt;/a>, written using Beam&amp;rsquo;s &lt;a href="/documentation/sdks/go">Go SDK&lt;/a>, on a &lt;a href="/documentation#runners">runner&lt;/a> of your choice.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Go codebase, see the &lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-environment">Set up your environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-wordcount">Run wordcount&lt;/a>&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="set-up-your-environment">Set up your environment&lt;/h2>
&lt;p>The Beam SDK for Go requires &lt;code>go&lt;/code> version 1.20 or newer. You can download Go from &lt;a href="https://golang.org/">golang.org&lt;/a>. To check which Go version you have, run:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>go version&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>If you are unfamiliar with Go, see the &lt;a href="https://go.dev/doc/tutorial/getting-started">Get Started With Go Tutorial&lt;/a>.&lt;/p>
&lt;h2 id="run-wordcount">Run wordcount&lt;/h2>
&lt;p>The Apache Beam Go SDK
&lt;a href="https://github.com/apache/beam/tree/master/sdks/go/examples">examples&lt;/a>
directory contains many examples. You can run any of them by passing the
required arguments described in that example.&lt;/p>
&lt;p>For example, to run &lt;code>wordcount&lt;/code>, run:&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">go run github.com/apache/beam/sdks/v2/go/examples/wordcount@latest --input &amp;#34;gs://apache-beam-samples/shakespeare/kinglear.txt&amp;#34; --output counts
less counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">go run github.com/apache/beam/sdks/v2/go/examples/wordcount@latest --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://&amp;lt;your-gcs-bucket&amp;gt;/counts \
--runner dataflow \
--project your-gcp-project \
--region your-gcp-region \
--temp_location gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp/ \
--staging_location gs://&amp;lt;your-gcs-bucket&amp;gt;/binaries/&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark"># Build and run the Spark job server from Beam source.
# -PsparkMasterUrl is optional. If it is unset, the job runs inside an embedded Spark cluster.
./gradlew :runners:spark:3:job-server:runShadow -PsparkMasterUrl=spark://localhost:7077
# In a separate terminal, run:
go run github.com/apache/beam/sdks/v2/go/examples/wordcount@latest --input &amp;lt;PATH_TO_INPUT_FILE&amp;gt; \
--output counts \
--runner spark \
--endpoint localhost:8099&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
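&lt;p>Whichever runner you choose, the pipeline computes the same result. It splits each line into words, counts the occurrences of each distinct word, and writes one formatted line per word. The following standard-library Go sketch reproduces that logic on an in-memory string to give a rough idea of what ends up in &lt;code>counts&lt;/code>. It is not the example's source code. The helper names are hypothetical, and both the word-matching regular expression and the &lt;code>word: count&lt;/code> output format are assumptions modeled on the Go &lt;code>wordcount&lt;/code> example.&lt;/p>

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// wordRE approximates how the example extracts words: runs of letters,
// optionally followed by an apostrophe and one lowercase letter.
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

// countWords returns how many times each word occurs in the text.
// Matching is case-sensitive, so "The" and "the" are counted separately.
func countWords(text string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(text, "\n") {
		for _, w := range wordRE.FindAllString(line, -1) {
			counts[w]++
		}
	}
	return counts
}

// formatCounts renders one "word: count" line per word and sorts the
// lines so the output is deterministic (Beam guarantees no order).
func formatCounts(counts map[string]int) []string {
	lines := make([]string, 0, len(counts))
	for w, c := range counts {
		lines = append(lines, fmt.Sprintf("%s: %v", w, c))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	text := "King Lear's daughters\nthe king and the fool"
	for _, l := range formatCounts(countWords(text)) {
		fmt.Println(l)
	}
}
```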
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/go/">Beam SDK for Go&lt;/a>
and look through the &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">godoc&lt;/a>.&lt;/li>
&lt;li>Walk through these WordCount examples in the &lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/li>
&lt;li>Clone the &lt;a href="https://github.com/apache/beam-starter-go">Beam Go starter project&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite &lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item><item><title>Get-Started: Beam Quickstart for Go</title><link>/get-started/quickstart/go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart/go/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-go-sdk-quickstart">Apache Beam Go SDK quickstart&lt;/h1>
&lt;p>This quickstart shows you how to run an
&lt;a href="https://github.com/apache/beam-starter-go">example pipeline&lt;/a> written with the
&lt;a href="/documentation/sdks/go">Apache Beam Go SDK&lt;/a>, using the
&lt;a href="/documentation/runners/direct/">Direct Runner&lt;/a>. The Direct Runner executes
pipelines locally on your machine.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Go codebase, see the
&lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;p>On this page:&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-development-environment">Set up your development environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#clone-the-github-repository">Clone the GitHub repository&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-quickstart">Run the quickstart&lt;/a>&lt;/li>
&lt;li>&lt;a href="#explore-the-code">Explore the code&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#create-a-pipeline">Create a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#create-an-initial-pcollection">Create an initial PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apply-transforms-to-the-pcollection">Apply transforms to the PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-pipeline">Run the pipeline&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="set-up-your-development-environment">Set up your development environment&lt;/h2>
&lt;p>Make sure you have a &lt;a href="https://go.dev/">Go&lt;/a> development environment ready. If
not, follow the instructions on the
&lt;a href="https://go.dev/doc/install">Download and install&lt;/a> page.&lt;/p>
&lt;h2 id="clone-the-github-repository">Clone the GitHub repository&lt;/h2>
&lt;p>Clone or download the
&lt;a href="https://github.com/apache/beam-starter-go">apache/beam-starter-go&lt;/a> GitHub
repository and change into the &lt;code>beam-starter-go&lt;/code> directory.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>git clone https://github.com/apache/beam-starter-go.git
cd beam-starter-go&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="run-the-quickstart">Run the quickstart&lt;/h2>
&lt;p>Run the following command:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>go run main.go --input-text=&amp;#34;Greetings&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The output is similar to the following:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Hello
World!
Greetings&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The lines might appear in a different order.&lt;/p>
&lt;h2 id="explore-the-code">Explore the code&lt;/h2>
&lt;p>The main code file for this quickstart is &lt;strong>main.go&lt;/strong>
(&lt;a href="https://github.com/apache/beam-starter-go/blob/main/main.go">GitHub&lt;/a>).
The code performs the following steps:&lt;/p>
&lt;ol>
&lt;li>Create a Beam pipeline.&lt;/li>
&lt;li>Create an initial &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>Apply transforms.&lt;/li>
&lt;li>Run the pipeline, using the Direct Runner.&lt;/li>
&lt;/ol>
&lt;h3 id="create-a-pipeline">Create a pipeline&lt;/h3>
&lt;p>Before creating a pipeline, call the &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#Init">&lt;code>Init&lt;/code>&lt;/a> function:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Init&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then create the pipeline:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pipeline&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">scope&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipelineWithRoot&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#NewPipelineWithRoot">&lt;code>NewPipelineWithRoot&lt;/code>&lt;/a> function returns a new
&lt;code>Pipeline&lt;/code> object, along with the pipeline&amp;rsquo;s root scope. A &lt;em>scope&lt;/em> is a
hierarchical grouping for composite transforms.&lt;/p>
&lt;h3 id="create-an-initial-pcollection">Create an initial PCollection&lt;/h3>
&lt;p>The &lt;code>PCollection&lt;/code> abstraction represents a potentially distributed,
multi-element data set. A Beam pipeline needs a source of data to populate an
initial &lt;code>PCollection&lt;/code>. The source can be bounded (with a known, fixed size) or
unbounded (with unlimited size).&lt;/p>
&lt;p>This example uses the &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#Create">&lt;code>Create&lt;/code>&lt;/a> function to create a &lt;code>PCollection&lt;/code>
from an in-memory array of strings. The resulting &lt;code>PCollection&lt;/code> contains the
strings &amp;ldquo;hello&amp;rdquo;, &amp;ldquo;world!&amp;rdquo;, and a user-provided input string.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">elements&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;hello&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;world!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input_text&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="apply-transforms-to-the-pcollection">Apply transforms to the PCollection&lt;/h3>
&lt;p>Transforms can change, filter, group, analyze, or otherwise process the
elements in a &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>This example adds a &lt;a href="/documentation/programming-guide/#pardo">ParDo&lt;/a> transform
to convert the input strings to title case:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">elements&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">strings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Title&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">elements&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#ParDo">&lt;code>ParDo&lt;/code>&lt;/a> function takes the parent scope, a transform function that
will be applied to the data, and the input PCollection. It returns the output
PCollection.&lt;/p>
&lt;p>The previous example uses the standard library &lt;a href="https://pkg.go.dev/strings#Title">&lt;code>strings.Title&lt;/code>&lt;/a> function for
the transform. You can also provide an application-defined function to a ParDo.
For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">logAndEmit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beamLog&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Infoln&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This function logs the input element and emits the same element unmodified.
Create a ParDo for this function as follows:&lt;/p>
&lt;pre tabindex="0">&lt;code>beam.ParDo(scope, logAndEmit, elements)
&lt;/code>&lt;/pre>&lt;p>At runtime, the ParDo will call the &lt;code>logAndEmit&lt;/code> function on each element in
the input collection.&lt;/p>
&lt;h3 id="run-the-pipeline">Run the pipeline&lt;/h3>
&lt;p>The code shown in the previous sections defines a pipeline, but does not
process any data yet. To process data, you run the pipeline:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">beamx&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A Beam &lt;a href="https://beam.apache.org/documentation/basics/#runner">runner&lt;/a> runs a
Beam pipeline on a specific platform. This example uses the Direct Runner,
which is the default runner if you don&amp;rsquo;t specify one. The Direct Runner runs
the pipeline locally on your machine. It is meant for testing and development,
rather than being optimized for efficiency. For more information, see
&lt;a href="https://beam.apache.org/documentation/runners/direct/">Using the Direct Runner&lt;/a>.&lt;/p>
&lt;p>For production workloads, you typically use a distributed runner that runs the
pipeline on a big data processing system such as Apache Flink, Apache Spark, or
Google Cloud Dataflow. These systems support massively parallel processing.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/go/">Beam SDK for Go&lt;/a>
and look through the
&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go SDK API reference&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our
&lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite
&lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any
issues!&lt;/p></description></item><item><title>Get-Started: Beam Quickstart for Java</title><link>/get-started/quickstart/java/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart/java/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-java-sdk-quickstart">Apache Beam Java SDK quickstart&lt;/h1>
&lt;p>This quickstart shows you how to run an
&lt;a href="https://github.com/apache/beam-starter-java">example pipeline&lt;/a> written with
the &lt;a href="/documentation/sdks/java">Apache Beam Java SDK&lt;/a>, using the
&lt;a href="/documentation/runners/direct/">Direct Runner&lt;/a>. The Direct Runner executes
pipelines locally on your machine.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Java codebase, see the
&lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;p>On this page:&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-development-environment">Set up your development environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#clone-the-github-repository">Clone the GitHub repository&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-quickstart">Run the quickstart&lt;/a>&lt;/li>
&lt;li>&lt;a href="#explore-the-code">Explore the code&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#create-a-pipeline">Create a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#create-an-initial-pcollection">Create an initial PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apply-a-transform-to-the-pcollection">Apply a transform to the PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-pipeline">Run the pipeline&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="set-up-your-development-environment">Set up your development environment&lt;/h2>
&lt;p>Use &lt;a href="https://sdkman.io/">&lt;code>sdkman&lt;/code>&lt;/a> to install the Java Development Kit (JDK).&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code># Install sdkman
curl -s &amp;#34;https://get.sdkman.io&amp;#34; | bash
# Install Java 17
sdk install java 17.0.5-tem&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>You can use either &lt;a href="https://gradle.org/">Gradle&lt;/a> or
&lt;a href="https://maven.apache.org/">Apache Maven&lt;/a> to run this quickstart:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code># Install Gradle
sdk install gradle
# Install Maven
sdk install maven&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="clone-the-github-repository">Clone the GitHub repository&lt;/h2>
&lt;p>Clone or download the
&lt;a href="https://github.com/apache/beam-starter-java">apache/beam-starter-java&lt;/a> GitHub
repository and change into the &lt;code>beam-starter-java&lt;/code> directory.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>git clone https://github.com/apache/beam-starter-java.git
cd beam-starter-java&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="run-the-quickstart">Run the quickstart&lt;/h2>
&lt;p>&lt;strong>Gradle&lt;/strong>: To run the quickstart with Gradle, run the following command:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>gradle run --args=&amp;#39;--inputText=Greetings&amp;#39;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Maven&lt;/strong>: To run the quickstart with Maven, run the following command:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>mvn compile exec:java -Dexec.args=--inputText=&amp;#39;Greetings&amp;#39;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The output is similar to the following:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Hello
World!
Greetings&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The lines might appear in a different order.&lt;/p>
&lt;h2 id="explore-the-code">Explore the code&lt;/h2>
&lt;p>The main code file for this quickstart is &lt;strong>App.java&lt;/strong>
(&lt;a href="https://github.com/apache/beam-starter-java/blob/main/src/main/java/com/example/App.java">GitHub&lt;/a>).
The code performs the following steps:&lt;/p>
&lt;ol>
&lt;li>Create a Beam pipeline.&lt;/li>
&lt;li>Create an initial &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>Apply a transform to the &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>Run the pipeline, using the Direct Runner.&lt;/li>
&lt;/ol>
&lt;h3 id="create-a-pipeline">Create a pipeline&lt;/h3>
&lt;p>The code first creates a &lt;code>Pipeline&lt;/code> object. The &lt;code>Pipeline&lt;/code> object builds up the
graph of transformations to be executed.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">var&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">var&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>PipelineOptions&lt;/code> object lets you set various options for the pipeline. The
&lt;code>fromArgs&lt;/code> method shown in this example parses command-line arguments, which
lets you set pipeline options through the command line.&lt;/p>
&lt;h3 id="create-an-initial-pcollection">Create an initial PCollection&lt;/h3>
&lt;p>The &lt;code>PCollection&lt;/code> abstraction represents a potentially distributed,
multi-element data set. A Beam pipeline needs a source of data to populate an
initial &lt;code>PCollection&lt;/code>. The source can be bounded (with a known, fixed size) or
unbounded (with unlimited size).&lt;/p>
&lt;p>This example uses the
&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Create.html">&lt;code>Create.of&lt;/code>&lt;/a>
method to create a &lt;code>PCollection&lt;/code> from an in-memory array of strings. The
resulting &lt;code>PCollection&lt;/code> contains the strings &amp;ldquo;Hello&amp;rdquo;, &amp;ldquo;World!&amp;rdquo;, and a
user-provided input string.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="k">return&lt;/span> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Create elements&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Hello&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;World!&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">inputText&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="apply-a-transform-to-the-pcollection">Apply a transform to the PCollection&lt;/h3>
&lt;p>Transforms can change, filter, group, analyze, or otherwise process the
elements in a &lt;code>PCollection&lt;/code>. This example uses the
&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/MapElements.html">&lt;code>MapElements&lt;/code>&lt;/a>
transform, which maps the elements of a collection into a new collection:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Print elements&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">System&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">println&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In this code:&lt;/p>
&lt;ul>
&lt;li>&lt;code>into&lt;/code> specifies the data type for the elements in the output collection.&lt;/li>
&lt;li>&lt;code>via&lt;/code> defines a mapping function that is called on each element of the input
collection to create the output collection.&lt;/li>
&lt;/ul>
&lt;p>In this example, the mapping function is a lambda that just returns the
original value. It also prints the value to &lt;code>System.out&lt;/code> as a side effect.&lt;/p>
&lt;h3 id="run-the-pipeline">Run the pipeline&lt;/h3>
&lt;p>The code shown in the previous sections defines a pipeline, but does not
process any data yet. To process data, you run the pipeline:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">waitUntilFinish&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A Beam &lt;a href="/documentation/basics/#runner">runner&lt;/a> runs a
Beam pipeline on a specific platform. This example uses the
&lt;a href="https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/runners/direct/DirectRunner.html">Direct Runner&lt;/a>,
which is the default runner if you don&amp;rsquo;t specify one. The Direct Runner runs
the pipeline locally on your machine. It is meant for testing and development,
rather than being optimized for efficiency. For more information, see
&lt;a href="/documentation/runners/direct/">Using the Direct Runner&lt;/a>.&lt;/p>
&lt;p>For production workloads, you typically use a distributed runner that runs the
pipeline on a big data processing system such as Apache Flink, Apache Spark, or
Google Cloud Dataflow. These systems support massively parallel processing.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/java/">Beam SDK for Java&lt;/a>
and look through the
&lt;a href="https://beam.apache.org/releases/javadoc">Java SDK API reference&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our
&lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite
&lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any
issues!&lt;/p></description></item><item><title>Get-Started: Beam Quickstart for Python</title><link>/get-started/quickstart/python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart/python/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-python-sdk-quickstart">Apache Beam Python SDK quickstart&lt;/h1>
&lt;p>This quickstart shows you how to run an
&lt;a href="https://github.com/apache/beam-starter-python">example pipeline&lt;/a> written with
the &lt;a href="/documentation/sdks/python">Apache Beam Python SDK&lt;/a>, using the
&lt;a href="/documentation/runners/direct/">Direct Runner&lt;/a>. The Direct Runner executes
pipelines locally on your machine.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Python codebase, see the
&lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;p>On this page:&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-development-environment">Set up your development environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#clone-the-github-repository">Clone the GitHub repository&lt;/a>&lt;/li>
&lt;li>&lt;a href="#create-and-activate-a-virtual-environment">Create and activate a virtual environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#install-the-project-dependences">Install the project dependences&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-quickstart">Run the quickstart&lt;/a>&lt;/li>
&lt;li>&lt;a href="#explore-the-code">Explore the code&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#create-a-pipeline">Create a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#create-an-initial-pcollection">Create an initial PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apply-a-transform-to-the-pcollection">Apply a transform to the PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-pipeline">Run the pipeline&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="set-up-your-development-environment">Set up your development environment&lt;/h2>
&lt;p>Apache Beam aims to work on released
&lt;a href="https://devguide.python.org/versions/">Python versions&lt;/a> that have not yet
reached end of life, but it may take a few releases until Apache Beam fully
supports the most recently released Python minor version.&lt;/p>
&lt;p>The minimum required Python version is listed in the &lt;strong>Meta&lt;/strong> section of the
&lt;a href="https://pypi.org/project/apache-beam/">apache-beam&lt;/a> project page under
&lt;strong>Requires&lt;/strong>. All supported Python versions are listed in the
&lt;strong>Classifiers&lt;/strong> section at the bottom of the page, under &lt;strong>Programming
Language&lt;/strong>.&lt;/p>
&lt;p>Check your Python version by running:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>python3 --version&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>If you don&amp;rsquo;t have a Python interpreter, you can download and install it from
the &lt;a href="https://devguide.python.org/versions/">Python downloads&lt;/a> page.&lt;/p>
&lt;p>If you need to install a different version of Python in addition to the version
that you already have, you can find some recommendations in our
&lt;a href="https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-InstallingPythoninterpreters">Developer Wiki&lt;/a>.&lt;/p>
&lt;h2 id="clone-the-github-repository">Clone the GitHub repository&lt;/h2>
&lt;p>Clone or download the
&lt;a href="https://github.com/apache/beam-starter-python">apache/beam-starter-python&lt;/a>
GitHub repository and change into the &lt;code>beam-starter-python&lt;/code> directory.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>git clone https://github.com/apache/beam-starter-python.git
cd beam-starter-python&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="create-and-activate-a-virtual-environment">Create and activate a virtual environment&lt;/h2>
&lt;p>A virtual environment is a directory tree containing its own Python
distribution. We recommend using a virtual environment so that all dependencies
of your project are installed in an isolated and self-contained environment. To
set up a virtual environment, run the following commands:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code># Create a new Python virtual environment.
python3 -m venv env
# Activate the virtual environment.
source env/bin/activate&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>If these commands do not work on your platform, see the
&lt;a href="https://docs.python.org/3/library/venv.html#how-venvs-work">&lt;code>venv&lt;/code>&lt;/a>
documentation.&lt;/p>
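&lt;p>To confirm that the interpreter you are running belongs to the activated virtual environment, you can compare &lt;code>sys.prefix&lt;/code> with &lt;code>sys.base_prefix&lt;/code>. Inside an environment created with &lt;code>venv&lt;/code>, &lt;code>sys.prefix&lt;/code> points at the environment directory, while &lt;code>sys.base_prefix&lt;/code> still points at the base installation. The following is a minimal sketch, and the function name is hypothetical:&lt;/p>

```python
import sys


def in_virtual_env() -> bool:
    """Return True if this interpreter runs inside a venv-created environment.

    venv sets sys.prefix to the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_env())
```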
&lt;h2 id="install-the-project-dependences">Install the project dependences&lt;/h2>
&lt;p>Run the following command to install the project, together with its
dependencies, in editable (&lt;code>-e&lt;/code>) mode:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>pip install -e .&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="run-the-quickstart">Run the quickstart&lt;/h2>
&lt;p>Run the following command:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>python main.py --input-text=&amp;#34;Greetings&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The output is similar to the following:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Hello
World!
Greetings&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The lines might appear in a different order.&lt;/p>
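&lt;p>The &lt;code>--input-text&lt;/code> flag belongs to this application rather than to Beam. One common way to handle such flags is to parse them with the &lt;code>parse_known_args&lt;/code> method of &lt;code>argparse.ArgumentParser&lt;/code>, then forward any remaining flags, such as &lt;code>--runner&lt;/code>, to Beam as pipeline options. The sketch below shows that pattern. It is not necessarily how &lt;code>main.py&lt;/code> is written, and the function name and default value are hypothetical.&lt;/p>

```python
import argparse


def split_args(argv):
    """Separate app-specific flags from flags meant for Beam.

    Assumption: the app defines a single --input-text flag; every
    unrecognized flag (for example --runner) is returned for Beam.
    """
    parser = argparse.ArgumentParser()
    # Hypothetical default; the starter project may use a different one.
    parser.add_argument("--input-text", default="My input text")
    args, beam_args = parser.parse_known_args(argv)
    # argparse exposes --input-text as the attribute input_text.
    return args.input_text, beam_args


if __name__ == "__main__":
    text, beam_args = split_args(["--input-text=Greetings", "--runner=DirectRunner"])
    print(text)       # Greetings
    print(beam_args)  # ['--runner=DirectRunner']
```

&lt;p>In a real entry point, the leftover flags could be passed to &lt;code>PipelineOptions&lt;/code> so that Beam, rather than the application, interprets them.&lt;/p>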
&lt;p>Run the following command to deactivate the virtual environment:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>deactivate&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="explore-the-code">Explore the code&lt;/h2>
&lt;p>The main code file for this quickstart is &lt;strong>app.py&lt;/strong>
(&lt;a href="https://github.com/apache/beam-starter-python/blob/main/my_app/app.py">GitHub&lt;/a>).
The code performs the following steps:&lt;/p>
&lt;ol>
&lt;li>Create a Beam pipeline.&lt;/li>
&lt;li>Create an initial &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>Apply a transform to the &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>Run the pipeline, using the Direct Runner.&lt;/li>
&lt;/ol>
&lt;h3 id="create-a-pipeline">Create a pipeline&lt;/h3>
&lt;p>The code first creates a &lt;code>Pipeline&lt;/code> object. The &lt;code>Pipeline&lt;/code> object builds up the
graph of transformations to be executed.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam_options&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>beam_options&lt;/code> variable shown here is a &lt;code>PipelineOptions&lt;/code> object, which
is used to set options for the pipeline. For more information, see
&lt;a href="/documentation/programming-guide/#configuring-pipeline-options">Configuring pipeline options&lt;/a>.&lt;/p>
&lt;h3 id="create-an-initial-pcollection">Create an initial PCollection&lt;/h3>
&lt;p>The &lt;code>PCollection&lt;/code> abstraction represents a potentially distributed,
multi-element data set. A Beam pipeline needs a source of data to populate an
initial &lt;code>PCollection&lt;/code>. The source can be bounded (with a known, fixed size) or
unbounded (with unlimited size).&lt;/p>
&lt;p>This example uses the
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Create">&lt;code>Create&lt;/code>&lt;/a>
transform to create a &lt;code>PCollection&lt;/code> from an in-memory list of strings. The
resulting &lt;code>PCollection&lt;/code> contains the strings &amp;ldquo;Hello&amp;rdquo;, &amp;ldquo;World!&amp;rdquo;, and a
user-provided input string.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Create elements&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s2">&amp;#34;Hello&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;World!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_text&lt;/span>&lt;span class="p">])&lt;/span>
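&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Note: The pipe operator &lt;code>|&lt;/code> applies a transform, which is how you
&lt;a href="/documentation/programming-guide/#applying-transforms">chain&lt;/a> transforms. The &lt;code>&amp;gt;&amp;gt;&lt;/code> operator attaches a label, such as &lt;code>&amp;#34;Create elements&amp;#34;&lt;/code>, to the transform. Labels must be unique within a pipeline.&lt;/p>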
&lt;h3 id="apply-a-transform-to-the-pcollection">Apply a transform to the PCollection&lt;/h3>
&lt;p>Transforms can change, filter, group, analyze, or otherwise process the
elements in a &lt;code>PCollection&lt;/code>. This example uses the
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Map">&lt;code>Map&lt;/code>&lt;/a>
transform, which maps the elements of a collection into a new collection:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Print elements&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="run-the-pipeline">Run the pipeline&lt;/h3>
&lt;p>To run the pipeline, you can call the
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pipeline.html#apache_beam.pipeline.Pipeline.run">&lt;code>Pipeline.run&lt;/code>&lt;/a>
method:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">run&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">wait_until_finish&lt;/span>&lt;span class="p">()&lt;/span>
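&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, if you use the &lt;code>Pipeline&lt;/code> object as a context manager in a &lt;code>with&lt;/code> statement, as this example does,
&lt;code>run().wait_until_finish()&lt;/code> is invoked automatically when the &lt;code>with&lt;/code> block exits without raising an exception.&lt;/p>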
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam_options&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># ...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># run() is called automatically&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A Beam &lt;a href="/documentation/basics/#runner">runner&lt;/a> runs a Beam pipeline on a
specific platform. If you don&amp;rsquo;t specify a runner, the Direct Runner is the
default. The Direct Runner runs the pipeline locally on your machine. It is
meant for testing and development, rather than being optimized for efficiency.
For more information, see
&lt;a href="/documentation/runners/direct/">Using the Direct Runner&lt;/a>.&lt;/p>
&lt;p>For production workloads, you typically use a distributed runner that runs the
pipeline on a big data processing system such as Apache Flink, Apache Spark, or
Google Cloud Dataflow. These systems support massively parallel processing.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/python/">Beam SDK for Python&lt;/a>
and look through the
&lt;a href="https://beam.apache.org/releases/pydoc/current">Python SDK API reference&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our
&lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite
&lt;a href="/documentation/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any
issues!&lt;/p></description></item><item><title>Get-Started: Beam Quickstart for Typescript</title><link>/get-started/quickstart/typescript/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart/typescript/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-typescript-sdk-quickstart">Apache Beam Typescript SDK quickstart&lt;/h1>
&lt;p>This quickstart shows you how to run an
&lt;a href="https://github.com/apache/beam-starter-typescript">example pipeline&lt;/a> written with
the &lt;a href="/documentation/sdks/typescript">Apache Beam Typescript SDK&lt;/a>, using the
&lt;a href="/documentation/runners/direct/">Direct Runner&lt;/a>. The Direct Runner executes
pipelines locally on your machine.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Typescript codebase, see the
&lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;p>On this page:&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-development-environment">Set up your development environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#clone-the-github-repository">Clone the GitHub repository&lt;/a>&lt;/li>
&lt;li>&lt;a href="#install-the-project-dependences">Install the project dependences&lt;/a>&lt;/li>
&lt;li>&lt;a href="#compile-the-pipeline">Compile the pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-quickstart">Run the quickstart&lt;/a>&lt;/li>
&lt;li>&lt;a href="#explore-the-code">Explore the code&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#create-a-pipeline">Create a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#create-an-initial-pcollection">Create an initial PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apply-a-transform-to-the-pcollection">Apply a transform to the PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-the-pipeline">Run the pipeline&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="set-up-your-development-environment">Set up your development environment&lt;/h2>
&lt;p>Make sure you have a &lt;a href="https://nodejs.org/">Node.js&lt;/a> development environment installed.
If you don&amp;rsquo;t, you can download and install it from the
&lt;a href="https://nodejs.org/en/download/">downloads page&lt;/a>.&lt;/p>
&lt;p>Because the TypeScript SDK makes extensive use of cross-language transforms, we recommend
that you also have Python 3 and Java installed on your system.&lt;/p>
&lt;h2 id="clone-the-github-repository">Clone the GitHub repository&lt;/h2>
&lt;p>Clone or download the
&lt;a href="https://github.com/apache/beam-starter-typescript">apache/beam-starter-typescript&lt;/a>
GitHub repository and change into the &lt;code>beam-starter-typescript&lt;/code> directory.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>git clone https://github.com/apache/beam-starter-typescript.git
cd beam-starter-typescript&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="install-the-project-dependences">Install the project dependences&lt;/h2>
&lt;p>Run the following command to install the project&amp;rsquo;s dependencies.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>npm install&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="compile-the-pipeline">Compile the pipeline&lt;/h2>
&lt;p>Run the following command to compile the pipeline:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>npm run build&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="run-the-quickstart">Run the quickstart&lt;/h2>
&lt;p>Run the following command:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>node dist/src/main.js --input_text=&amp;#34;Greetings&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The output is similar to the following:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Hello
World!
Greetings&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The lines might appear in a different order.&lt;/p>
&lt;h2 id="explore-the-code">Explore the code&lt;/h2>
&lt;p>The main code file for this quickstart is &lt;strong>app.ts&lt;/strong>
(&lt;a href="https://github.com/apache/beam-starter-typescript/blob/main/src/app.ts">GitHub&lt;/a>).
The code performs the following steps:&lt;/p>
&lt;ol>
&lt;li>Define a Beam pipeline that:
&lt;ul>
&lt;li>Creates an initial &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>Applies a transform (map) to the &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Run the pipeline, using the Direct Runner.&lt;/li>
&lt;/ol>
&lt;h3 id="create-a-pipeline">Create a pipeline&lt;/h3>
&lt;p>In the TypeScript SDK, a pipeline is simply a function that takes a single &lt;code>root&lt;/code> object.
The function builds up the graph of transformations to be executed by applying transforms, starting from &lt;code>root&lt;/code>.&lt;/p>
&lt;h3 id="create-an-initial-pcollection">Create an initial PCollection&lt;/h3>
&lt;p>The &lt;code>PCollection&lt;/code> abstraction represents a potentially distributed,
multi-element data set. A Beam pipeline needs a source of data to populate an
initial &lt;code>PCollection&lt;/code>. The source can be bounded (with a known, fixed size) or
unbounded (with unlimited size).&lt;/p>
&lt;p>This example uses the
&lt;a href="https://beam.apache.org/releases/typedoc/current/functions/transforms_create.create.html">&lt;code>Create&lt;/code>&lt;/a>
method to create a &lt;code>PCollection&lt;/code> from an in-memory array of strings. The
resulting &lt;code>PCollection&lt;/code> contains the strings &amp;ldquo;Hello&amp;rdquo;, &amp;ldquo;World!&amp;rdquo;, and a
user-provided input string.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s2">&amp;#34;Hello&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;World!&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input_text&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="apply-a-transform-to-the-pcollection">Apply a transform to the PCollection&lt;/h3>
&lt;p>Transforms can change, filter, group, analyze, or otherwise process the
elements in a &lt;code>PCollection&lt;/code>. This example uses the
&lt;a href="https://beam.apache.org/releases/typedoc/current/classes/pvalue.PCollection.html#map">&lt;code>Map&lt;/code>&lt;/a>
transform, which maps the elements of a collection into a new collection:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">printAndReturn&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For convenience, &lt;code>PCollection&lt;/code> has a &lt;code>map&lt;/code> method, but more generally, transforms
are applied with &lt;code>.apply(someTransform())&lt;/code>.&lt;/p>
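&lt;p>The relationship between &lt;code>map&lt;/code> and &lt;code>apply&lt;/code> can be sketched with a toy, eager, in-memory collection. The names &lt;code>ToyPCollection&lt;/code>, &lt;code>mapTransform&lt;/code>, and &lt;code>countPerElement&lt;/code> are hypothetical and are not the SDK&amp;rsquo;s own API. The sketch only illustrates the design idea: a transform is a function from one collection to another, and &lt;code>map&lt;/code> is shorthand for applying one particular transform.&lt;/p>
&lt;pre>&lt;code class="language-typescript">
```typescript
// Toy, eager, in-memory collection. ToyPCollection, mapTransform and
// countPerElement are hypothetical names, not the Beam SDK's own API.
type Transform = (input: ToyPCollection) => ToyPCollection;

class ToyPCollection {
  constructor(readonly elements: unknown[]) {}

  // General form: apply any function from a collection to a collection.
  apply(transform: Transform): ToyPCollection {
    return transform(this);
  }

  // Convenience shorthand, defined in terms of apply().
  map(fn: (element: unknown) => unknown): ToyPCollection {
    return this.apply(mapTransform(fn));
  }
}

// Transform factory for per-element mapping.
function mapTransform(fn: (element: unknown) => unknown): Transform {
  return (input) => new ToyPCollection(input.elements.map((x) => fn(x)));
}

// A transform that is not a per-element map: count each distinct element.
function countPerElement(): Transform {
  return (input) => {
    const counts = new Map();
    for (const x of input.elements) {
      counts.set(x, (counts.get(x) ?? 0) + 1);
    }
    return new ToyPCollection(Array.from(counts.entries()));
  };
}

const words = new ToyPCollection(["a", "b", "a"]);
const viaShorthand = words.map((w) => String(w).toUpperCase());
const viaApply = words.apply(mapTransform((w) => String(w).toUpperCase()));
const counts = words.apply(countPerElement());
console.log(viaShorthand.elements, viaApply.elements, counts.elements);
```
&lt;/code>&lt;/pre>
&lt;p>Keeping &lt;code>apply&lt;/code> as the general entry point means any transform, including multi-step composites, plugs in the same way, while &lt;code>map&lt;/code> stays a thin convenience on top of it.&lt;/p>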
&lt;h3 id="run-the-pipeline">Run the pipeline&lt;/h3>
&lt;p>To run the pipeline, a runner is created (possibly with some options):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">createRunner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">options&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>and then its &lt;code>run&lt;/code> method is invoked on the pipeline callable created above:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="p">.&lt;/span>&lt;span class="nx">run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">createPipeline&lt;/span>&lt;span class="p">(...));&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A Beam &lt;a href="/documentation/basics/#runner">runner&lt;/a> runs a Beam pipeline on a
specific platform. If you don&amp;rsquo;t specify a runner, the Direct Runner is the
default. The Direct Runner runs the pipeline locally on your machine. It is
meant for testing and development, rather than being optimized for efficiency.
For more information, see
&lt;a href="/documentation/runners/direct/">Using the Direct Runner&lt;/a>.&lt;/p>
&lt;p>For production workloads, you typically use a distributed runner that runs the
pipeline on a big data processing system such as Apache Flink, Apache Spark, or
Google Cloud Dataflow. These systems support massively parallel processing.
You can request a different runner through the &lt;code>runner&lt;/code> property of the options, for example
&lt;code>createRunner({runner: &amp;quot;dataflow&amp;quot;})&lt;/code> or &lt;code>createRunner({runner: &amp;quot;flink&amp;quot;})&lt;/code>.
In this example, the value can also be passed on the command line as
&lt;code>--runner=...&lt;/code>. For example, to run on Dataflow:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sh" data-lang="sh">&lt;span class="line">&lt;span class="cl">node dist/src/main.js &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --runner&lt;span class="o">=&lt;/span>dataflow &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --project&lt;span class="o">=&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">PROJECT_ID&lt;/span>&lt;span class="si">}&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --tempLocation&lt;span class="o">=&lt;/span>gs://&lt;span class="si">${&lt;/span>&lt;span class="nv">GCS_BUCKET&lt;/span>&lt;span class="si">}&lt;/span>/wordcount-js/temp --region&lt;span class="o">=&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">REGION&lt;/span>&lt;span class="si">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/typescript/">Beam SDK for TypeScript&lt;/a>
and look through the
&lt;a href="https://beam.apache.org/releases/typedoc/current">TypeScript SDK API reference&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our
&lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite
&lt;a href="/documentation/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any
issues!&lt;/p></description></item><item><title>Get-Started: Beam Releases</title><link>/get-started/downloads/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/downloads/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beamsupsup-downloads">Apache Beam&lt;sup>®&lt;/sup> Downloads&lt;/h1>
&lt;blockquote>
&lt;p>Beam SDK 2.55.1 is the latest released version.&lt;/p>
&lt;/blockquote>
&lt;h2 id="using-a-central-repository">Using a central repository&lt;/h2>
&lt;p>The easiest way to use Apache Beam is via one of the released versions in a
central repository. The Java SDK is available on &lt;a href="https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22">Maven Central Repository&lt;/a>,
and the Python SDK is available on &lt;a href="https://pypi.python.org/pypi/apache-beam">PyPI&lt;/a>.&lt;/p>
&lt;p>For example, if you are developing using Maven and want to use the SDK for Java
with the &lt;code>DirectRunner&lt;/code>, add the following dependencies to your &lt;code>pom.xml&lt;/code> file:&lt;/p>
&lt;pre>&lt;code>&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;org.apache.beam&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;beam-sdks-java-core&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;2.55.1&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;org.apache.beam&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;beam-runners-direct-java&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;2.55.1&amp;lt;/version&amp;gt;
  &amp;lt;scope&amp;gt;runtime&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code>&lt;/pre>
&lt;p>Similarly, in Python, if you are using PyPI and want to use the SDK for Python
with the &lt;code>DirectRunner&lt;/code>, add the following requirement to your &lt;code>setup.py&lt;/code> file:&lt;/p>
&lt;pre>&lt;code>apache-beam==2.55.1
&lt;/code>&lt;/pre>
&lt;p>You may also want to depend on other SDK modules, such as IO
connectors or other extensions, and on additional runners to execute your pipeline
at scale.&lt;/p>
&lt;p>The Go SDK is available through Go modules. Run &lt;code>go get&lt;/code> from within your module&amp;rsquo;s directory:&lt;/p>
&lt;pre>&lt;code> go get github.com/apache/beam/sdks/v2/go/pkg/beam
&lt;/code>&lt;/pre>
&lt;p>Specific versions can be depended on similarly:&lt;/p>
&lt;pre>&lt;code> go get github.com/apache/beam/sdks/v2/go/pkg/beam@v2.55.1
&lt;/code>&lt;/pre>
&lt;h2 id="downloading-source-code">Downloading source code&lt;/h2>
&lt;p>You can download the source code package for a release from the links in the
&lt;a href="#releases">Releases&lt;/a> section.&lt;/p>
&lt;h3 id="release-integrity">Release integrity&lt;/h3>
&lt;p>You &lt;em>must&lt;/em> &lt;a href="https://www.apache.org/info/verification.html">verify&lt;/a> the integrity
of downloaded files. We provide OpenPGP signatures for every release file. This
signature should be matched against the
&lt;a href="https://downloads.apache.org/beam/KEYS">KEYS&lt;/a> file which contains the OpenPGP
keys of Apache Beam&amp;rsquo;s Release Managers. We also provide SHA-512 checksums for
every release file (or SHA-1 and MD5 checksums for older releases). After you
download the file, you should calculate a checksum for your download, and make
sure it is the same as ours.&lt;/p>
&lt;h2 id="api-stability">API stability&lt;/h2>
&lt;p>Apache Beam generally follows the rules of
&lt;a href="https://semver.org/">semantic versioning&lt;/a> with exceptions. Version numbers use
the form &lt;code>major.minor.patch&lt;/code> and are incremented as follows:&lt;/p>
&lt;ul>
&lt;li>major version for incompatible API changes&lt;/li>
&lt;li>minor version for new functionality added in a backward-compatible manner (and, infrequently, incompatible API changes)&lt;/li>
&lt;li>patch version for forward-compatible bug fixes&lt;/li>
&lt;/ul>
&lt;p>Please note that APIs marked &lt;a href="https://beam.apache.org/releases/javadoc/2.55.1/org/apache/beam/sdk/annotations/Experimental.html">&lt;code>@Experimental&lt;/code>&lt;/a>
may change at any point and are not guaranteed to remain compatible across versions.&lt;/p>
&lt;p>Additionally, any API may change before the first stable release, i.e., between
versions denoted &lt;code>0.x.y&lt;/code>.&lt;/p>
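&lt;p>As a rough illustration of these rules, the sketch below parses &lt;code>major.minor.patch&lt;/code> strings and reports which component changed between two versions. It is a hypothetical helper, not part of any Beam SDK, and it ignores pre-release or build suffixes.&lt;/p>
&lt;pre>&lt;code class="language-typescript">
```typescript
// Hypothetical helper: classify a version change under major.minor.patch rules.
// Pre-release or build suffixes (for example "-SNAPSHOT") are not handled.
type Version = { major: number; minor: number; patch: number };
type UpgradeKind = "major" | "minor" | "patch" | "none" | "downgrade";

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (!match) {
    throw new Error(`Not a major.minor.patch version: ${text}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

function upgradeKind(fromText: string, toText: string): UpgradeKind {
  const a = parseVersion(fromText);
  const b = parseVersion(toText);
  const deltas = [b.major - a.major, b.minor - a.minor, b.patch - a.patch];
  const kinds: UpgradeKind[] = ["major", "minor", "patch"];
  // The most significant component that changed decides the kind.
  const i = deltas.findIndex((d) => d !== 0);
  if (i === -1) return "none";
  if (Math.sign(deltas[i]) === -1) return "downgrade";
  return kinds[i];
}

// Any API may change before the first stable release (major version 0).
function isPreStable(text: string): boolean {
  return parseVersion(text).major === 0;
}

console.log(upgradeKind("2.55.0", "2.55.1")); // patch
console.log(upgradeKind("2.54.0", "2.55.1")); // minor
console.log(isPreStable("0.6.0"));            // true
```
&lt;/code>&lt;/pre>
&lt;p>Per the rules above, a minor bump can still include infrequent incompatible changes, and APIs marked &lt;code>@Experimental&lt;/code> may change in any release, so the result is a hint rather than a compatibility guarantee.&lt;/p>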
&lt;h2 id="releases">Releases&lt;/h2>
&lt;h3 id="2551-2024-03-25">2.55.1 (2024-03-25)&lt;/h3>
&lt;p>Official &lt;a href="https://downloads.apache.org/beam/2.55.1/apache-beam-2.55.1-source-release.zip">source code download&lt;/a>.
&lt;a href="https://downloads.apache.org/beam/2.55.1/apache-beam-2.55.1-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://downloads.apache.org/beam/2.55.1/apache-beam-2.55.1-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.55.1">Release notes&lt;/a>&lt;/p>
&lt;h3 id="2550-2024-03-25">2.55.0 (2024-03-25)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/beam/2.55.0/apache-beam-2.55.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/beam/2.55.0/apache-beam-2.55.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/beam/2.55.0/apache-beam-2.55.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.55.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.55.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2540-2024-02-14">2.54.0 (2024-02-14)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.54.0/apache-beam-2.54.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.54.0/apache-beam-2.54.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.54.0/apache-beam-2.54.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.54.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.54.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2530-2024-01-04">2.53.0 (2024-01-04)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.53.0/apache-beam-2.53.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.53.0/apache-beam-2.53.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.53.0/apache-beam-2.53.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.53.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.53.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2520-2023-11-17">2.52.0 (2023-11-17)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.52.0/apache-beam-2.52.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.52.0/apache-beam-2.52.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.52.0/apache-beam-2.52.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.52.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.52.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2510-2023-10-11">2.51.0 (2023-10-11)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.51.0/apache-beam-2.51.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.51.0/apache-beam-2.51.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.51.0/apache-beam-2.51.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.51.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.51.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2500-2023-08-30">2.50.0 (2023-08-30)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.50.0/apache-beam-2.50.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.50.0/apache-beam-2.50.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.50.0/apache-beam-2.50.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.50.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.50.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2490-2023-07-17">2.49.0 (2023-07-17)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.49.0/apache-beam-2.49.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.49.0/apache-beam-2.49.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.49.0/apache-beam-2.49.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.49.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.49.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2480-2023-05-31">2.48.0 (2023-05-31)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.48.0/apache-beam-2.48.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.48.0/apache-beam-2.48.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.48.0/apache-beam-2.48.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.48.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.48.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2470-2023-05-10">2.47.0 (2023-05-10)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.47.0/apache-beam-2.47.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.47.0/apache-beam-2.47.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.47.0/apache-beam-2.47.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.47.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.47.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2460-2023-03-10">2.46.0 (2023-03-10)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.46.0/apache-beam-2.46.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.46.0/apache-beam-2.46.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.46.0/apache-beam-2.46.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.46.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.46.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2450-2023-02-15">2.45.0 (2023-02-15)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.45.0/apache-beam-2.45.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.45.0/apache-beam-2.45.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.45.0/apache-beam-2.45.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.45.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.45.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2440-2023-01-12">2.44.0 (2023-01-12)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.44.0/apache-beam-2.44.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.44.0/apache-beam-2.44.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.44.0/apache-beam-2.44.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.44.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.44.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2430-2022-11-17">2.43.0 (2022-11-17)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.43.0/apache-beam-2.43.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.43.0/apache-beam-2.43.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.43.0/apache-beam-2.43.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.43.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.43.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2420-2022-10-17">2.42.0 (2022-10-17)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.42.0/apache-beam-2.42.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.42.0/apache-beam-2.42.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.42.0/apache-beam-2.42.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.42.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.42.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2410-2022-08-23">2.41.0 (2022-08-23)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.41.0/apache-beam-2.41.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.41.0/apache-beam-2.41.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.41.0/apache-beam-2.41.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.41.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.41.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2400-2022-06-25">2.40.0 (2022-06-25)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.40.0/apache-beam-2.40.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.40.0/apache-beam-2.40.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.40.0/apache-beam-2.40.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/apache/beam/releases/tag/v2.40.0">Release notes&lt;/a>
&lt;a href="/blog/beam-2.40.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2390-2022-05-25">2.39.0 (2022-05-25)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.39.0/apache-beam-2.39.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.39.0/apache-beam-2.39.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.39.0/apache-beam-2.39.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12351169">Release notes&lt;/a>
&lt;a href="/blog/beam-2.39.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2380-2022-04-20">2.38.0 (2022-04-20)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.38.0/apache-beam-2.38.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.38.0/apache-beam-2.38.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.38.0/apache-beam-2.38.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12351169">Release notes&lt;/a>
&lt;a href="/blog/beam-2.38.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2370-2022-03-04">2.37.0 (2022-03-04)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.37.0/apache-beam-2.37.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.37.0/apache-beam-2.37.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.37.0/apache-beam-2.37.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12351168">Release notes&lt;/a>
&lt;a href="/blog/beam-2.37.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2360-2022-02-07">2.36.0 (2022-02-07)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.36.0/apache-beam-2.36.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.36.0/apache-beam-2.36.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.36.0/apache-beam-2.36.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12350407">Release notes&lt;/a>
&lt;a href="/blog/beam-2.36.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2350-2021-12-29">2.35.0 (2021-12-29)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.35.0/apache-beam-2.35.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.35.0/apache-beam-2.35.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.35.0/apache-beam-2.35.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12350406">Release notes&lt;/a>
&lt;a href="/blog/beam-2.35.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2340-2021-11-11">2.34.0 (2021-11-11)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.34.0/apache-beam-2.34.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.34.0/apache-beam-2.34.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.34.0/apache-beam-2.34.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12350405">Release notes&lt;/a>
&lt;a href="/blog/beam-2.34.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2330-2021-10-07">2.33.0 (2021-10-07)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.33.0/apache-beam-2.33.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.33.0/apache-beam-2.33.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.33.0/apache-beam-2.33.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12350404">Release notes&lt;/a>
&lt;a href="/blog/beam-2.33.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2320-2021-08-25">2.32.0 (2021-08-25)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.32.0/apache-beam-2.32.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.32.0/apache-beam-2.32.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.32.0/apache-beam-2.32.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12349992">Release notes&lt;/a>
&lt;a href="/blog/beam-2.32.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2310-2021-07-08">2.31.0 (2021-07-08)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.31.0/apache-beam-2.31.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.31.0/apache-beam-2.31.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.31.0/apache-beam-2.31.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12349991">Release notes&lt;/a>
&lt;a href="/blog/beam-2.31.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2300-2021-06-09">2.30.0 (2021-06-09)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.30.0/apache-beam-2.30.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.30.0/apache-beam-2.30.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.30.0/apache-beam-2.30.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12349978">Release notes&lt;/a>
&lt;a href="/blog/beam-2.30.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2290-2021-04-27">2.29.0 (2021-04-27)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.29.0/apache-beam-2.29.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.29.0/apache-beam-2.29.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.29.0/apache-beam-2.29.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12349629">Release notes&lt;/a>
&lt;a href="/blog/beam-2.29.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2280-2021-02-22">2.28.0 (2021-02-22)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.28.0/apache-beam-2.28.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.28.0/apache-beam-2.28.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.28.0/apache-beam-2.28.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12349499">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.28.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2270-2020-12-22">2.27.0 (2020-12-22)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.27.0/apache-beam-2.27.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.27.0/apache-beam-2.27.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.27.0/apache-beam-2.27.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12349380">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.27.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2260-2020-12-11">2.26.0 (2020-12-11)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.26.0/apache-beam-2.26.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.26.0/apache-beam-2.26.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.26.0/apache-beam-2.26.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12348833">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.26.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2250-2020-10-23">2.25.0 (2020-10-23)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.25.0/apache-beam-2.25.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.25.0/apache-beam-2.25.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.25.0/apache-beam-2.25.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12347147">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.25.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2240-2020-09-18">2.24.0 (2020-09-18)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.24.0/apache-beam-2.24.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.24.0/apache-beam-2.24.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.24.0/apache-beam-2.24.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12347146">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.24.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2230-2020-07-29">2.23.0 (2020-07-29)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.23.0/apache-beam-2.23.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.23.0/apache-beam-2.23.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.23.0/apache-beam-2.23.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12347145">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.23.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2220-2020-06-08">2.22.0 (2020-06-08)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.22.0/apache-beam-2.22.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.22.0/apache-beam-2.22.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.22.0/apache-beam-2.22.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12347144">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.22.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2210-2020-05-27">2.21.0 (2020-05-27)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.21.0/apache-beam-2.21.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.21.0/apache-beam-2.21.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.21.0/apache-beam-2.21.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12347143">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.21.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2200-2020-04-15">2.20.0 (2020-04-15)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.20.0/apache-beam-2.20.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.20.0/apache-beam-2.20.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.20.0/apache-beam-2.20.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12346780">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.20.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2190-2020-02-04">2.19.0 (2020-02-04)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.19.0/apache-beam-2.19.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.19.0/apache-beam-2.19.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.19.0/apache-beam-2.19.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12346582">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.19.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2180-2020-01-23">2.18.0 (2020-01-23)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.18.0/apache-beam-2.18.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.18.0/apache-beam-2.18.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.18.0/apache-beam-2.18.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346383&amp;amp;projectId=12319527">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.18.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2170-2020-01-06">2.17.0 (2020-01-06)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.17.0/apache-beam-2.17.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.17.0/apache-beam-2.17.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.17.0/apache-beam-2.17.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12345970">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.17.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2160-2019-10-07">2.16.0 (2019-10-07)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.16.0/apache-beam-2.16.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.16.0/apache-beam-2.16.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.16.0/apache-beam-2.16.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12345494">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.16.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2150-2019-08-22">2.15.0 (2019-08-22)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.15.0/apache-beam-2.15.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.15.0/apache-beam-2.15.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.15.0/apache-beam-2.15.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12345489">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.15.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2140-2019-08-01">2.14.0 (2019-08-01)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.14.0/apache-beam-2.14.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.14.0/apache-beam-2.14.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.14.0/apache-beam-2.14.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12345431">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.14.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2130-2019-05-21">2.13.0 (2019-05-21)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.13.0/apache-beam-2.13.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.13.0/apache-beam-2.13.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.13.0/apache-beam-2.13.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12345166">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.13.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2120-2019-04-25">2.12.0 (2019-04-25)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.12.0/apache-beam-2.12.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.12.0/apache-beam-2.12.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.12.0/apache-beam-2.12.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12344944">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.12.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2110-2019-02-26">2.11.0 (2019-02-26)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.11.0/apache-beam-2.11.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.11.0/apache-beam-2.11.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.11.0/apache-beam-2.11.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12344775">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.11.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="2100-2019-02-01">2.10.0 (2019-02-01)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.10.0/apache-beam-2.10.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.10.0/apache-beam-2.10.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.10.0/apache-beam-2.10.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12344540">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.10.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="290-2018-12-13">2.9.0 (2018-12-13)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.9.0/apache-beam-2.9.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.9.0/apache-beam-2.9.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.9.0/apache-beam-2.9.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12344258">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.9.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="280-2018-10-26">2.8.0 (2018-10-26)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.8.0/apache-beam-2.8.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.8.0/apache-beam-2.8.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.8.0/apache-beam-2.8.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12343985">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.8.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="270-lts-2018-10-02">2.7.0 LTS (2018-10-02)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.7.0/apache-beam-2.7.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.7.0/apache-beam-2.7.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.7.0/apache-beam-2.7.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12343654">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.7.0">Blog post&lt;/a>.&lt;/p>
&lt;p>2.7.0 was &lt;a href="https://lists.apache.org/thread.html/896cbc9fef2e60f19b466d6b1e12ce1aeda49ce5065a0b1156233f01@%3Cdev.beam.apache.org%3E">designated&lt;/a> by the Beam community as a long-term support (LTS) version. An LTS version is supported for 6 months, starting from the day it is designated as LTS. The Beam community decides, on a case-by-case basis, which issues are backported and when patch releases are made on the LTS branch.&lt;/p>
&lt;p>&lt;em>LTS Update (2020-04-06):&lt;/em> Due to a lack of interest from users, the Beam community decided not to maintain or publish new LTS releases. We encourage users to upgrade to the most recent release early and often.&lt;/p>
&lt;h3 id="260-2018-08-08">2.6.0 (2018-08-08)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.6.0/apache-beam-2.6.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.6.0/apache-beam-2.6.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.6.0/apache-beam-2.6.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12343392">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.6.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="250-2018-06-06">2.5.0 (2018-06-06)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.5.0/apache-beam-2.5.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.5.0/apache-beam-2.5.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.5.0/apache-beam-2.5.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12342847">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.5.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="240-2018-03-20">2.4.0 (2018-03-20)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.4.0/apache-beam-2.4.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.4.0/apache-beam-2.4.0-source-release.zip.sha512">SHA-512&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.4.0/apache-beam-2.4.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12342682">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.4.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="230-2018-01-30">2.3.0 (2018-01-30)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.3.0/apache-beam-2.3.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.3.0/apache-beam-2.3.0-source-release.zip.sha1">SHA-1&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.3.0/apache-beam-2.3.0-source-release.zip.md5">MD5&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.3.0/apache-beam-2.3.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12341608">Release notes&lt;/a>.
&lt;a href="/blog/beam-2.3.0">Blog post&lt;/a>.&lt;/p>
&lt;h3 id="220-2017-12-02">2.2.0 (2017-12-02)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.2.0/apache-beam-2.2.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.2.0/apache-beam-2.2.0-source-release.zip.sha1">SHA-1&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.2.0/apache-beam-2.2.0-source-release.zip.md5">MD5&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.2.0/apache-beam-2.2.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12341044">Release notes&lt;/a>.&lt;/p>
&lt;h3 id="210-2017-08-23">2.1.0 (2017-08-23)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.1.0/apache-beam-2.1.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.1.0/apache-beam-2.1.0-source-release.zip.sha1">SHA-1&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.1.0/apache-beam-2.1.0-source-release.zip.md5">MD5&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.1.0/apache-beam-2.1.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12340528">Release notes&lt;/a>.&lt;/p>
&lt;h3 id="200-2017-05-17">2.0.0 (2017-05-17)&lt;/h3>
&lt;p>Official &lt;a href="https://archive.apache.org/dist/beam/2.0.0/apache-beam-2.0.0-source-release.zip">source code download&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.0.0/apache-beam-2.0.0-source-release.zip.sha1">SHA-1&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.0.0/apache-beam-2.0.0-source-release.zip.md5">MD5&lt;/a>.
&lt;a href="https://archive.apache.org/dist/beam/2.0.0/apache-beam-2.0.0-source-release.zip.asc">signature&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&amp;amp;version=12339746">Release notes&lt;/a>.&lt;/p></description></item><item><title>Get-Started: Beam WordCount Examples</title><link>/get-started/wordcount-example/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/wordcount-example/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-wordcount-examples">Apache Beam WordCount Examples&lt;/h1>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#minimalwordcount-example">MinimalWordCount example&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#creating-the-pipeline">Creating the pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#applying-pipeline-transforms">Applying pipeline transforms&lt;/a>&lt;/li>
&lt;li>&lt;a href="#running-the-pipeline">Running the pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#try-the-full-example-in-playground">Try the full example in Playground&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#wordcount-example">WordCount example&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#specifying-explicit-dofns">Specifying explicit DoFns&lt;/a>&lt;/li>
&lt;li>&lt;a href="#creating-composite-transforms">Creating composite transforms&lt;/a>&lt;/li>
&lt;li>&lt;a href="#using-parameterizable-pipelineoptions">Using parameterizable PipelineOptions&lt;/a>&lt;/li>
&lt;li>&lt;a href="#try-the-full-example-in-playground-1">Try the full example in Playground&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#debuggingwordcount-example">DebuggingWordCount example&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#logging">Logging&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#direct-runner">Direct Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="#cloud-dataflow-runner">Cloud Dataflow Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apache-spark-runner">Apache Spark Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apache-flink-runner">Apache Flink Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apache-nemo-runner">Apache Nemo Runner&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#testing-your-pipeline-with-asserts">Testing your pipeline with asserts&lt;/a>&lt;/li>
&lt;li>&lt;a href="#try-the-full-example-in-playground-2">Try the full example in Playground&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#windowedwordcount-example">WindowedWordCount example&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#unbounded-and-bounded-datasets">Unbounded and bounded datasets&lt;/a>&lt;/li>
&lt;li>&lt;a href="#adding-timestamps-to-data">Adding timestamps to data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#windowing">Windowing&lt;/a>&lt;/li>
&lt;li>&lt;a href="#reusing-ptransforms-over-windowed-pcollections">Reusing PTransforms over windowed PCollections&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#streamingwordcount-example">StreamingWordCount example&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#reading-an-unbounded-dataset">Reading an unbounded dataset&lt;/a>&lt;/li>
&lt;li>&lt;a href="#writing-unbounded-results">Writing unbounded results&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;li data-value="go">Go SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>The WordCount examples demonstrate how to set up a processing pipeline that can
read text, tokenize the text lines into individual words, and perform a
frequency count on each of those words. The Beam SDKs contain a series of four
successively more detailed WordCount examples that build on each other. The
input text for all the examples is a collection of Shakespeare&amp;rsquo;s texts.&lt;/p>
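&lt;p>The computation that the WordCount pipelines perform can be sketched in plain Python, without Beam, to make the read, tokenize, count, and format stages concrete. This is only an in-memory illustration on one machine, not how a Beam pipeline executes; the helper names &lt;code>count_words&lt;/code> and &lt;code>format_counts&lt;/code> and the tokenizing regular expression are assumptions chosen for this sketch.&lt;/p>

```python
import re
from collections import Counter


def count_words(lines):
    """Count word frequencies in an iterable of text lines (hypothetical helper).

    Plain-Python stand-in for the read, tokenize, and count stages; it runs
    in memory and does not use Beam.
    """
    counts = Counter()
    for line in lines:
        # Tokenize: keep runs of ASCII letters and apostrophes. This rule is an
        # assumption for the sketch; each SDK's example uses its own pattern.
        counts.update(re.findall(r"[A-Za-z']+", line))
    return counts


def format_counts(counts):
    """Format each (word, count) pair as 'word: count', sorted by word."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


if __name__ == "__main__":
    lines = ["To be, or not to be:", "that is the question."]
    for row in format_counts(count_words(lines)):
        print(row)
```

&lt;p>Note that, like the examples, this sketch does not lowercase words, so &lt;code>To&lt;/code> and &lt;code>to&lt;/code> are counted separately.&lt;/p>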
&lt;p>Each WordCount example introduces different concepts in the Beam programming
model. Begin by understanding MinimalWordCount, the simplest of the examples.
Once you feel comfortable with the basic principles in building a pipeline,
continue on to learn more concepts in the other examples.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>MinimalWordCount&lt;/strong> demonstrates the basic principles involved in building a
pipeline.&lt;/li>
&lt;li>&lt;strong>WordCount&lt;/strong> introduces some of the more common best practices for creating
reusable and maintainable pipelines.&lt;/li>
&lt;li>&lt;strong>DebuggingWordCount&lt;/strong> introduces logging and debugging practices.&lt;/li>
&lt;li>&lt;strong>WindowedWordCount&lt;/strong> demonstrates how you can use Beam&amp;rsquo;s programming model
to handle both bounded and unbounded datasets.&lt;/li>
&lt;/ul>
&lt;h2 id="minimalwordcount-example">MinimalWordCount example&lt;/h2>
&lt;p>MinimalWordCount demonstrates a simple pipeline that uses the Direct Runner to
read from a text file, apply transforms to tokenize and count the words, and
write the data to an output text file.&lt;/p>
&lt;p class="language-java language-go">This example hard-codes the locations for its input and output files and doesn&amp;rsquo;t
perform any error checking; it is intended only to show you the &amp;ldquo;bare bones&amp;rdquo; of
creating a Beam pipeline. This lack of parameterization makes this particular
pipeline less portable across different runners than standard Beam pipelines. In
later examples, we will parameterize the pipeline&amp;rsquo;s input and output sources and
show other best practices.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">$&lt;/span> &lt;span class="n">mvn&lt;/span> &lt;span class="n">compile&lt;/span> &lt;span class="n">exec&lt;/span>&lt;span class="o">:&lt;/span>&lt;span class="n">java&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">Dexec&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">mainClass&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">examples&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">MinimalWordCount&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">python&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">m&lt;/span> &lt;span class="n">apache_beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">examples&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">wordcount_minimal&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="nb">input&lt;/span> &lt;span class="n">YOUR_INPUT_FILE&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">output&lt;/span> &lt;span class="n">counts&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="err">$&lt;/span> &lt;span class="k">go&lt;/span> &lt;span class="nx">install&lt;/span> &lt;span class="nx">github&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">com&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="nx">apache&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="nx">sdks&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="nx">v2&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="k">go&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="nx">examples&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="nx">minimal_wordcount&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">$&lt;/span> &lt;span class="nx">minimal_wordcount&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">To view the full code in Java, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java">MinimalWordCount&lt;/a>.&lt;/strong>&lt;/p>
&lt;p class="language-py">To view the full code in Python, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_minimal.py">wordcount_minimal.py&lt;/a>.&lt;/strong>&lt;/p>
&lt;p class="language-go">To view the full code in Go, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/examples/minimal_wordcount/minimal_wordcount.go">minimal_wordcount.go&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>Key Concepts:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Creating the Pipeline&lt;/li>
&lt;li>Applying transforms to the Pipeline&lt;/li>
&lt;li>Reading input (in this example: reading text files)&lt;/li>
&lt;li>Applying ParDo transforms&lt;/li>
&lt;li>Applying SDK-provided transforms (in this example: Count)&lt;/li>
&lt;li>Writing output (in this example: writing to a text file)&lt;/li>
&lt;li>Running the Pipeline&lt;/li>
&lt;/ul>
&lt;p>The following sections explain these concepts in detail, using the relevant code
excerpts from the MinimalWordCount pipeline.&lt;/p>
&lt;h3 id="creating-the-pipeline">Creating the pipeline&lt;/h3>
&lt;p class="language-java language-py">In this example, the code first creates a &lt;code>PipelineOptions&lt;/code> object. This object
lets us set various options for our pipeline, such as the pipeline runner that
will execute our pipeline and any runner-specific configuration required by the
chosen runner. In this example we set these options programmatically, but more
often, command-line arguments are used to set &lt;code>PipelineOptions&lt;/code>.&lt;/p>
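&lt;p class="language-py">Because command-line arguments are the more common way to set &lt;code>PipelineOptions&lt;/code>, a Python pipeline typically separates its own flags (such as &lt;code>--input&lt;/code> and &lt;code>--output&lt;/code>) from pipeline flags (such as &lt;code>--runner&lt;/code>). The sketch below uses only the standard &lt;code>argparse&lt;/code> module and follows the pattern of Beam&amp;rsquo;s Python &lt;code>wordcount.py&lt;/code> example, in which the leftover arguments are passed to &lt;code>PipelineOptions&lt;/code>; the helper name &lt;code>split_args&lt;/code> and the default input path are assumptions for illustration.&lt;/p>

```python
import argparse


def split_args(argv):
    """Split argv into application flags and pipeline flags (hypothetical helper).

    parse_known_args() returns the flags this parser defines plus a list of
    everything it did not recognize, such as --runner or --project.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input",
        default="gs://dataflow-samples/shakespeare/kinglear.txt",
        help="File to read.")
    parser.add_argument("--output", required=True, help="Output prefix.")
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


if __name__ == "__main__":
    known, rest = split_args(["--output", "counts", "--runner=DirectRunner"])
    print(known.input, known.output, rest)
    # With Beam installed, the leftover flags would configure the pipeline:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(rest)
```

&lt;p class="language-py">Leaving unrecognized flags for &lt;code>PipelineOptions&lt;/code> lets the same script switch runners from the command line without code changes.&lt;/p>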
&lt;p class="language-java language-py">You can specify a runner for executing your pipeline, such as the
&lt;code>DataflowRunner&lt;/code> or &lt;code>SparkRunner&lt;/code>. If you omit specifying a runner, as in this
example, your pipeline executes locally using the &lt;code>DirectRunner&lt;/code>. In the next
sections, we will specify the pipeline&amp;rsquo;s runner.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create a PipelineOptions object. This object lets us set various execution
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// options for our pipeline, such as the runner you wish to use. This example
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// will run with the DirectRunner by default, based on the class path configured
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// in its dependencies.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.options.pipeline_options&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_file&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;gs://dataflow-samples/shakespeare/kinglear.txt&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_path&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;gs://my-bucket/counts.txt&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam_options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">runner&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;DataflowRunner&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">project&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;my-project-id&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">job_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;unique-job-name&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">temp_location&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;gs://my-bucket/temp&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java language-py">The next step is to create a &lt;code>Pipeline&lt;/code> object with the options we&amp;rsquo;ve just
constructed. The Pipeline object builds up the graph of transformations to be
executed, associated with that particular pipeline.&lt;/p>
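&lt;p class="language-py">To illustrate what building up the graph means, here is a toy model in plain Python: applying a step only records it, and nothing is computed until &lt;code>run()&lt;/code> is called. &lt;code>ToyPipeline&lt;/code> is a hypothetical class, not part of the Beam SDK, and it models only a linear chain of steps, whereas a real Beam pipeline graph can branch and merge.&lt;/p>

```python
class ToyPipeline:
    """Hypothetical model of deferred pipeline construction (not Beam's API)."""

    def __init__(self):
        self.steps = []  # recorded (name, function) pairs, in order

    def apply(self, name, fn):
        # Record the step without executing it; return self so calls chain.
        self.steps.append((name, fn))
        return self

    def run(self, data):
        # Only now are the recorded steps executed, one after another.
        for _name, fn in self.steps:
            data = fn(data)
        return data


if __name__ == "__main__":
    calls = []

    def split_words(lines):
        calls.append("split")
        return [word for line in lines for word in line.split()]

    p = ToyPipeline().apply("Split", split_words).apply("Count", len)
    print([name for name, _ in p.steps], calls)  # steps recorded, none executed
    print(p.run(["a b", "c"]))
```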
&lt;p class="language-go">The first step is to create a &lt;code>Pipeline&lt;/code> object. It builds up the graph of
transformations to be executed, associated with that particular pipeline.
The scope allows grouping into composite transforms.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam_options&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">p&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipeline&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">s&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Root&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="applying-pipeline-transforms">Applying pipeline transforms&lt;/h3>
&lt;p>The MinimalWordCount pipeline contains several transforms that read data into the
pipeline, manipulate or otherwise transform the data, and write out the results.
A transform can consist of an individual operation, or it can contain multiple
nested transforms (in which case it is a &lt;a href="/documentation/programming-guide#composite-transforms">composite transform&lt;/a>).&lt;/p>
&lt;p>Each transform takes some kind of input data and produces some output data. The
input and output data is often represented by the SDK class &lt;code>PCollection&lt;/code>.
&lt;code>PCollection&lt;/code> is a special class, provided by the Beam SDK, that you can use to
represent a dataset of virtually any size, including unbounded datasets.&lt;/p>
&lt;img src="/images/wordcount-pipeline.svg" width="800px" alt="The MinimalWordCount pipeline data flow.">
&lt;p>&lt;em>Figure 1: The MinimalWordCount pipeline data flow.&lt;/em>&lt;/p>
&lt;p>The MinimalWordCount pipeline contains five transforms:&lt;/p>
&lt;ol>
&lt;li>A text file &lt;code>Read&lt;/code> transform is applied to the &lt;code>Pipeline&lt;/code> object itself, and
produces a &lt;code>PCollection&lt;/code> as output. Each element in the output &lt;code>PCollection&lt;/code>
represents one line of text from the input file. This example uses input
data stored in a publicly accessible Google Cloud Storage bucket (&amp;ldquo;gs://&amp;rdquo;).&lt;/li>
&lt;/ol>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://apache-beam-samples/shakespeare/*&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">input_file&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;gs://apache-beam-samples/shakespeare/*&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
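&lt;p>To make the read step concrete, here is a minimal plain-Python sketch of what a text read produces. It is not the Beam API. The helper name &lt;code>read_lines&lt;/code> is hypothetical, and the sketch assumes local files matched by a glob pattern rather than a Cloud Storage bucket. Like the read transforms above, it yields one element per line across every matching file, with the line terminator removed. Beam does not guarantee any element order; the sketch sorts file paths only to keep its output deterministic.&lt;/p>

```python
import glob


def read_lines(pattern):
    """Hypothetical plain-Python model of a text read transform.

    Returns one string per line, across all files matching the glob
    pattern, with line terminators stripped. Illustration only, not Beam code.
    """
    lines = []
    # Sort only to make the sketch deterministic; Beam does not order elements.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            # Text mode normalizes \r\n to \n, so stripping "\n" is enough here.
            lines.extend(line.rstrip("\n") for line in f)
    return lines
```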
&lt;ol start="2">
&lt;li>This transform splits each line in the &lt;code>PCollection&amp;lt;String&amp;gt;&lt;/code> of text
lines generated by the previous &lt;code>TextIO.Read&lt;/code> transform into individual words.
Its output is a new &lt;code>PCollection&lt;/code> in which each element is an individual
word in Shakespeare&amp;rsquo;s collected texts.
As an alternative, you could use a
&lt;a href="/documentation/programming-guide/#pardo">ParDo&lt;/a>
transform that invokes a &lt;code>DoFn&lt;/code> (defined inline as an anonymous class) on
each element to tokenize the text lines into individual words; the Go example
takes this approach, passing a function literal to &lt;code>beam.ParDo&lt;/code>.&lt;/li>
&lt;/ol>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ExtractWords&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FlatMapElements&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">line&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">line&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">split&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;[^\\p{L}]+&amp;#34;&lt;/span>&lt;span class="o">))))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The FlatMap transform is a simplified version of ParDo.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractWords&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">re&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">findall&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">r&lt;/span>&lt;span class="s1">&amp;#39;[A-Za-z&lt;/span>&lt;span class="se">\&amp;#39;&lt;/span>&lt;span class="s1">]+&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">words&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">wordRE&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">FindAllString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">},&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
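&lt;p>The Java and Python snippets tokenize differently: Java splits each line on runs of non-letters, while Python extracts runs of ASCII letters and apostrophes. The following plain-Python sketch (all helper names are hypothetical) compares the two approaches on one line. Python&amp;rsquo;s &lt;code>re&lt;/code> module does not support &lt;code>\p{L}&lt;/code>, so the split-based version approximates &amp;ldquo;non-letter&amp;rdquo; with &lt;code>[\W\d_]+&lt;/code>. Note also that Java&amp;rsquo;s &lt;code>String.split&lt;/code> discards trailing empty strings, whereas Python&amp;rsquo;s &lt;code>re.split&lt;/code> keeps them, so the two do not match exactly.&lt;/p>

```python
import re

# Python snippet's tokenizer: runs of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

# Approximation of the Java snippet's delimiter [^\p{L}]+ ("anything that is not
# a letter"): non-word characters, digits, and underscores.
NON_LETTER_RE = re.compile(r"[\W\d_]+")


def findall_words(line):
    """Extract words the way the Python snippet does; never yields empty strings."""
    return WORD_RE.findall(line)


def split_words(line):
    """Split on non-letters; a leading or trailing delimiter yields an empty string."""
    return NON_LETTER_RE.split(line)


def split_words_nonempty(line):
    """Split on non-letters, then drop empty strings (a filter step)."""
    return [word for word in split_words(line) if word]


line = "  KING LEAR: I'll go."
print(findall_words(line))         # ['KING', 'LEAR', "I'll", 'go']
print(split_words(line))           # ['', 'KING', 'LEAR', 'I', 'll', 'go', '']
print(split_words_nonempty(line))  # ['KING', 'LEAR', 'I', 'll', 'go']
```

&lt;p>The split-based approach can emit empty strings and breaks contractions such as &amp;ldquo;I&amp;rsquo;ll&amp;rdquo; apart, while the &lt;code>findall&lt;/code> approach does neither; which behavior you want depends on how you define a word.&lt;/p>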
&lt;ol start="3">
&lt;li>
&lt;p>The SDK-provided &lt;code>Count&lt;/code> transform is a generic transform that takes a
&lt;code>PCollection&lt;/code> of any type, and returns a &lt;code>PCollection&lt;/code> of key/value pairs.
Each key represents a unique element from the input collection, and each
value represents the number of times that key appeared in the input
collection.&lt;/p>
&lt;p>In this pipeline, the input for &lt;code>Count&lt;/code> is the &lt;code>PCollection&lt;/code> of individual
words generated by the previous transform, and the output is a &lt;code>PCollection&lt;/code>
of key/value pairs where each key represents a unique word in the text and
the associated value is the number of times that word occurs.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Count&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perElement&lt;/span>&lt;span class="o">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Count&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PerElement&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">counted&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Count&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
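&lt;p>Conceptually, counting per element is the same computation as Python&amp;rsquo;s &lt;code>collections.Counter&lt;/code>. The following plain-Python sketch (the helper name &lt;code>count_per_element&lt;/code> is hypothetical) models the output of &lt;code>Count&lt;/code>: one &lt;code>(element, count)&lt;/code> pair per distinct element. A runner can compute this in parallel across workers, and the order of elements in a &lt;code>PCollection&lt;/code> is unspecified; the sketch sorts its result only so it is easy to compare.&lt;/p>

```python
from collections import Counter


def count_per_element(elements):
    """Hypothetical plain-Python model of Count.PerElement / Count.perElement.

    Returns (element, occurrences) pairs, one per distinct element, sorted by
    element so the result is deterministic.
    """
    return sorted(Counter(elements).items())


words = ["the", "king", "the", "fool", "the"]
print(count_per_element(words))  # [('fool', 1), ('king', 1), ('the', 3)]
```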
&lt;ol start="4">
&lt;li>
&lt;p>The next transform formats each of the key/value pairs of unique words and
occurrence counts into a printable string suitable for writing to an output
file.&lt;/p>
&lt;p>The map transform (&lt;code>MapElements&lt;/code> in Java, &lt;code>MapTuple&lt;/code> in Python) is a
higher-level transform built on a simple &lt;code>ParDo&lt;/code>; the Go example uses
&lt;code>beam.ParDo&lt;/code> directly. For each element in the input &lt;code>PCollection&lt;/code>, the map
transform applies a function that produces exactly one output element.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;FormatResults&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MapElements&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordCount&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">wordCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34;: &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">wordCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MapTuple&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">: &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">formatted&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">c&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Sprintf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;%s: %v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">},&lt;/span> &lt;span class="nx">counted&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
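&lt;p>The Python snippet uses &lt;code>MapTuple&lt;/code>, which unpacks each key/value element into separate function arguments. The following plain-Python sketch (helper names are hypothetical, not Beam API) models that one-to-one behavior: each input pair yields exactly one formatted string.&lt;/p>

```python
def map_tuple(fn, elements):
    """Hypothetical model of MapTuple: call fn(*element) once per element."""
    return [fn(*element) for element in elements]


def format_result(word, count):
    # Same format string as the Python snippet above.
    return "%s: %s" % (word, count)


counts = [("fool", 1), ("king", 1), ("the", 3)]
print(map_tuple(format_result, counts))  # ['fool: 1', 'king: 1', 'the: 3']
```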
&lt;ol start="5">
&lt;li>A text file &lt;code>Write&lt;/code> transform. This transform takes the final &lt;code>PCollection&lt;/code> of
formatted strings as input and writes its elements to one or more output text files.
Each element in the input &lt;code>PCollection&lt;/code> becomes one line of text in the
resulting output.&lt;/li>
&lt;/ol>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;wordcounts&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">output_path&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;wordcounts.txt&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">formatted&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java language-py">Note that the &lt;code>Write&lt;/code> transform produces a trivial result value of type &lt;code>PDone&lt;/code>,
which in this case is ignored.&lt;/p>
&lt;p class="language-go">Note that the &lt;code>Write&lt;/code> transform returns no PCollections.&lt;/p>
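&lt;p>In the Java and Python SDKs, the path given to the text write transform (&lt;code>wordcounts&lt;/code> in the Java snippet) is a filename prefix rather than a single file name. Output is usually split into shards, and with the default shard name template (&lt;code>-SSSSS-of-NNNNN&lt;/code>) each shard&amp;rsquo;s name ends in a zero-padded shard index and shard count. The following plain-Python sketch (the helper &lt;code>shard_names&lt;/code> is hypothetical) reproduces that naming pattern; the actual number of shards is chosen by the runner unless you set it explicitly.&lt;/p>

```python
def shard_names(prefix, num_shards, suffix=""):
    """Hypothetical helper: file names following the -SSSSS-of-NNNNN template.

    S is replaced by the zero-padded shard index and N by the zero-padded
    shard count, each padded to five digits.
    """
    return [
        "%s-%05d-of-%05d%s" % (prefix, index, num_shards, suffix)
        for index in range(num_shards)
    ]


print(shard_names("wordcounts", 3))
# ['wordcounts-00000-of-00003', 'wordcounts-00001-of-00003', 'wordcounts-00002-of-00003']
```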
&lt;h3 id="running-the-pipeline">Running the pipeline&lt;/h3>
&lt;p class="language-java language-py">Run the pipeline by calling the &lt;code>run&lt;/code> method, which sends your pipeline to be
executed by the pipeline runner that you specified in your &lt;code>PipelineOptions&lt;/code>.&lt;/p>
&lt;p class="language-go">Run the pipeline by passing it to a runner.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">waitUntilFinish&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">...&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[&lt;/span>&lt;span class="n">construction&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># On exiting the with block, p.run() is called automatically, followed by wait_until_finish().&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">direct&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Execute&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Background&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java language-py">Note that, depending on the runner, the &lt;code>run&lt;/code> method can return before the
pipeline has finished executing. To block until execution completes, call the
&lt;span class="language-java">&lt;code>waitUntilFinish&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>wait_until_finish&lt;/code>&lt;/span> method on the result object
returned by the call to &lt;code>run&lt;/code>.&lt;/p>
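&lt;p>The difference between starting a job and waiting for it can be illustrated without Beam. In the following plain-Python analogy, &lt;code>concurrent.futures&lt;/code> stands in for a runner: &lt;code>submit&lt;/code> returns a handle immediately, much as &lt;code>run&lt;/code> can, and &lt;code>result()&lt;/code> blocks until the work is done, much as &lt;code>wait_until_finish&lt;/code> does. The function &lt;code>fake_pipeline&lt;/code> is a hypothetical placeholder for pipeline execution.&lt;/p>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_pipeline():
    """Hypothetical stand-in for pipeline execution on a runner."""
    time.sleep(0.1)
    return "DONE"


with ThreadPoolExecutor(max_workers=1) as executor:
    handle = executor.submit(fake_pipeline)  # Returns immediately, like run().
    state = handle.result()  # Blocks until finished, like wait_until_finish().

print(state)  # DONE
```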
&lt;h3 id="try-the-full-example-in-playground">Try the full example in Playground&lt;/h3>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_MinimalWordCount"
>&lt;/div>
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_WordCountMinimal"
>&lt;/div>
&lt;div
class="language-go playground-snippet"
data-sdk="go"
data-path="SDK_GO_MinimalWordCount"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_MinimalWordCount%22%2c%22sdk%22%3a%22java%22%7d%2c%7b%22path%22%3a%22SDK_PYTHON_WordCountMinimal%22%2c%22sdk%22%3a%22python%22%7d%2c%7b%22path%22%3a%22SDK_GO_MinimalWordCount%22%2c%22sdk%22%3a%22go%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="wordcount-example">WordCount example&lt;/h2>
&lt;p>This WordCount example introduces a few recommended programming practices that
can make your pipeline easier to read, write, and maintain. While not explicitly
required, they can make your pipeline&amp;rsquo;s execution more flexible, aid in testing
your pipeline, and help make your pipeline&amp;rsquo;s code reusable.&lt;/p>
&lt;p>This section assumes that you have a good understanding of the basic concepts in
building a pipeline. If you don&amp;rsquo;t yet, read the preceding section,
&lt;a href="#minimalwordcount-example">MinimalWordCount&lt;/a>.&lt;/p>
&lt;p>&lt;strong>To run this example in Java:&lt;/strong>&lt;/p>
&lt;p>Set up your development environment and generate the Maven archetype as
described in the &lt;a href="/get-started/quickstart-java/">Java WordCount quickstart&lt;/a>.
Then run the pipeline with one of the runners:&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--inputFile=pom.xml --output=counts&amp;#34; -Pdirect-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --inputFile=pom.xml --output=counts&amp;#34; -Pflink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --flinkMaster=&amp;lt;flink master&amp;gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
--inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts&amp;#34; -Pflink-runner
# You can monitor the running job by visiting the Flink dashboard at http://&amp;lt;flink master&amp;gt;:8081&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=SparkRunner --inputFile=pom.xml --output=counts&amp;#34; -Pspark-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=DataflowRunner --gcpTempLocation=gs://YOUR_GCS_BUCKET/tmp \
--project=YOUR_PROJECT --region=GCE_REGION \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://YOUR_GCS_BUCKET/counts&amp;#34; \
-Pdataflow-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--inputFile=pom.xml --output=counts --runner=SamzaRunner&amp;#34; -Psamza-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">$ mvn package -Pnemo-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
--runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">$ mvn package -P jet-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
--runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/pom.xml --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Java, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java">WordCount&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To run this example in Python:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">python -m apache_beam.examples.wordcount --input YOUR_INPUT_FILE --output counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">python -m apache_beam.examples.wordcount --input /path/to/inputfile \
--output /path/to/write/counts \
--runner FlinkRunner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster"># Running Beam Python on a distributed Flink cluster requires additional configuration.
# See /documentation/runners/flink/ for more information.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">python -m apache_beam.examples.wordcount --input /path/to/inputfile \
--output /path/to/write/counts \
--runner SparkRunner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow"># As part of the initial setup, install Google Cloud Platform specific extra components.
pip install 'apache-beam[gcp]'
python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://YOUR_GCS_BUCKET/counts \
--runner DataflowRunner \
--project YOUR_GCP_PROJECT \
--region YOUR_GCP_REGION \
--temp_location gs://YOUR_GCS_BUCKET/tmp/&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Python, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py">wordcount.py&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To run this example in Go:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">$ go install github.com/apache/beam/sdks/v2/go/examples/wordcount
$ wordcount --input &amp;lt;PATH_TO_INPUT_FILE&amp;gt; --output counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">$ go install github.com/apache/beam/sdks/v2/go/examples/wordcount
# As part of the initial setup, non-Linux users should install the unix package before running the example.
$ go get -u golang.org/x/sys/unix
$ wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://&amp;lt;your-gcs-bucket&amp;gt;/counts \
--runner dataflow \
--project your-gcp-project \
--region your-gcp-region \
--temp_location gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp/ \
--staging_location gs://&amp;lt;your-gcs-bucket&amp;gt;/binaries/ \
--worker_harness_container_image=apache/beam_go_sdk:latest&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Go, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/examples/wordcount/wordcount.go">wordcount.go&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>New Concepts:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Applying &lt;code>ParDo&lt;/code> with an explicit &lt;code>DoFn&lt;/code>&lt;/li>
&lt;li>Creating Composite Transforms&lt;/li>
&lt;li>Using Parameterizable &lt;code>PipelineOptions&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>The following sections explain these key concepts in detail and break the
pipeline code down into smaller pieces.&lt;/p>
&lt;h3 id="specifying-explicit-dofns">Specifying explicit DoFns&lt;/h3>
&lt;p class="language-java language-py">When using &lt;code>ParDo&lt;/code> transforms, you need to specify the processing operation that
gets applied to each element in the input &lt;code>PCollection&lt;/code>. This processing
operation is a subclass of the SDK class &lt;code>DoFn&lt;/code>. You can create the &lt;code>DoFn&lt;/code>
for each &lt;code>ParDo&lt;/code> inline (in Java, as an anonymous inner class instance), as is
done in the previous example (MinimalWordCount). However, it&amp;rsquo;s often a good
idea to define the &lt;code>DoFn&lt;/code> as a named class instead (in Java, a static nested class), which makes it easier to unit
test and can make the &lt;code>ParDo&lt;/code> code more readable.&lt;/p>
&lt;p class="language-go">When using &lt;code>ParDo&lt;/code> transforms, you need to specify the processing operation that
gets applied to each element in the input &lt;code>PCollection&lt;/code>. This processing
operation is either a named function or a struct with specially named methods. You
can use anonymous functions (but not closures). However, it&amp;rsquo;s often a good
idea to define the &lt;code>DoFn&lt;/code> at package level, which makes it easier to unit
test and can make the &lt;code>ParDo&lt;/code> code more readable.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// In this example, ExtractWordsFn is a DoFn that is defined as a static class:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ExtractWordsFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># In this example, the DoFn is defined as a class:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">FormatAsTextFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">: &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">formatted&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">counts&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FormatAsTextFn&lt;/span>&lt;span class="p">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// In this example, extractFn is a DoFn that is defined as a function:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">extractFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">line&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
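&lt;p class="language-py">Defining the per-element logic as a named unit is what makes it easy to test in isolation. The sketch below copies the body of &lt;code>FormatAsTextFn.process&lt;/code> into a hypothetical plain generator function, &lt;code>format_as_text&lt;/code>, which is not part of the Beam example. This lets it run without Beam installed. You call the function directly and collect what it yields, so no pipeline is needed. With Beam installed, &lt;code>list(FormatAsTextFn().process(('king', 3)))&lt;/code> should work the same way, because &lt;code>process&lt;/code> is also an ordinary generator method.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;pre tabindex="0">&lt;code class="language-py" data-lang="py">def format_as_text(element):
    """Hypothetical standalone copy of the body of FormatAsTextFn.process."""
    # Each element is a (word, count) pair produced by the counting step.
    word, count = element
    yield '%s: %s' % (word, count)


# Call the generator directly and collect what it yields; no pipeline needed.
formatted = list(format_as_text(('king', 3)))
print(formatted)  # ['king: 3']&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>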
&lt;h3 id="creating-composite-transforms">Creating composite transforms&lt;/h3>
&lt;p class="language-java language-py">If you have a processing operation that consists of multiple transforms or
&lt;code>ParDo&lt;/code> steps, you can create it as a subclass of &lt;code>PTransform&lt;/code>. In Python, you can also use the
&lt;code>@beam.ptransform_fn&lt;/code> decorator, as the Python example in this section does. Creating a
&lt;code>PTransform&lt;/code> subclass lets you encapsulate complex transforms, makes
your pipeline&amp;rsquo;s structure clearer and more modular, and makes unit testing easier.&lt;/p>
&lt;p class="language-go">If you have a processing operation that consists of multiple transforms or
&lt;code>ParDo&lt;/code> steps, you can use a normal Go function to encapsulate them. You can
also use a named subscope to group them into a composite transform that is visible
in monitoring.&lt;/p>
&lt;p class="language-java language-py">In this example, two transforms are encapsulated in the composite transform
&lt;code>CountWords&lt;/code>. The first extracts words from each line; in Java it is the &lt;code>ParDo&lt;/code> that runs &lt;code>ExtractWordsFn&lt;/code>, and in Python it is a &lt;code>FlatMap&lt;/code>. The second is
the SDK-provided &lt;code>Count&lt;/code> transform.&lt;/p>
&lt;p class="language-go">In this example, two transforms are encapsulated as a &lt;code>CountWords&lt;/code> function.&lt;/p>
&lt;p>When &lt;code>CountWords&lt;/code> is defined, we specify its overall input and output. The
input is the &lt;code>PCollection&amp;lt;String&amp;gt;&lt;/code> for the extraction operation, and the output
is the &lt;code>PCollection&amp;lt;KV&amp;lt;String, Long&amp;gt;&amp;gt;&lt;/code> produced by the count operation.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">CountWords&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">lines&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert lines of text into individual words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">lines&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">ExtractWordsFn&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Count the number of times each word occurs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">wordCounts&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Count&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perElement&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">wordCounts&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">throws&lt;/span> &lt;span class="n">IOException&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">CountWords&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@beam.ptransform_fn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Convert lines of text into individual words.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractWords&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">re&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">findall&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">r&lt;/span>&lt;span class="s1">&amp;#39;[A-Za-z&lt;/span>&lt;span class="se">\&amp;#39;&lt;/span>&lt;span class="s1">]+&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Count the number of times each word occurs.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Count&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PerElement&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">counts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">lines&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">CountWords&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scope&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;CountWords&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert lines of text into individual words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">col&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">extractFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Count the number of times each word occurs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Count&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">col&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
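&lt;p class="language-py">To make concrete what &lt;code>CountWords&lt;/code> computes, here is a plain-Python reference model of its two steps. It uses &lt;code>re.findall&lt;/code> with the same pattern as the &lt;code>ExtractWords&lt;/code> step, and &lt;code>collections.Counter&lt;/code> in place of &lt;code>Count.PerElement&lt;/code>. This is only a sketch for reasoning about results, for example to produce expected values for a unit test. The function name &lt;code>count_words&lt;/code> is hypothetical, and this is not how a Beam runner executes the transform.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;pre tabindex="0">&lt;code class="language-py" data-lang="py">import re
from collections import Counter


def count_words(lines):
    """Hypothetical plain-Python model of CountWords: extract words, then count."""
    words = []
    for line in lines:
        # Same pattern as the ExtractWords step in the Python CountWords example.
        words.extend(re.findall(r"[A-Za-z']+", line))
    # Counter plays the role of Count.PerElement: one count per distinct word.
    return dict(Counter(words))


counts = count_words(["The king, the King's men", "the king"])
# Matching is case-sensitive, so 'The' and 'the' are counted separately.
print(counts['the'], counts['The'])  # 2 1&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>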
&lt;h3 id="using-parameterizable-pipelineoptions">Using parameterizable PipelineOptions&lt;/h3>
&lt;p>You can hard-code various execution options when you run your pipeline. However,
the more common way is to define your own configuration options via command-line
argument parsing. Defining your configuration options on the command line makes
the code more portable across runners, because you can change the runner and its settings without modifying the code.&lt;/p>
&lt;p class="language-java language-py">Add arguments to be processed by the command-line parser, and specify default
values for them. You can then access the option values in your pipeline code.&lt;/p>
&lt;p class="language-go">You can use the standard &lt;code>flag&lt;/code> package for this purpose.&lt;/p>
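&lt;p class="language-py">A pipeline usually needs to accept both its own options and runner options such as &lt;code>--runner&lt;/code>. A common pattern, sketched below with only the standard library, is &lt;code>argparse.ArgumentParser.parse_known_args&lt;/code>. It parses the options your parser defines and returns every other argument in a separate list. In Beam&amp;rsquo;s Python examples, that leftover list is typically passed to &lt;code>PipelineOptions&lt;/code>; that step is omitted here so the sketch runs without Beam. The &lt;code>parse_args&lt;/code> helper is hypothetical, and the option names follow the &lt;code>--input&lt;/code> and &lt;code>--output&lt;/code> flags used in the Python commands above.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;pre tabindex="0">&lt;code class="language-py" data-lang="py">import argparse


def parse_args(argv):
    """Hypothetical helper: split application options from pipeline options."""
    parser = argparse.ArgumentParser()
    # Application-specific option with a default value.
    parser.add_argument(
        '--input',
        default='gs://dataflow-samples/shakespeare/kinglear.txt',
        help='Input file to process.')
    # Application-specific option that the user must supply.
    parser.add_argument(
        '--output', required=True, help='Output file prefix to write results to.')
    # Returns (namespace of known options, list of all remaining arguments).
    return parser.parse_known_args(argv)


known_args, pipeline_args = parse_args(['--output=counts', '--runner=DirectRunner'])
print(known_args.input)  # gs://dataflow-samples/shakespeare/kinglear.txt
print(pipeline_args)  # ['--runner=DirectRunner']&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>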
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="nc">WordCountOptions&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Description&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Path of the file to read from&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Default.String&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://dataflow-samples/shakespeare/kinglear.txt&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="nf">getInputFile&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setInputFile&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">value&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">WordCountOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">WordCountOptions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">argparse&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">parser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">argparse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ArgumentParser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--input-file&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;gs://dataflow-samples/shakespeare/kinglear.txt&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The file path for the input text to process.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--output-path&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">required&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The path prefix for output files.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">args&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">beam_args&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parse_known_args&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam_options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam_args&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam_options&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input_file&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">input&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;input&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;gs://apache-beam-samples/shakespeare/kinglear.txt&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;File(s) to read.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">p&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipeline&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Root&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
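&lt;p>The Python variant of the snippet above uses &lt;code>parse_known_args()&lt;/code> to split the command line in two: flags that the parser defines (&lt;code>--input-file&lt;/code>, &lt;code>--output-path&lt;/code>) become attributes on &lt;code>args&lt;/code>, and every flag it does not recognize is returned in &lt;code>beam_args&lt;/code> so it can be passed to &lt;code>PipelineOptions&lt;/code>. The following minimal sketch shows that split without building a pipeline. It assumes only the Python standard library; the helper name &lt;code>split_args&lt;/code> and the sample flag values are hypothetical.&lt;/p>

```python
import argparse


def split_args(argv):
    """Separate application-specific flags from the remaining (pipeline) flags.

    parse_known_args() returns a namespace with the flags this parser defines,
    plus a list of every argument string it did not recognize.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--input-file',
        default='gs://dataflow-samples/shakespeare/kinglear.txt',
        help='The file path for the input text to process.')
    parser.add_argument(
        '--output-path', required=True, help='The path prefix for output files.')
    # Unrecognized flags (for example --runner) are returned untouched so that
    # they can be handed on to PipelineOptions.
    return parser.parse_known_args(argv)


# Hypothetical command line: one application flag plus one pipeline flag.
args, beam_args = split_args(
    ['--output-path', '/tmp/counts', '--runner', 'DirectRunner'])
print(args.input_file)   # gs://dataflow-samples/shakespeare/kinglear.txt
print(args.output_path)  # /tmp/counts
print(beam_args)         # ['--runner', 'DirectRunner']
```

&lt;p>Because &lt;code>argparse&lt;/code> converts dashes in flag names to underscores, &lt;code>--input-file&lt;/code> is read back as &lt;code>args.input_file&lt;/code>.&lt;/p>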
&lt;h3 id="try-the-full-example-in-playground-1">Try the full example in Playground&lt;/h3>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_WordCount"
>&lt;/div>
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_WordCount"
>&lt;/div>
&lt;div
class="language-go playground-snippet"
data-sdk="go"
data-path="SDK_GO_WordCount"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_WordCount%22%2c%22sdk%22%3a%22java%22%7d%2c%7b%22path%22%3a%22SDK_PYTHON_WordCount%22%2c%22sdk%22%3a%22python%22%7d%2c%7b%22path%22%3a%22SDK_GO_WordCount%22%2c%22sdk%22%3a%22go%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="debuggingwordcount-example">DebuggingWordCount example&lt;/h2>
&lt;p>The DebuggingWordCount example demonstrates some best practices for
instrumenting your pipeline code.&lt;/p>
&lt;p>&lt;strong>To run this example in Java:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
-Dexec.args=&amp;#34;--output=counts&amp;#34; -Pdirect-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --output=counts&amp;#34; -Pflink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --flinkMaster=&amp;lt;flink master&amp;gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
--output=/tmp/counts&amp;#34; -Pflink-runner
You can monitor the running job by visiting the Flink dashboard at http://&amp;lt;flink master&amp;gt;:8081&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
-Dexec.args=&amp;#34;--runner=SparkRunner --output=counts&amp;#34; -Pspark-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
-Dexec.args=&amp;#34;--runner=DataflowRunner --gcpTempLocation=gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp \
--project=YOUR_PROJECT --region=GCE_REGION \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&amp;lt;your-gcs-bucket&amp;gt;/counts&amp;#34; \
-Pdataflow-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.DebuggingWordCount \
-Dexec.args=&amp;#34;--runner=SamzaRunner --output=counts&amp;#34; -Psamza-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">$ mvn package -Pnemo-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.DebuggingWordCount \
--runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">$ mvn package -P jet-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.DebuggingWordCount \
--runner=JetRunner --jetLocalMode=3 --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Java, see
&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java">DebuggingWordCount&lt;/a>.&lt;/p>
&lt;p>&lt;strong>To run this example in Python:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">python -m apache_beam.examples.wordcount_debugging --input YOUR_INPUT_FILE --output counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow"># As part of the initial setup, install Google Cloud Platform specific extra components.
pip install 'apache-beam[gcp]'
python -m apache_beam.examples.wordcount_debugging --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://YOUR_GCS_BUCKET/counts \
--runner DataflowRunner \
--project YOUR_GCP_PROJECT \
--region YOUR_GCP_REGION \
--temp_location gs://YOUR_GCS_BUCKET/tmp/&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Python, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_debugging.py">wordcount_debugging.py&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To run this example in Go:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">$ go install github.com/apache/beam/sdks/v2/go/examples/debugging_wordcount
$ debugging_wordcount --input &amp;lt;PATH_TO_INPUT_FILE&amp;gt; --output counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">$ go install github.com/apache/beam/sdks/v2/go/examples/debugging_wordcount
# As part of the initial setup, if you are not on Linux, install the unix package before running:
$ go get -u golang.org/x/sys/unix
$ debugging_wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://&amp;lt;your-gcs-bucket&amp;gt;/counts \
--runner dataflow \
--project your-gcp-project \
--region your-gcp-region \
--temp_location gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp/ \
--staging_location gs://&amp;lt;your-gcs-bucket&amp;gt;/binaries/ \
--worker_harness_container_image=apache/beam_go_sdk:latest&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Go, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/examples/debugging_wordcount/debugging_wordcount.go">debugging_wordcount.go&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>New Concepts:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Logging&lt;/li>
&lt;li>Testing your Pipeline via &lt;code>PAssert&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>The following sections explain these key concepts in detail, and break down the
pipeline code into smaller sections.&lt;/p>
&lt;h3 id="logging">Logging&lt;/h3>
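&lt;p>Each runner may choose to handle logs in its own way. The examples below log matched words at one level and unmatched words at a more verbose level, so the log level you configure determines how much detail you see.&lt;/p>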
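&lt;p>As a rough illustration of how log levels and counters interact, here is a plain-Python sketch of the filtering logic in the &lt;code>FilterTextFn&lt;/code> examples that follow. It assumes no Beam dependency: &lt;code>collections.Counter&lt;/code> stands in for the Beam &lt;code>Metrics.counter&lt;/code> API, and a logger set to INFO stands in for a runner that only collects INFO and higher. The names &lt;code>filter_text&lt;/code>, &lt;code>ListHandler&lt;/code>, and &lt;code>filter_text_sketch&lt;/code> are hypothetical; the pattern &lt;code>Flourish|stomach&lt;/code> is the filter used by the DebuggingWordCount example.&lt;/p>

```python
import logging
import re
from collections import Counter

# Plain-Python sketch (no Beam): Counter stands in for Metrics.counter, and a
# named logger stands in for the worker logging that a runner would collect.
logger = logging.getLogger('filter_text_sketch')


def filter_text(elements, pattern, counters):
    """Yield (word, count) pairs whose word matches pattern, logging each decision."""
    for word, count in elements:
        if re.match(pattern, word):
            logger.info('Matched %s', word)
            counters['matched_words'] += 1
            yield word, count
        else:
            # DEBUG records are dropped when the logger level is INFO or higher.
            logger.debug('Did not match %s', word)
            counters['unmatched_words'] += 1


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # Emit INFO and above only.
logger.propagate = False       # Keep the sketch from also printing via the root logger.

counters = Counter()
words = [('Flourish', 3), ('stomach', 1), ('the', 10)]
kept = list(filter_text(words, 'Flourish|stomach', counters))
print(kept)              # [('Flourish', 3), ('stomach', 1)]
print(dict(counters))    # {'matched_words': 2, 'unmatched_words': 1}
print(handler.messages)  # ['Matched Flourish', 'Matched stomach']
```

&lt;p>Only the two INFO messages reach the handler; the DEBUG message for &lt;code>the&lt;/code> is discarded by the logger before any handler sees it, which is similar to what happens when a runner collects logs only at INFO and above. The counters, by contrast, record both outcomes regardless of the log level.&lt;/p>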
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This example uses .trace and .debug:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">DebuggingWordCount&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">FilterTextFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(...)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">LOG&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">debug&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Matched: &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">LOG&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">trace&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Did not match: &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># [START example_wordcount_debugging_aggregators]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">logging&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">FilterTextFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;A DoFn that filters for a specific key based on a regular expression.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pattern&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pattern&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pattern&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># A custom metric can track values in your pipeline as it runs. Create&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># custom metrics matched_word and unmatched_words.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">matched_words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">counter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="vm">__class__&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;matched_words&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">umatched_words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">counter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="vm">__class__&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;umatched_words&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">re&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="k">match&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Log at INFO level each element we match. When executing this pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># using the Dataflow service, these log lines will appear in the Cloud&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Logging UI.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Matched &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Add 1 to the custom metric counter matched_words&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">matched_words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">inc&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Log at the &amp;#34;DEBUG&amp;#34; level each element that is not matched. Different&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># log levels can be used to control the verbosity of logging providing&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># an effective mechanism to filter less important information. Note&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># currently only &amp;#34;INFO&amp;#34; and higher level logs are emitted to the Cloud&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Logger. This log message will not be visible in the Cloud Logger.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">debug&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Did not match &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Add 1 to the custom metric counter umatched_words&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">umatched_words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">inc&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">filterFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">f&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">filterFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">f&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">re&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MatchString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Log at the &amp;#34;INFO&amp;#34; level each element that we match.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">log&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Infof&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Matched: %v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Log at the &amp;#34;DEBUG&amp;#34; level each element that is not matched.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">log&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Debugf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Did not match: %v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="direct-runner">Direct Runner&lt;/h4>
&lt;p>When executing your pipeline with the &lt;code>DirectRunner&lt;/code>, you can print log
messages directly to your local console. &lt;span class="language-java">If you use
the Beam SDK for Java, you must add &lt;code>Slf4j&lt;/code> to your class path.&lt;/span>&lt;/p>
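&lt;p>For the Python SDK, the standard &lt;code>logging&lt;/code> module controls what reaches your console. The following minimal sketch uses only the standard library and does not start a pipeline. It mirrors the two logging branches of the filter &lt;code>DoFn&lt;/code> above to show how the root logger level controls verbosity. The helper &lt;code>log_words&lt;/code> and the sample pattern are hypothetical.&lt;/p>

```python
import io
import logging
import re


def log_words(words, pattern, level):
  """Logs each word like the filter DoFn and returns the captured lines.

  Hypothetical helper: matched words are logged at INFO, unmatched words
  at DEBUG, and only records at or above `level` are kept.
  """
  stream = io.StringIO()
  handler = logging.StreamHandler(stream)
  handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
  root = logging.getLogger()
  previous_level = root.level
  root.addHandler(handler)
  root.setLevel(level)
  try:
    for word in words:
      if re.match(pattern, word):
        logging.info('Matched %s', word)
      else:
        logging.debug('Did not match %s', word)
  finally:
    # Restore the global logging state so the sketch has no side effects.
    root.removeHandler(handler)
    root.setLevel(previous_level)
  return stream.getvalue().splitlines()


print(log_words(['Flourish', 'the'], r'Flourish|stomach', logging.INFO))
# ['INFO Matched Flourish']
print(log_words(['Flourish', 'the'], r'Flourish|stomach', logging.DEBUG))
# ['INFO Matched Flourish', 'DEBUG Did not match the']
```

&lt;p>In a real pipeline script you would normally set the level once before running the pipeline. The Beam Python examples do this with &lt;code>logging.getLogger().setLevel(logging.INFO)&lt;/code>.&lt;/p>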
&lt;h4 id="cloud-dataflow-runner">Cloud Dataflow Runner&lt;/h4>
&lt;p>When executing your pipeline with the &lt;code>DataflowRunner&lt;/code>, you can use Cloud Logging
(formerly Stackdriver Logging). Cloud Logging aggregates the logs from all of your Cloud Dataflow
job&amp;rsquo;s workers into a single location in the Google Cloud Platform Console. You can
use it to search and access the logs from all of the workers
that Cloud Dataflow has started to complete your job. Logging statements in your
pipeline&amp;rsquo;s &lt;code>DoFn&lt;/code> instances appear in Cloud Logging as your pipeline
runs.&lt;/p>
&lt;p class="language-java language-py">You can also control the worker log levels. Cloud Dataflow workers that execute
user code are configured to log to Stackdriver Logging by default at &amp;ldquo;INFO&amp;rdquo; log
level and higher. You can override log levels for specific logging namespaces by
specifying: &lt;code>--workerLogLevelOverrides={&amp;quot;Name1&amp;quot;:&amp;quot;Level1&amp;quot;,&amp;quot;Name2&amp;quot;:&amp;quot;Level2&amp;quot;,...}&lt;/code>.
For example, by specifying &lt;code>--workerLogLevelOverrides={&amp;quot;org.apache.beam.examples&amp;quot;:&amp;quot;DEBUG&amp;quot;}&lt;/code>
when executing a pipeline using the Cloud Dataflow service, Stackdriver Logging
will contain only &amp;ldquo;DEBUG&amp;rdquo; or higher level logs for the package in addition to
the default &amp;ldquo;INFO&amp;rdquo; or higher level logs.&lt;/p>
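&lt;p>Per-namespace overrides work much like hierarchical loggers. A level set on a package-level name also applies to the loggers beneath it, while every other logger keeps the default. The sketch below reproduces that behavior locally with Python&amp;rsquo;s standard &lt;code>logging&lt;/code> module. It is an analogy to help reason about the flag, not the Dataflow worker&amp;rsquo;s implementation. The helper &lt;code>emitted_records&lt;/code> and the logger names are illustrative.&lt;/p>

```python
import io
import logging


def emitted_records(overrides, default_level=logging.INFO):
  """Returns 'logger LEVEL' lines that pass a default level plus overrides.

  `overrides` maps logger names to levels, loosely mirroring the
  workerLogLevelOverrides map (an illustrative assumption).
  """
  stream = io.StringIO()
  handler = logging.StreamHandler(stream)
  handler.setFormatter(logging.Formatter('%(name)s %(levelname)s'))
  root = logging.getLogger()
  saved_root_level = root.level
  saved_levels = {name: logging.getLogger(name).level for name in overrides}
  root.addHandler(handler)
  root.setLevel(default_level)
  for name, level in overrides.items():
    logging.getLogger(name).setLevel(level)
  try:
    # One logger inside the overridden package and one outside it.
    for name in ('org.apache.beam.examples.wordcount', 'org.apache.beam.sdk'):
      logger = logging.getLogger(name)
      logger.debug('debug message')
      logger.info('info message')
  finally:
    # Undo every change made to the global logging configuration.
    root.removeHandler(handler)
    root.setLevel(saved_root_level)
    for name, level in saved_levels.items():
      logging.getLogger(name).setLevel(level)
  return stream.getvalue().splitlines()


print(emitted_records({}))
# ['org.apache.beam.examples.wordcount INFO', 'org.apache.beam.sdk INFO']
print(emitted_records({'org.apache.beam.examples': logging.DEBUG}))
# ['org.apache.beam.examples.wordcount DEBUG',
#  'org.apache.beam.examples.wordcount INFO', 'org.apache.beam.sdk INFO']
```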
&lt;p class="language-java language-py">The default Cloud Dataflow worker logging configuration can be overridden by
specifying &lt;code>--defaultWorkerLogLevel=&amp;lt;one of TRACE, DEBUG, INFO, WARN, ERROR&amp;gt;&lt;/code>.
For example, by specifying &lt;code>--defaultWorkerLogLevel=DEBUG&lt;/code> when executing a
pipeline with the Cloud Dataflow service, Cloud Logging will contain all &amp;ldquo;DEBUG&amp;rdquo;
or higher level logs. Note that changing the default worker log level to TRACE
or DEBUG significantly increases the volume of log output.&lt;/p>
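&lt;p>The override value is a JSON object, so hand-typed quotes are an easy source of mistakes. A minimal sketch, assuming you assemble the command line from a Python script, is to generate the values with &lt;code>json.dumps&lt;/code> and quote them for a POSIX shell with &lt;code>shlex.quote&lt;/code>. The helper &lt;code>dataflow_logging_flags&lt;/code> is hypothetical; only the two flag names come from the text above.&lt;/p>

```python
import json
import shlex


def dataflow_logging_flags(default_level=None, overrides=None):
  """Builds --defaultWorkerLogLevel / --workerLogLevelOverrides arguments.

  Hypothetical helper; `overrides` maps logging namespaces to level names.
  """
  flags = []
  if default_level is not None:
    flags.append('--defaultWorkerLogLevel=' + default_level)
  if overrides:
    # Compact separators give the same shape as the examples above.
    value = json.dumps(overrides, separators=(',', ':'))
    flags.append('--workerLogLevelOverrides=' + value)
  return flags


flags = dataflow_logging_flags(
    default_level='WARN', overrides={'org.apache.beam.examples': 'DEBUG'})
print(' '.join(shlex.quote(flag) for flag in flags))
# --defaultWorkerLogLevel=WARN '--workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}'
```

&lt;p>If you pass these flags inside Maven&amp;rsquo;s &lt;code>-Dexec.args&lt;/code> string instead, that string&amp;rsquo;s own quoting applies, so adjust the quoting accordingly.&lt;/p>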
&lt;h4 id="apache-spark-runner">Apache Spark Runner&lt;/h4>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> This section is yet to be added. There is an open issue for this
(&lt;a href="https://github.com/apache/beam/issues/18076">Issue 18076&lt;/a>).&lt;/p>
&lt;/blockquote>
&lt;h4 id="apache-flink-runner">Apache Flink Runner&lt;/h4>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> This section is yet to be added. There is an open issue for this
(&lt;a href="https://github.com/apache/beam/issues/18075">Issue 18075&lt;/a>).&lt;/p>
&lt;/blockquote>
&lt;h4 id="apache-nemo-runner">Apache Nemo Runner&lt;/h4>
&lt;p>When executing your pipeline with the &lt;code>NemoRunner&lt;/code>, most log messages are printed
directly to your local console. You should add &lt;code>Slf4j&lt;/code> to your class path to make
full use of the logs. To view the logs on both the driver and the executor sides, look in
the folders created by Apache REEF. For example, when you run your pipeline on the local
runtime, a folder called &lt;code>REEF_LOCAL_RUNTIME&lt;/code> is created in your working directory.
All of the logs and metric information can be found under that directory.&lt;/p>
&lt;h3 id="testing-your-pipeline-with-asserts">Testing your pipeline with asserts&lt;/h3>
&lt;p class="language-java language-py">&lt;span class="language-java">&lt;code>PAssert&lt;/code>&lt;/span>&lt;span class="language-py">&lt;code>assert_that&lt;/code>&lt;/span>
is a set of convenient PTransforms in the style of Hamcrest&amp;rsquo;s collection
matchers that can be used when writing pipeline level tests to validate the
contents of PCollections. Asserts are best used in unit tests with small datasets.&lt;/p>
&lt;p class="language-go">The &lt;code>passert&lt;/code> package contains convenient PTransforms that can be used when
writing pipeline level tests to validate the contents of PCollections. Asserts
are best used in unit tests with small datasets.&lt;/p>
&lt;p class="language-java">The following example verifies that the set of filtered words matches our
expected counts. The assert does not produce any output, and the pipeline only
succeeds if all of the expectations are met.&lt;/p>
&lt;p class="language-py language-go">The following example verifies that two collections contain the same values. The
assert does not produce any output, and the pipeline only succeeds if all of the
expectations are met.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">expectedResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Flourish&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">3L&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;stomach&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">1L&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PAssert&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">that&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">filteredWords&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">containsInAnyOrder&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">expectedResults&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.testing.util&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">assert_that&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.testing.util&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">equal_to&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">TestPipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert_that&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">]),&lt;/span> &lt;span class="n">equal_to&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">]))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">passert&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Equals&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">formatted&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Flourish: 3&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;stomach: 1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">See &lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/DebuggingWordCountTest.java">DebuggingWordCountTest&lt;/a>
for an example unit test.&lt;/p>
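&lt;p>Both &lt;code>containsInAnyOrder&lt;/code> and &lt;code>equal_to&lt;/code> ignore element order but not duplicates. The actual and expected collections must hold the same elements the same number of times. The following standard-library sketch models that comparison with &lt;code>collections.Counter&lt;/code>. It only illustrates the semantics and is not how Beam implements the check. It assumes hashable elements, and the &lt;code>same_elements&lt;/code> helper is hypothetical, not part of Beam.&lt;/p>

```python
from collections import Counter


def same_elements(actual, expected):
  """True if both iterables contain the same elements with the same counts.

  Order is ignored; duplicates are not. Elements must be hashable.
  """
  return Counter(actual) == Counter(expected)


print(same_elements([3, 1, 2], [1, 2, 3]))  # True
print(same_elements([1, 2, 2], [1, 2]))     # False: extra duplicate
print(same_elements(
    [('stomach', 1), ('Flourish', 3)],
    [('Flourish', 3), ('stomach', 1)]))     # True
```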
&lt;h3 id="try-the-full-example-in-playground-2">Try the full example in Playground&lt;/h3>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_DebuggingWordCount"
>&lt;/div>
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_WordCountDebugging"
>&lt;/div>
&lt;div
class="language-go playground-snippet"
data-sdk="go"
data-path="SDK_GO_DebuggingWordCount"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_DebuggingWordCount%22%2c%22sdk%22%3a%22java%22%7d%2c%7b%22path%22%3a%22SDK_PYTHON_WordCountDebugging%22%2c%22sdk%22%3a%22python%22%7d%2c%7b%22path%22%3a%22SDK_GO_DebuggingWordCount%22%2c%22sdk%22%3a%22go%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="windowedwordcount-example">WindowedWordCount example&lt;/h2>
&lt;p>The WindowedWordCount example counts words in text just as the previous
examples did, but introduces several advanced concepts.&lt;/p>
&lt;p>&lt;strong>New Concepts:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Unbounded and bounded datasets&lt;/li>
&lt;li>Adding timestamps to data&lt;/li>
&lt;li>Windowing&lt;/li>
&lt;li>Reusing PTransforms over windowed PCollections&lt;/li>
&lt;/ul>
&lt;p>The following sections explain these key concepts in detail and break the
pipeline code down into smaller pieces.&lt;/p>
&lt;p>&lt;strong>To run this example in Java:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
-Dexec.args=&amp;#34;--inputFile=pom.xml --output=counts&amp;#34; -Pdirect-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --inputFile=pom.xml --output=counts&amp;#34; -Pflink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --flinkMaster=&amp;lt;flink master&amp;gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
--inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts&amp;#34; -Pflink-runner
# You can monitor the running job by visiting the Flink dashboard at http://&amp;lt;flink master&amp;gt;:8081&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
-Dexec.args=&amp;#34;--runner=SparkRunner --inputFile=pom.xml --output=counts&amp;#34; -Pspark-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
-Dexec.args=&amp;#34;--runner=DataflowRunner --gcpTempLocation=gs://YOUR_GCS_BUCKET/tmp \
--project=YOUR_PROJECT --region=GCE_REGION \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://YOUR_GCS_BUCKET/counts&amp;#34; \
-Pdataflow-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WindowedWordCount \
-Dexec.args=&amp;#34;--runner=SamzaRunner --inputFile=pom.xml --output=counts&amp;#34; -Psamza-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">$ mvn package -Pnemo-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WindowedWordCount \
--runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">$ mvn package -P jet-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WindowedWordCount \
--runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/pom.xml --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Java, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java">WindowedWordCount&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To run this example in Python:&lt;/strong>&lt;/p>
&lt;p>This pipeline writes its results to the BigQuery table specified by the &lt;code>--output_table&lt;/code>
parameter, using the format &lt;code>PROJECT:DATASET.TABLE&lt;/code> or
&lt;code>DATASET.TABLE&lt;/code>.&lt;/p>
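&lt;p>The two accepted forms differ only in whether the project is included. The minimal sketch below uses only the standard library and splits a table spec into its parts to make the format concrete. The helper &lt;code>parse_table_spec&lt;/code> is hypothetical and is not the parser Beam uses. It assumes that a spec without a project falls back to a default project supplied by the caller.&lt;/p>

```python
def parse_table_spec(spec, default_project=None):
  """Splits 'PROJECT:DATASET.TABLE' or 'DATASET.TABLE' into three parts.

  Hypothetical helper. rsplit keeps domain-scoped project IDs such as
  'example.com:my-project' intact.
  """
  if ':' in spec:
    project, rest = spec.rsplit(':', 1)
  else:
    project, rest = default_project, spec
  dataset, sep, table = rest.partition('.')
  if not (dataset and sep and table) or '.' in table:
    raise ValueError('expected DATASET.TABLE, got %r' % rest)
  return project, dataset, table


print(parse_table_spec('my-project:my_dataset.counts'))
# ('my-project', 'my_dataset', 'counts')
print(parse_table_spec('my_dataset.counts', default_project='my-project'))
# ('my-project', 'my_dataset', 'counts')
```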
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">python -m apache_beam.examples.windowed_wordcount --input YOUR_INPUT_FILE --output_table PROJECT:DATASET.TABLE&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow"># As part of the initial setup, install Google Cloud Platform specific extra components.
pip install &amp;#39;apache-beam[gcp]&amp;#39;
python -m apache_beam.examples.windowed_wordcount --input YOUR_INPUT_FILE \
--output_table PROJECT:DATASET.TABLE \
--runner DataflowRunner \
--project YOUR_GCP_PROJECT \
--region GCE_REGION \
--temp_location gs://YOUR_GCS_BUCKET/tmp/&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Python, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/windowed_wordcount.py">windowed_wordcount.py&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To run this example in Go:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">$ go install github.com/apache/beam/sdks/v2/go/examples/windowed_wordcount
$ windowed_wordcount --input &amp;lt;PATH_TO_INPUT_FILE&amp;gt; --output counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">$ go install github.com/apache/beam/sdks/v2/go/examples/windowed_wordcount
# As part of the initial setup, non-Linux users should install the unix package before running the example.
$ go get -u golang.org/x/sys/unix
$ windowed_wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://&amp;lt;your-gcs-bucket&amp;gt;/counts \
--runner dataflow \
--project your-gcp-project \
--region your-gcp-region \
--temp_location gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp/ \
--staging_location gs://&amp;lt;your-gcs-bucket&amp;gt;/binaries/ \
--worker_harness_container_image=apache/beam_go_sdk:latest&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Go SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Go, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/examples/windowed_wordcount/windowed_wordcount.go">windowed_wordcount.go&lt;/a>.&lt;/strong>&lt;/p>
&lt;h3 id="unbounded-and-bounded-datasets">Unbounded and bounded datasets&lt;/h3>
&lt;p>Beam allows you to create a single pipeline that can handle both bounded and
unbounded datasets. If your dataset has a fixed number of elements, it is a bounded
dataset and all of the data can be processed together. For bounded datasets,
the question to ask is &amp;ldquo;Do I have all of the data?&amp;rdquo; If data continuously
arrives (such as an endless stream of game scores in the
&lt;a href="/get-started/mobile-gaming-example/">Mobile gaming example&lt;/a>,
it is an unbounded dataset. An unbounded dataset is never available for
processing at any one time, so the data must be processed using a streaming
pipeline that runs continuously. The dataset will only be complete up to a
certain point, so the question to ask is &amp;ldquo;Up until what point do I have all of
the data?&amp;rdquo; Beam uses &lt;a href="/documentation/programming-guide/#windowing">windowing&lt;/a>
to divide a continuously updating dataset into logical windows of finite size.
If your input is unbounded, you must use a runner that supports streaming.&lt;/p>
&lt;p>If your pipeline&amp;rsquo;s input is bounded, then all downstream PCollections will also be
bounded. Similarly, if the input is unbounded, then all downstream PCollections
of the pipeline will be unbounded, though separate branches may be independently
bounded.&lt;/p>
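&lt;p>With fixed windows, an element lands in the window that starts at its timestamp rounded down to a multiple of the window size. The standard-library sketch below performs that assignment and counts words per window. It models the arithmetic behind fixed windows, assuming a zero offset and integer timestamps in seconds. It is not the Beam windowing API, and the sample events are made up.&lt;/p>

```python
from collections import Counter, defaultdict


def fixed_window(timestamp, size):
  """Returns the [start, end) window containing `timestamp` (zero offset)."""
  start = timestamp - timestamp % size
  return (start, start + size)


def count_per_window(timestamped_words, size):
  """Groups (timestamp, word) pairs into fixed windows and counts words."""
  counts = defaultdict(Counter)
  for timestamp, word in timestamped_words:
    counts[fixed_window(timestamp, size)][word] += 1
  return {window: dict(c) for window, c in sorted(counts.items())}


events = [(5, 'king'), (42, 'lear'), (61, 'king'), (119, 'king'), (120, 'fool')]
print(count_per_window(events, size=60))
# {(0, 60): {'king': 1, 'lear': 1}, (60, 120): {'king': 2}, (120, 180): {'fool': 1}}
```

&lt;p>Each window is finite even when the stream is not. Once a runner has all of the data for a window, that window&amp;rsquo;s counts can be emitted.&lt;/p>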
&lt;p>Recall that the input for this example is a set of Shakespeare&amp;rsquo;s texts, which is
a finite set of data. Therefore, this example reads bounded data from a text
file:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">throws&lt;/span> &lt;span class="n">IOException&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Options&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">input&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInputFile&lt;/span>&lt;span class="o">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">arvg&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">None&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">argparse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ArgumentParser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;--input-file&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dest&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;input_file&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;/Users/home/words-example.txt&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">known_args&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pipeline_args&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parse_known_args&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">argv&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline_options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">pipeline_args&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">pipeline_options&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;read&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">known_args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input_file&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">p&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipeline&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Root&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
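&lt;p>The Python snippet above relies on &lt;code>ArgumentParser.parse_known_args&lt;/code> to separate the example&amp;rsquo;s own &lt;code>--input-file&lt;/code> flag from the flags intended for &lt;code>PipelineOptions&lt;/code>. The following stdlib-only sketch shows that split in isolation; the &lt;code>split_args&lt;/code> helper, its default file name, and the sample flags are hypothetical.&lt;/p>

```python
import argparse


def split_args(argv=None):
    """Split this example's own flags from the remaining pipeline flags.

    Hypothetical helper mirroring the snippet above: --input-file is parsed
    here, and every flag argparse does not recognize is returned unparsed so
    it can be handed to PipelineOptions.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--input-file',
                        dest='input_file',
                        default='words-example.txt')  # hypothetical default
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


known, rest = split_args(
    ['--input-file=kinglear.txt', '--runner=DirectRunner', '--streaming'])
print(known.input_file)  # kinglear.txt
print(rest)              # ['--runner=DirectRunner', '--streaming']
```

&lt;p>Because unrecognized flags pass through untouched, runner-specific options can be added on the command line without changing the example&amp;rsquo;s own parser.&lt;/p>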
&lt;h3 id="adding-timestamps-to-data">Adding timestamps to data&lt;/h3>
&lt;p>Each element in a &lt;code>PCollection&lt;/code> has an associated &lt;a href="/documentation/programming-guide#element-timestamps">timestamp&lt;/a>.
The timestamp for each element is initially assigned by the source that creates
the &lt;code>PCollection&lt;/code>. Some sources that create unbounded PCollections can assign
each new element a timestamp that corresponds to when the element was read or
added. You can manually assign or adjust timestamps with a &lt;code>DoFn&lt;/code>; however, you
can only move timestamps forward in time.&lt;/p>
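&lt;p>The forward-only rule can be modeled as a simple check. The following plain-Python sketch is not part of any Beam SDK: &lt;code>check_output_timestamp&lt;/code> and its &lt;code>allowed_skew&lt;/code> parameter are hypothetical, loosely modeled on the skew that the Java SDK exposes through the deprecated &lt;code>DoFn.getAllowedTimestampSkew()&lt;/code>, which is zero by default.&lt;/p>

```python
from datetime import datetime, timedelta, timezone


def check_output_timestamp(input_ts, output_ts, allowed_skew=timedelta(0)):
    """Return output_ts, or raise if it moves the element backward in time.

    Hypothetical helper: an output timestamp may not be earlier than the
    input timestamp minus allowed_skew (zero by default).
    """
    earliest = input_ts - allowed_skew
    if earliest > output_ts:
        raise ValueError(
            f"output timestamp {output_ts} is earlier than {earliest}")
    return output_ts


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(check_output_timestamp(t0, t0 + timedelta(minutes=5)))  # moved forward: accepted
try:
    check_output_timestamp(t0, t0 - timedelta(minutes=5))  # moved backward
except ValueError as err:
    print("rejected:", err)
```

&lt;p>Elements read from a bounded file typically start with the minimum possible timestamp, so assigning later timestamps to them does not violate this rule.&lt;/p>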
&lt;p>In this example the input is bounded. For the purpose of the example, a &lt;code>DoFn&lt;/code>
named &lt;code>AddTimestampFn&lt;/code> (invoked by &lt;code>ParDo&lt;/code>) sets a timestamp for
each element in the &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">AddTimestampFn&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">AddTimestampFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">min_timestamp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">max_timestamp&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">timestampedLines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">addTimestampFn&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Min&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">mtime&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">()},&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Below is the code for &lt;code>AddTimestampFn&lt;/code>, a &lt;code>DoFn&lt;/code> invoked by &lt;code>ParDo&lt;/code>
that sets the timestamp of each data element, usually based on the element itself. For example, if the
elements were log lines, this &lt;code>ParDo&lt;/code> could parse the time out of the log string
and set it as the element&amp;rsquo;s timestamp. There are no timestamps inherent in the
works of Shakespeare, so in this case we&amp;rsquo;ve made up random timestamps just to
illustrate the concept. Each line of the input text will get a random associated
timestamp sometime in a 2-hour period.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">AddTimestampFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AddTimestampFn&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Instant&lt;/span> &lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">minTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">maxTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Instant&lt;/span> &lt;span class="n">randomTimestamp&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ThreadLocalRandom&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">current&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">nextLong&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/**
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * Concept #2: Set the data element with that timestamp.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">outputWithTimestamp&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">randomTimestamp&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">AddTimestampFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">min_timestamp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">max_timestamp&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">min_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">min_timestamp&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">max_timestamp&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampedValue&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">random&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">randint&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">min_timestamp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_timestamp&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">addTimestampFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Min&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span> &lt;span class="s">`json:&amp;#34;min&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">f&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">addTimestampFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">x&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">X&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">X&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">timestamp&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">f&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Min&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rand&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Int63n&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Hour&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Nanoseconds&lt;/span>&lt;span class="p">())))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">timestamp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">x&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Note that the use of the &lt;code>beam.X&lt;/code> &amp;ldquo;type variable&amp;rdquo; allows the transform to be
used for any type.&lt;/p>
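&lt;p>To see the effect of &lt;code>AddTimestampFn&lt;/code> without running a pipeline, the following plain-Python sketch assigns each line a random event time in the half-open range [start, start + 2 hours), similar to the Go version above. It does not use Beam; the generator name, the (timestamp, line) output pairs, and the fixed random seed are assumptions made for illustration.&lt;/p>

```python
import random
from datetime import datetime, timedelta, timezone

TWO_HOURS = timedelta(hours=2)


def add_random_timestamps(lines, min_ts, rng):
    """Pair each line with a random event time in [min_ts, min_ts + 2h).

    Hypothetical plain-Python stand-in for AddTimestampFn; it yields
    (timestamp, line) tuples instead of Beam timestamped elements.
    """
    span_ms = int(TWO_HOURS.total_seconds() * 1000)
    for line in lines:
        offset = timedelta(milliseconds=rng.randrange(span_ms))
        yield min_ts + offset, line


rng = random.Random(42)  # fixed seed so the sketch is reproducible
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamped = list(add_random_timestamps(['To be', 'or not to be'], start, rng))
for ts, line in stamped:
    print(ts.isoformat(), line)
```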
&lt;h3 id="windowing">Windowing&lt;/h3>
&lt;p>Beam uses a concept called &lt;strong>Windowing&lt;/strong> to subdivide a &lt;code>PCollection&lt;/code> into
bounded sets of elements. PTransforms that aggregate multiple elements process
each &lt;code>PCollection&lt;/code> as a succession of multiple, finite windows, even though the
entire collection itself may be of infinite size (unbounded).&lt;/p>
&lt;p>The WindowedWordCount example applies fixed-time windowing, wherein each
window represents a fixed time interval. The fixed window size for this example
defaults to 1 minute (you can change this with a command-line option).&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">windowedWords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getWindowSize&lt;/span>&lt;span class="o">()))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">windowed_words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">input&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">window_size_minutes&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">windowedLines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">timestampedLines&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
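&lt;p>A fixed window is determined purely by arithmetic on the element&amp;rsquo;s timestamp. The plain-Python sketch below computes the [start, end) bounds of the window that contains a timestamp, mirroring the arithmetic used by the Python SDK&amp;rsquo;s &lt;code>FixedWindows&lt;/code> with a size and an optional offset. Timestamps here are plain seconds rather than Beam types, and &lt;code>fixed_window&lt;/code> is a hypothetical helper.&lt;/p>

```python
def fixed_window(timestamp, size, offset=0):
    """Return the [start, end) bounds of the fixed window holding timestamp.

    Hypothetical helper; all values are plain numbers of seconds.
    """
    start = timestamp - (timestamp - offset) % size
    return start, start + size


# 60-second windows, matching the 1-minute default of WindowedWordCount.
for ts in (0, 59, 60, 125):
    print(ts, fixed_window(ts, 60))
# 0 (0, 60)
# 59 (0, 60)
# 60 (60, 120)
# 125 (120, 180)
```

&lt;p>Each timestamp falls into exactly one fixed window, which is why fixed windows never overlap.&lt;/p>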
&lt;h3 id="reusing-ptransforms-over-windowed-pcollections">Reusing PTransforms over windowed PCollections&lt;/h3>
&lt;p>PTransforms written for non-windowed PCollections can be reused unchanged on
windowed PCollections; aggregating transforms, such as a word count, then produce a separate result for each window.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">wordCounts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">windowedWords&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">WordCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">CountWords&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_counts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">windowed_words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">CountWords&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">counted&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">wordcount&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">windowedLines&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
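&lt;p>Because windowing only changes which elements are aggregated together, the counting logic itself does not change. The following plain-Python sketch models that behavior by keying each count on (window start, word), so the same word in two different windows is counted separately. &lt;code>count_words_per_window&lt;/code> and its letters-and-apostrophes tokenizer are hypothetical and do not reproduce &lt;code>CountWords&lt;/code> exactly.&lt;/p>

```python
import re
from collections import Counter


def count_words_per_window(stamped_lines, size):
    """Count words separately in each fixed window of `size` seconds.

    stamped_lines is an iterable of (timestamp_seconds, line) pairs. The
    result maps (window_start, word) to the number of occurrences.
    """
    counts = Counter()
    for ts, line in stamped_lines:
        start = ts - ts % size  # start of the fixed window (offset 0)
        for word in re.findall(r"[A-Za-z']+", line):
            counts[(start, word)] += 1
    return counts


lines = [(5, 'to be or not'), (30, 'to be'), (70, 'be')]
for (window_start, word), n in sorted(count_words_per_window(lines, 60).items()):
    print(window_start, word, n)
```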
&lt;p class="language-java language-go">&lt;h3 id="try-the-full-example-in-playground">Try the full example in Playground&lt;/h3>
&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_WindowedWordCount"
>&lt;/div>
&lt;div
class="language-go playground-snippet"
data-sdk="go"
data-path="SDK_GO_WindowedWordCount"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_WindowedWordCount%22%2c%22sdk%22%3a%22java%22%7d%2c%7b%22path%22%3a%22SDK_GO_WindowedWordCount%22%2c%22sdk%22%3a%22go%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="streamingwordcount-example">StreamingWordCount example&lt;/h2>
&lt;p>The StreamingWordCount example is a streaming pipeline that reads Pub/Sub
messages from a Pub/Sub subscription or topic, and performs a frequency count on
the words in each message. Similar to WindowedWordCount, this example applies
fixed-time windowing, wherein each window represents a fixed time interval. The
fixed window size for this example is 15 seconds. The pipeline outputs the
frequency count of the words seen in each 15-second window.&lt;/p>
&lt;p>&lt;strong>New Concepts:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Reading an unbounded dataset&lt;/li>
&lt;li>Writing unbounded results&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>To run this example in Java:&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> StreamingWordCount is not yet available for the Java SDK.&lt;/p>
&lt;/blockquote>
&lt;p>&lt;strong>To run this example in Python:&lt;/strong>&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">python -m apache_beam.examples.streaming_wordcount \
--input_topic &amp;#34;projects/YOUR_PUBSUB_PROJECT_NAME/topics/YOUR_INPUT_TOPIC&amp;#34; \
--output_topic &amp;#34;projects/YOUR_PUBSUB_PROJECT_NAME/topics/YOUR_OUTPUT_TOPIC&amp;#34; \
--streaming&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow"># As part of the initial setup, install Google Cloud Platform specific extra components.
pip install apache-beam[gcp]
python -m apache_beam.examples.streaming_wordcount \
--runner DataflowRunner \
--project YOUR_GCP_PROJECT \
--region YOUR_GCP_REGION \
--temp_location gs://YOUR_GCS_BUCKET/tmp/ \
--input_topic &amp;#34;projects/YOUR_PUBSUB_PROJECT_NAME/topics/YOUR_INPUT_TOPIC&amp;#34; \
--output_topic &amp;#34;projects/YOUR_PUBSUB_PROJECT_NAME/topics/YOUR_OUTPUT_TOPIC&amp;#34; \
--streaming&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>To view the full code in Python, see
&lt;strong>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/streaming_wordcount.py">streaming_wordcount.py&lt;/a>.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To run this example in Go:&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> StreamingWordCount is not yet available for the Go SDK. There is an open issue for this
(&lt;a href="https://github.com/apache/beam/issues/18879">Issue 18879&lt;/a>).&lt;/p>
&lt;/blockquote>
&lt;h3 id="reading-an-unbounded-dataset">Reading an unbounded dataset&lt;/h3>
&lt;p>This example uses an unbounded dataset as input. The code reads Pub/Sub
messages using
&lt;a href="https://beam.apache.org/releases/pydoc/2.55.1/apache_beam.io.gcp.pubsub.html#apache_beam.io.gcp.pubsub.ReadFromPubSub">&lt;code>beam.io.ReadFromPubSub&lt;/code>&lt;/a>.
It reads from the input subscription if one is provided, and from the input topic otherwise.
It then decodes each message payload from UTF-8 bytes into a string.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// This example is not currently available for the Beam SDK for Java.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Read from Pub/Sub into a PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">known_args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input_subscription&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromPubSub&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">subscription&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">known_args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input_subscription&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromPubSub&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">topic&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">known_args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">input_topic&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">data&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;DecodeString&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">d&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">d&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;utf-8&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// This example is not currently available for the Beam SDK for Go.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
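&lt;p>The decode step above turns each Pub/Sub payload from bytes into a string. The following plain-Python sketch shows that step together with a word split, which the pipeline needs before it can count words. The helper name and the regular expression are assumptions for illustration. See streaming_wordcount.py for the tokenization the example actually uses.&lt;/p>

```python
import re

# Hypothetical tokenizer: runs of word characters and apostrophes.
WORD_PATTERN = re.compile(r"[\w']+")


def extract_words(payload):
    """Decode one Pub/Sub payload (bytes) and split it into words.

    ReadFromPubSub yields raw bytes by default, so decoding comes first,
    matching the 'DecodeString' step in the snippet above.
    """
    text = payload.decode("utf-8")
    return WORD_PATTERN.findall(text)


print(extract_words("Hello, Beam! It's streaming.".encode("utf-8")))
```

&lt;p>In a Beam pipeline, a function like this would typically run inside a &lt;code>beam.FlatMap&lt;/code>, so that each message can emit zero or more words.&lt;/p>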
&lt;h3 id="writing-unbounded-results">Writing unbounded results&lt;/h3>
&lt;p>When the input is unbounded, the output &lt;code>PCollection&lt;/code> is unbounded
as well, so you must choose an I/O connector that can write unbounded results. Some I/Os
support only bounded output, while others support both bounded and unbounded
output.&lt;/p>
&lt;p>This example streams the unbounded results to
Google Pub/Sub. The code formats the results, encodes them as UTF-8 bytes, and writes them to a Pub/Sub topic
using &lt;a href="https://beam.apache.org/releases/pydoc/2.55.1/apache_beam.io.gcp.pubsub.html#apache_beam.io.gcp.pubsub.WriteToPubSub">&lt;code>beam.io.WriteToPubSub&lt;/code>&lt;/a>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// This example is not currently available for the Beam SDK for Java.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Write to Pub/Sub&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">output&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;EncodeString&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">encode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;utf-8&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToPubSub&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">known_args&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">output_topic&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// This example is not currently available for the Beam SDK for Go.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
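&lt;p>When it is not writing message attributes, &lt;code>WriteToPubSub&lt;/code> expects each element to be a &lt;code>bytes&lt;/code> payload. That is why the snippet above encodes each string before writing. The following plain-Python sketch shows that formatting and encoding for per-window &lt;code>(word, count)&lt;/code> pairs. The &lt;code>"word: count"&lt;/code> output format and the helper names are hypothetical.&lt;/p>

```python
def format_result(word, count):
    """Hypothetical output format: one 'word: count' line per pair."""
    return "%s: %d" % (word, count)


def to_pubsub_payload(word_count):
    """Format a (word, count) pair, then encode it to the bytes WriteToPubSub expects."""
    word, count = word_count
    return format_result(word, count).encode("utf-8")


window_counts = [("hello", 2), ("beam", 1)]  # Hypothetical counts for one window.
payloads = [to_pubsub_payload(wc) for wc in window_counts]
print(payloads)
```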
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Walk through the Mobile Gaming examples in the &lt;a href="/get-started/mobile-gaming-example">Mobile Gaming Example Walkthrough&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite &lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item><item><title>Get-Started: Getting started from Apache Spark</title><link>/get-started/from-spark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/from-spark/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="getting-started-from-apache-spark">Getting started from Apache Spark&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;p>If you already know &lt;a href="https://spark.apache.org/">&lt;em>Apache Spark&lt;/em>&lt;/a>,
using Beam should be easy.
The basic concepts are the same, and the APIs are similar as well.&lt;/p>
&lt;p>Spark stores data in &lt;em>Spark DataFrames&lt;/em> for structured data,
and in &lt;em>Resilient Distributed Datasets&lt;/em> (RDDs) for unstructured data.
This guide uses RDDs.&lt;/p>
&lt;p>A Spark RDD represents a collection of elements;
the equivalent in Beam is a &lt;em>Parallel Collection&lt;/em> (PCollection).
A PCollection in Beam does &lt;em>not&lt;/em> have any ordering guarantees.&lt;/p>
&lt;p>Likewise, a transform in Beam is called a &lt;em>Parallel Transform&lt;/em> (PTransform).&lt;/p>
&lt;p>Here are some examples of common operations and their equivalents in PySpark and Beam.&lt;/p>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Here&amp;rsquo;s a simple example of a PySpark pipeline that takes the numbers from one to four,
multiplies them by two, adds all the values together, and prints the result.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">pyspark&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">sc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pyspark&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SparkContext&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parallelize&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">reduce&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>In Beam, you pipe your data through the pipeline using the
&lt;em>pipe operator&lt;/em> &lt;code>|&lt;/code>, as in &lt;code>data | beam.Map(...)&lt;/code>, instead of chaining
methods, as in &lt;code>data.map(...)&lt;/code>. Both forms express the same operation.&lt;/p>
&lt;p>Here&amp;rsquo;s what an equivalent pipeline looks like in Beam.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;blockquote>
&lt;p>ℹ️ Note that we called &lt;code>print&lt;/code> inside a &lt;code>Map&lt;/code> transform.
That&amp;rsquo;s because we can only access the elements of a PCollection
from within a PTransform.
To inspect the data locally, you can use the &lt;a href="https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#creating_your_pipeline">InteractiveRunner&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;p>Another thing to note is that Beam pipelines are constructed lazily.
When you pipe &lt;code>|&lt;/code> data, you&amp;rsquo;re only declaring the
transformations and the order in which you want them to happen.
No computation happens yet.
The pipeline runs when the &lt;code>with beam.Pipeline() as pipeline&lt;/code> context
closes.&lt;/p>
&lt;blockquote>
&lt;p>ℹ️ When the &lt;code>with beam.Pipeline() as pipeline&lt;/code> context closes,
it implicitly calls &lt;code>pipeline.run()&lt;/code>, which triggers the computation, and then waits for the pipeline to finish.&lt;/p>
&lt;/blockquote>
&lt;p>The pipeline is then sent to your
&lt;a href="/documentation/runners/capability-matrix/">runner of choice&lt;/a>,
which processes the data.&lt;/p>
&lt;blockquote>
&lt;p>ℹ️ The pipeline can run locally with the &lt;em>DirectRunner&lt;/em>,
or on a distributed runner such as Flink, Spark, or Dataflow.
Beam&amp;rsquo;s Spark runner is not related to PySpark.&lt;/p>
&lt;/blockquote>
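&lt;p>The deferred execution described above can be modeled with a small context manager. The &lt;code>ToyPipeline&lt;/code> class below is a hypothetical toy, not how Beam is implemented. Piping a step only records it, and the recorded steps run when the &lt;code>with&lt;/code> block exits. In Beam, each &lt;code>|&lt;/code> returns a new PCollection, whereas the toy simply returns itself.&lt;/p>

```python
class ToyPipeline:
    """A toy model of lazy pipeline construction (NOT Beam's implementation)."""

    def __init__(self, data):
        self.data = list(data)
        self.steps = []  # Declared steps, in the order they were piped.
        self.result = None

    def __or__(self, step):
        self.steps.append(step)  # Only record the step; nothing runs here.
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.run()  # Like beam.Pipeline, run when the context closes.
        return False  # Never suppress exceptions.

    def run(self):
        values = self.data
        for step in self.steps:
            values = step(values)
        self.result = values
        return values


with ToyPipeline([1, 2, 3, 4]) as p:
    _ = p | (lambda xs: [x * 2 for x in xs]) | (lambda xs: [sum(xs)])
    result_inside_block = p.result  # Nothing has run yet.

print("Inside the block:", result_inside_block)
print("After the block:", p.result)
```

&lt;p>Inside the block, the result is still &lt;code>None&lt;/code> because no step has run. After the block exits, the result holds &lt;code>[20]&lt;/code>, the same sum computed by the PySpark and Beam examples above.&lt;/p>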
&lt;p>You can optionally add a label to a transform using the
&lt;em>right shift operator&lt;/em> &lt;code>&amp;gt;&amp;gt;&lt;/code>, as in &lt;code>data | 'My description' &amp;gt;&amp;gt; beam.Map(...)&lt;/code>.
Labels serve as comments and also make your pipeline easier to debug.&lt;/p>
&lt;p>This is how the pipeline looks after adding labels.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Create numbers&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Multiply by two&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Sum everything&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Print results&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
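&lt;p>The &lt;code>'label' &amp;gt;&amp;gt; transform&lt;/code> syntax relies on ordinary Python operator dispatch. A &lt;code>str&lt;/code> does not know how to right-shift a transform, so Python calls the right operand&amp;rsquo;s &lt;code>__rrshift__&lt;/code> method instead. The hypothetical &lt;code>ToyMap&lt;/code> class below uses that hook to attach a label. It is a sketch of the mechanism, not Beam&amp;rsquo;s implementation.&lt;/p>

```python
class ToyMap:
    """A toy transform that can be labeled with `'label' >> ToyMap(fn)`."""

    def __init__(self, fn, label=None):
        self.fn = fn
        # Default label derived from the function name, e.g. "Map(abs)".
        self.label = label or "Map(%s)" % getattr(fn, "__name__", "fn")

    def __rrshift__(self, label):
        # Called for `label >> self` because str does not define >> for ToyMap.
        return ToyMap(self.fn, label=label)

    def __call__(self, element):
        return self.fn(element)


double = "Multiply by two" >> ToyMap(lambda x: x * 2)
print(double.label)
print(double(21))
```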
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>Here&amp;rsquo;s a comparison of how to get started in PySpark and in Beam.&lt;/p>
&lt;div class="table-container-wrapper">
&lt;div class="table-wrapper">&lt;table style="width:100%" class="table-wrapper--equal-p">
&lt;tr>
&lt;th style="width:20%">&lt;/th>
&lt;th style="width:40%">PySpark&lt;/th>
&lt;th style="width:40%">Beam&lt;/th>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>Install&lt;/b>&lt;/td>
&lt;td>&lt;code>$ pip install pyspark&lt;/code>&lt;/td>
&lt;td>&lt;code>$ pip install apache-beam&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>Imports&lt;/b>&lt;/td>
&lt;td>&lt;code>import pyspark&lt;/code>&lt;/td>
&lt;td>&lt;code>import apache_beam as beam&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>Creating a&lt;br>local pipeline&lt;/b>&lt;/td>
&lt;td>
&lt;code>with pyspark.SparkContext() as sc:&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Your pipeline code here.&lt;/code>
&lt;/td>
&lt;td>
&lt;code>with beam.Pipeline() as pipeline:&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Your pipeline code here.&lt;/code>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>Creating values&lt;/b>&lt;/td>
&lt;td>&lt;code>values = sc.parallelize([1, 2, 3, 4])&lt;/code>&lt;/td>
&lt;td>&lt;code>values = pipeline | beam.Create([1, 2, 3, 4])&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>Creating&lt;br>key-value pairs&lt;/b>&lt;/td>
&lt;td>
&lt;code>pairs = sc.parallelize([&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;('key1', 'value1'),&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;('key2', 'value2'),&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;('key3', 'value3'),&lt;/code>&lt;br>
&lt;code>])&lt;/code>
&lt;/td>
&lt;td>
&lt;code>pairs = pipeline | beam.Create([&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;('key1', 'value1'),&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;('key2', 'value2'),&lt;/code>&lt;br>
&lt;code>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;('key3', 'value3'),&lt;/code>&lt;br>
&lt;code>])&lt;/code>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>Running a&lt;br>local pipeline&lt;/b>&lt;/td>
&lt;td>&lt;code>$ spark-submit spark_pipeline.py&lt;/code>&lt;/td>
&lt;td>&lt;code>$ python beam_pipeline.py&lt;/code>&lt;/td>
&lt;/tr>
&lt;/table>
&lt;/div>
&lt;/div>
&lt;h2 id="transforms">Transforms&lt;/h2>
&lt;p>Here are the equivalents of some common transforms in both PySpark and Beam.&lt;/p>
&lt;div class="table-container-wrapper">
&lt;div class="table-wrapper">&lt;table style="width:100%" class="table-wrapper--equal-p">
&lt;tr>
&lt;th style="width:20%">&lt;/th>
&lt;th style="width:40%">PySpark&lt;/th>
&lt;th style="width:40%">Beam&lt;/th>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/elementwise/map/">Map&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.map(lambda x: x * 2)&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.Map(lambda x: x * 2)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/elementwise/filter/">Filter&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.filter(lambda x: x % 2 == 0)&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.Filter(lambda x: x % 2 == 0)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/elementwise/flatmap/">FlatMap&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.flatMap(lambda x: range(x))&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.FlatMap(lambda x: range(x))&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/groupbykey/">Group by key&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>pairs.groupByKey()&lt;/code>&lt;/td>
&lt;td>&lt;code>pairs | beam.GroupByKey()&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/combineglobally/">Reduce&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.reduce(lambda x, y: x+y)&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.CombineGlobally(sum)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/combineperkey/">Reduce by key&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>pairs.reduceByKey(lambda x, y: x+y)&lt;/code>&lt;/td>
&lt;td>&lt;code>pairs | beam.CombinePerKey(sum)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/distinct/">Distinct&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.distinct()&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.Distinct()&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/count/">Count&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.count()&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.combiners.Count.Globally()&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/count/">Count by key&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>pairs.countByKey()&lt;/code>&lt;/td>
&lt;td>&lt;code>pairs | beam.combiners.Count.PerKey()&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/top/">Take smallest&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.takeOrdered(3)&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.combiners.Top.Smallest(3)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/top/">Take largest&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.takeOrdered(3, lambda x: -x)&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.combiners.Top.Largest(3)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/sample/">Random sample&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.takeSample(False, 3)&lt;/code>&lt;/td>
&lt;td>&lt;code>values | beam.combiners.Sample.FixedSizeGlobally(3)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/other/flatten/">Union&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>values.union(otherValues)&lt;/code>&lt;/td>
&lt;td>&lt;code>(values, otherValues) | beam.Flatten()&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;b>&lt;a href="/documentation/transforms/python/aggregation/cogroupbykey/">Co-group&lt;/a>&lt;/b>&lt;/td>
&lt;td>&lt;code>pairs.cogroup(otherPairs)&lt;/code>&lt;/td>
&lt;td>&lt;code>{'Xs': pairs, 'Ys': otherPairs} | beam.CoGroupByKey()&lt;/code>&lt;/td>
&lt;/tr>
&lt;/table>
&lt;/div>
&lt;/div>
&lt;blockquote>
&lt;p>ℹ️ To learn more about the transforms available in Beam, check the
&lt;a href="/documentation/transforms/python/overview">Python transform gallery&lt;/a>.&lt;/p>
&lt;/blockquote>
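&lt;p>The per-key rows in the table above are easiest to understand through their output shapes. The following plain-Python sketch models what &lt;code>beam.CombinePerKey(sum)&lt;/code> and &lt;code>{'Xs': pairs, 'Ys': otherPairs} | beam.CoGroupByKey()&lt;/code> produce for small in-memory inputs. With a dict input, &lt;code>CoGroupByKey&lt;/code> groups values by tag name. PySpark&amp;rsquo;s &lt;code>cogroup&lt;/code>, in contrast, returns a tuple of iterables. The helper names are hypothetical, and the sketch returns a dict rather than a PCollection of &lt;code>(key, value)&lt;/code> tuples. In Beam, neither the order of keys nor the order of values within each group is guaranteed.&lt;/p>

```python
from collections import defaultdict


def combine_per_key(pairs, combine_fn):
    """Model of `pairs | beam.CombinePerKey(combine_fn)`: one result per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: combine_fn(values) for key, values in grouped.items()}


def co_group_by_key(tagged):
    """Model of `{'Xs': ..., 'Ys': ...} | beam.CoGroupByKey()`.

    Every key maps to a dict with one list of values per input tag; a tag
    with no values for that key gets an empty list.
    """
    result = {}
    for tag, pairs in tagged.items():
        for key, value in pairs:
            per_tag = result.setdefault(key, {t: [] for t in tagged})
            per_tag[tag].append(value)
    return result


pairs = [("a", 1), ("b", 2), ("a", 3)]
other_pairs = [("a", "x"), ("c", "y")]
print(combine_per_key(pairs, sum))
print(co_group_by_key({"Xs": pairs, "Ys": other_pairs}))
```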
&lt;h2 id="using-calculated-values">Using calculated values&lt;/h2>
&lt;p>Since we are working in potentially distributed environments,
we can&amp;rsquo;t guarantee that the results we&amp;rsquo;ve calculated are available on any given machine.&lt;/p>
&lt;p>In PySpark, we can get a result from a collection of elements (RDD) by using
&lt;code>data.collect()&lt;/code> or other actions such as &lt;code>reduce()&lt;/code> and &lt;code>count()&lt;/code>.&lt;/p>
&lt;p>Here&amp;rsquo;s an example that scales numbers into the range between zero and one.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">pyspark&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">sc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pyspark&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SparkContext&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">sc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parallelize&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">min_value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">values&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reduce&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">min&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">max_value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">values&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reduce&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">max&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># We can use `min_value` and `max_value` directly, since `reduce` already returns them as Python `int` values.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">scaled_values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">values&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">min_value&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">max_value&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">min_value&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># But to access `scaled_values`, we need to call `collect`.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">scaled_values&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">collect&lt;/span>&lt;span class="p">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>In Beam, the output of every transform is a PCollection.
We use &lt;a href="/documentation/programming-guide/#side-inputs">&lt;em>side inputs&lt;/em>&lt;/a>
to feed a PCollection into a transform and access its values.&lt;/p>
&lt;p>Any transform that accepts a function, like
&lt;a href="/documentation/transforms/python/elementwise/map">&lt;code>Map&lt;/code>&lt;/a>,
can take side inputs.
If we only need a single value, we can use
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.AsSingleton">&lt;code>beam.pvalue.AsSingleton&lt;/code>&lt;/a> and access it as a Python value.
If we need multiple values, we can use
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.AsIter">&lt;code>beam.pvalue.AsIter&lt;/code>&lt;/a>
and access them as an &lt;a href="https://docs.python.org/3/glossary.html#term-iterable">&lt;code>iterable&lt;/code>&lt;/a>.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">min_value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">values&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">min&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">values&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">max&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># To access `min_value` and `max_value`, we need to pass them as a side input.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">scaled_values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">values&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">minimum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">maximum&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">minimum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">maximum&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">minimum&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">minimum&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AsSingleton&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">min_value&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">maximum&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AsSingleton&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">max_value&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">scaled_values&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;blockquote>
&lt;p>ℹ️ In Beam, we need to pass a side input explicitly, but we get the
benefit that a reduction or aggregation does &lt;em>not&lt;/em> have to fit into memory.
Lazily computing side inputs also allows us to compute &lt;code>values&lt;/code> only once,
rather than for each distinct reduction (or requiring explicit caching of the RDD).&lt;/p>
&lt;/blockquote>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Take a look at all the available transforms in the &lt;a href="/documentation/transforms/python/overview">Python transform gallery&lt;/a>.&lt;/li>
&lt;li>Learn how to read from and write to files in the &lt;a href="/documentation/programming-guide/#pipeline-io">&lt;em>Pipeline I/O&lt;/em> section of the &lt;em>Programming guide&lt;/em>&lt;/a>.&lt;/li>
&lt;li>Walk through additional WordCount examples in the &lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite &lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;li>If you&amp;rsquo;re interested in contributing to the Apache Beam codebase, see the &lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item><item><title>Get-Started: Learning Resources</title><link>/get-started/resources/learning-resources/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/resources/learning-resources/</guid><description>
&lt;h1 id="learning-resources">Learning Resources&lt;/h1>
&lt;p>Welcome to our learning resources. This page contains a collection of resources that will help you get started with and use Apache Beam. If you’re just starting, you can treat this page as a guided tour; otherwise, you can jump straight to any section that interests you.&lt;/p>
&lt;p>If you have additional material that you would like to see here, please let us know at &lt;a href="mailto:user@beam.apache.org">user@beam.apache.org&lt;/a>!&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#getting-started">Getting Started&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#quickstart">Quickstart&lt;/a>&lt;/li>
&lt;li>&lt;a href="#learning-the-basics">Learning the Basics&lt;/a>&lt;/li>
&lt;li>&lt;a href="#fundamentals">Fundamentals&lt;/a>&lt;/li>
&lt;li>&lt;a href="#common-patterns">Common Patterns&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#articles">Articles&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#data-analysis">Data Analysis&lt;/a>&lt;/li>
&lt;li>&lt;a href="#data-migration">Data Migration&lt;/a>&lt;/li>
&lt;li>&lt;a href="#machine-learning">Machine Learning&lt;/a>&lt;/li>
&lt;li>&lt;a href="#advanced-concepts">Advanced Concepts&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#videos">Videos&lt;/a>&lt;/li>
&lt;li>&lt;a href="#courses">Courses&lt;/a>&lt;/li>
&lt;li>&lt;a href="#books">Books&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#building-big-data-pipelines-with-apache-beam">Building Big Data Pipelines with Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="#streaming-systems-the-what-where-when-and-how-of-large-scale-data-processing">Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#certifications">Certifications&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#getting-started-with-apache-beam-quest">Getting Started with Apache Beam Quest&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#interactive-labs">Interactive Labs&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#java">Java&lt;/a>&lt;/li>
&lt;li>&lt;a href="#python">Python&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#beam-katas">Beam Katas&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#java-1">Java&lt;/a>&lt;/li>
&lt;li>&lt;a href="#python-1">Python&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#code-examples">Code Examples&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#dataflow-cookbook">Dataflow Cookbook&lt;/a>&lt;/li>
&lt;li>&lt;a href="#java-2">Java&lt;/a>&lt;/li>
&lt;li>&lt;a href="#python-2">Python&lt;/a>&lt;/li>
&lt;li>&lt;a href="#beam-playground">Beam Playground&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#api-reference">API Reference&lt;/a>&lt;/li>
&lt;li>&lt;a href="#feedback-and-suggestions">Feedback and Suggestions&lt;/a>&lt;/li>
&lt;li>&lt;a href="#how-to-contribute">How to Contribute&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="getting-started">Getting Started&lt;/h2>
&lt;h3 id="quickstart">Quickstart&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="/get-started/quickstart-java/">Java Quickstart&lt;/a>&lt;/strong> - How to set up and run a WordCount pipeline with the Java SDK.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/get-started/quickstart-py/">Python Quickstart&lt;/a>&lt;/strong> - How to set up and run a WordCount pipeline with the Python SDK.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/get-started/quickstart-go/">Go Quickstart&lt;/a>&lt;/strong> - How to set up and run a WordCount pipeline with the Go SDK.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://medium.com/google-cloud/setting-up-a-java-development-environment-for-apache-beam-on-google-cloud-platform-ec0c6c9fbb39">Java Development Environment&lt;/a>&lt;/strong> - Setting up a Java development environment for Apache Beam using IntelliJ and Maven.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://medium.com/google-cloud/python-development-environments-for-apache-beam-on-google-cloud-platform-b6f276b344df">Python Development Environment&lt;/a>&lt;/strong> - Setting up a Python development environment for Apache Beam using PyCharm.&lt;/li>
&lt;/ul>
&lt;h3 id="learning-the-basics">Learning the Basics&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="/get-started/wordcount-example/">WordCount&lt;/a>&lt;/strong> - Walks you through the code of a simple WordCount pipeline. This minimal pipeline is intended to show the most basic concepts of data processing. WordCount is the &amp;ldquo;Hello World&amp;rdquo; for data processing.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/get-started/mobile-gaming-example/">Mobile Gaming&lt;/a>&lt;/strong> - Introduces time-aware data processing, user-defined transforms, windowing, filtering data, streaming pipelines, triggers, and session analysis. This is a great place to start once you get the hang of WordCount.&lt;/li>
&lt;/ul>
&lt;h3 id="fundamentals">Fundamentals&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="/documentation/programming-guide/">Programming Guide&lt;/a>&lt;/strong> - The Programming Guide contains more in-depth information on most topics in the Apache Beam SDK, including descriptions of how everything works and code snippets that show how to use each part. It can be used as a reference guidebook.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101">The world beyond batch: Streaming 101&lt;/a>&lt;/strong> - Covers some basic background information, terminology, time domains, batch processing, and streaming.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102">The world beyond batch: Streaming 102&lt;/a>&lt;/strong> - Tour of the unified batch and streaming programming model in Beam, along with an example that explains many of the concepts.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/documentation/runtime/model">Apache Beam Execution Model&lt;/a>&lt;/strong> - Explanation of how runners execute an Apache Beam pipeline, including why serialization is important and how a runner might distribute the work in parallel across multiple machines.&lt;/li>
&lt;/ul>
&lt;h3 id="common-patterns">Common Patterns&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1">Common Use Case Patterns Part 1&lt;/a>&lt;/strong> - Common patterns such as writing data to multiple storage locations, slowly-changing lookup cache, calling external services, dealing with bad data, and starting jobs through a REST endpoint.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-2">Common Use Case Patterns Part 2&lt;/a>&lt;/strong> - Common patterns such as GroupBy using multiple data properties, joining two PCollections on a common key, streaming large lookup tables, merging two streams with different window lengths, and threshold detection with time-series data.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://nanthrax.blogspot.com/2018/02/apache-beam-easily-implement-backoff_18.html">Retry Policy&lt;/a>&lt;/strong> - Adding a retry policy to a &lt;code>DoFn&lt;/code>.&lt;/li>
&lt;/ul>
&lt;h2 id="articles">Articles&lt;/h2>
&lt;h3 id="data-analysis">Data Analysis&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://medium.com/google-cloud/predicting-social-engagement-for-the-worlds-news-with-tensorflow-and-cloud-dataflow-part-1-b92ba8f14a7">Predicting news social engagement&lt;/a>&lt;/strong> - Using multiple data sources, many common design patterns, and sentiment analysis with TensorFlow and Dataflow to get insights into different news articles.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/community/tutorials/cloud-iot-rtdp">Processing IoT Data&lt;/a>&lt;/strong> - IoT sensors continuously stream data to the cloud. Learn how to handle the sensor data, which can be useful for real-time monitoring, alerts, long-term data storage for analysis, performance improvement, and model training.&lt;/li>
&lt;/ul>
&lt;h3 id="data-migration">Data Migration&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://medium.com/google-cloud/oracle-data-to-google-bigquery-using-google-cloud-dataflow-and-dataprep-20884571a9e5">Oracle Database to Google BigQuery&lt;/a>&lt;/strong> - Migrate data from an &lt;a href="https://www.oracle.com/database/index.html">Oracle Database&lt;/a> into &lt;a href="https://cloud.google.com/bigquery">BigQuery&lt;/a> using &lt;a href="https://cloud.google.com/dataprep/">Dataprep&lt;/a>.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://medium.com/google-cloud/export-bigquery-to-google-datastore-with-apache-beam-google-dataflow-7fff1566f345">Google BigQuery to Google Datastore&lt;/a>&lt;/strong> - Migrate data from a &lt;a href="https://cloud.google.com/bigquery/">BigQuery&lt;/a> table into &lt;a href="https://cloud.google.com/datastore/">Datastore&lt;/a> without thinking of its schema.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/blog/products/gcp/using-apache-beam-and-cloud-dataflow-to-integrate-sap-hana-and-bigquery">SAP HANA to Google BigQuery&lt;/a>&lt;/strong> - Migrate data from an &lt;a href="https://www.sapphiresystems.com/en-us/products/sap-hana">SAP HANA&lt;/a> in-memory database into &lt;a href="https://cloud.google.com/bigquery">BigQuery&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h3 id="machine-learning">Machine Learning&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="/documentation/ml/about-ml">Machine Learning using the RunInference API&lt;/a>&lt;/strong> - Use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Follow the &lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference">RunInference API pipeline examples&lt;/a> to do image classification, image segmentation, language modeling, and MNIST digit classification. See examples of &lt;a href="/documentation/transforms/python/elementwise/runinference/">RunInference transforms&lt;/a>.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/dataflow/examples/molecules-walkthrough">Machine Learning Preprocessing and Prediction&lt;/a>&lt;/strong> - Predict the molecular energy from data stored in the &lt;a href="https://en.wikipedia.org/wiki/Spatial_Data_File">Spatial Data File&lt;/a> (SDF) format. Train a &lt;a href="https://www.tensorflow.org/">TensorFlow&lt;/a> model with &lt;a href="https://github.com/tensorflow/transform">tf.Transform&lt;/a> for preprocessing in Python. This also shows how to create batch and streaming prediction pipelines in Apache Beam.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/pre-processing-tensorflow-pipelines-tftransform-google-cloud">Machine Learning Preprocessing&lt;/a>&lt;/strong> - Find the optimal parameter settings for simulated physical machines like a bottle filler or cookie machine. The goal of each simulated machine is to have the same input/output of the actual machine, making it a &amp;ldquo;digital twin&amp;rdquo;. This uses &lt;a href="https://github.com/tensorflow/transform">tf.Transform&lt;/a> for preprocessing.&lt;/li>
&lt;/ul>
&lt;h3 id="advanced-concepts">Advanced Concepts&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://amygdala.github.io/dataflow/app_engine/2017/10/24/gae_dataflow.html">Running on AppEngine&lt;/a>&lt;/strong> - Use a Dataflow template to launch a pipeline from Google App Engine, and run the pipeline periodically via a cron job.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/blog/2017/02/13/stateful-processing.html">Stateful Processing&lt;/a>&lt;/strong> - Learn how to access persistent mutable state while processing input elements, which allows for &lt;em>side effects&lt;/em> in a &lt;code>DoFn&lt;/code>. One use is arbitrary-but-consistent index assignment: giving each incoming element a unique, incrementing index when the order doesn&amp;rsquo;t matter.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/blog/2017/08/28/timely-processing.html">Timely and Stateful Processing&lt;/a>&lt;/strong> - An example of how to batch RPC calls. Call requests are stored in mutable state as they are received. Once there are either enough requests or a certain amount of time has passed, the batch of requests is sent.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://cloud.google.com/blog/products/gcp/running-external-libraries-with-cloud-dataflow-for-grid-computing-workloads">Running External Libraries&lt;/a>&lt;/strong> - Call an external library written in a language that does not have a native Apache Beam SDK, such as C++.&lt;/li>
&lt;/ul>
&lt;h2 id="videos">Videos&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqIEiHGunZXg_yoS7unlHNJt">Getting Started with Apache Beam&lt;/a>&lt;/strong> - Five-part video series for understanding basic to advanced concepts.&lt;/li>
&lt;li>See more &lt;a href="/get-started/resources/videos-and-podcasts/">Videos and Podcasts&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="courses">Courses&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://beamcollege.dev/">Beam College&lt;/a>&lt;/strong> &amp;ndash; Free live and recorded lessons for learning Beam and data processing.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://www.coursera.org/specializations/serverless-data-processing-with-dataflow">Serverless Data Processing&lt;/a>&lt;/strong> - A course specialized for the Dataflow runner.&lt;/li>
&lt;/ul>
&lt;h2 id="books">Books&lt;/h2>
&lt;h3 id="building-big-data-pipelines-with-apache-beam">Building Big Data Pipelines with Apache Beam&lt;/h3>
&lt;p>&lt;strong>&lt;a href="https://www.packtpub.com/product/building-big-data-pipelines-with-apache-beam/9781800564930">Building Big Data Pipelines with Apache Beam&lt;/a>&lt;/strong> by Jan Lukavský, Packt. (January 2022). A general description of the Apache Beam model, with gradually built examples that help create a solid understanding of the subject. The first part of the book explains concepts using the Java SDK, then the SQL DSL and the portability layer with a focus on the Python SDK. The last part is dedicated to more advanced topics, like I/O connectors using Splittable DoFn and a description of how a typical runner executes a pipeline.&lt;/p>
&lt;h3 id="streaming-systems-the-what-where-when-and-how-of-large-scale-data-processing">Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing&lt;/h3>
&lt;p>&lt;strong>&lt;a href="https://learning.oreilly.com/library/view/streaming-systems/9781491983867/">Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing&lt;/a>&lt;/strong> by Tyler Akidau, Slava Chernyak, Reuven Lax. (August 2018). Expanded from Tyler Akidau’s popular blog posts &amp;ldquo;Streaming 101&amp;rdquo; and &amp;ldquo;Streaming 102&amp;rdquo;, this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams.&lt;/p>
&lt;h2 id="certifications">Certifications&lt;/h2>
&lt;h3 id="getting-started-with-apache-beam-quest">Getting Started with Apache Beam Quest&lt;/h3>
&lt;p>&lt;strong>&lt;a href="https://www.cloudskillsboost.google/quests/310">Get Started with Apache Beam&lt;/a>&lt;/strong> This quest includes four labs that teach you how to write and test Apache Beam pipelines. Three of the labs use Java and one uses Python. Each lab takes about 1.5 hours to complete. When you complete the quest, you&amp;rsquo;re granted a badge that you can use to show your Beam expertise.&lt;/p>
&lt;h2 id="interactive-labs">Interactive Labs&lt;/h2>
&lt;h3 id="java">Java&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://qwiklabs.com/focuses/608?locale=en&amp;amp;parent=catalog">Big Data Text Processing Pipeline&lt;/a>&lt;/strong> (40m) - Run a word count pipeline on the Dataflow runner.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://qwiklabs.com/focuses/3393?locale=en&amp;amp;parent=catalog">Real Time Machine Learning&lt;/a>&lt;/strong> (45m) - Create a real-time flight delay prediction service using historical data on internal flights in the United States.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://qwiklabs.com/focuses/1160?locale=en&amp;amp;parent=catalog">Visualize Real-Time Geospatial Data&lt;/a>&lt;/strong> (60m) - Process streaming data from a real-world historical dataset in real time, store the results in BigQuery, and visualize the geospatial data in Data Studio.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://qwiklabs.com/focuses/3392?locale=en&amp;amp;parent=catalog">Processing Time Windowed Data&lt;/a>&lt;/strong> (90m) - Implement time-windowed aggregation to augment the raw data in order to produce consistent training and test datasets for a machine learning model.&lt;/li>
&lt;/ul>
&lt;h3 id="python">Python&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://www.qwiklabs.com/focuses/1098?parent=catalog">Python Qwik Start&lt;/a>&lt;/strong> (30m) - Run a word count pipeline on the Dataflow runner.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://qwiklabs.com/focuses/1159?locale=en&amp;amp;parent=catalog">Simulate historic flights&lt;/a>&lt;/strong> (60m) - Simulate real-time historic internal flights in the United States and store the resulting simulated data in BigQuery.&lt;/li>
&lt;/ul>
&lt;h2 id="beam-katas">Beam Katas&lt;/h2>
&lt;p>Beam Katas are interactive Beam coding exercises (that is, &lt;a href="http://codekata.com/">code katas&lt;/a>)
that can help you learn Apache Beam concepts and the programming model hands-on.
Built on &lt;a href="https://www.jetbrains.com/education/">JetBrains Educational Products&lt;/a>, Beam Katas
aim to provide a series of structured, hands-on learning experiences that help learners
understand Apache Beam and its SDKs by solving exercises of gradually increasing
complexity. Beam Katas are available for both the Java and Python SDKs.&lt;/p>
&lt;h3 id="java-1">Java&lt;/h3>
&lt;ul>
&lt;li>Download &lt;a href="https://www.jetbrains.com/education/download/#section=idea">IntelliJ Edu&lt;/a>&lt;/li>
&lt;li>Upon opening the IDE, expand the &amp;ldquo;Learn and Teach&amp;rdquo; menu, then select &amp;ldquo;Browse Courses&amp;rdquo;&lt;/li>
&lt;li>Search for &amp;ldquo;Beam Katas - Java&amp;rdquo;&lt;/li>
&lt;li>Expand the &amp;ldquo;Advanced Settings&amp;rdquo; and modify the &amp;ldquo;Location&amp;rdquo; and &amp;ldquo;Jdk&amp;rdquo; appropriately&lt;/li>
&lt;li>Click &amp;ldquo;Join&amp;rdquo;&lt;/li>
&lt;li>&lt;a href="https://www.jetbrains.com/help/education/learner-start-guide.html?section=Introduction%20to%20Java#explore_course">Learn more&lt;/a> about how to use the Education product&lt;/li>
&lt;/ul>
&lt;h3 id="python-1">Python&lt;/h3>
&lt;ul>
&lt;li>Download &lt;a href="https://www.jetbrains.com/education/download/#section=pycharm-edu">PyCharm Edu&lt;/a>&lt;/li>
&lt;li>Upon opening the IDE, expand the &amp;ldquo;Learn and Teach&amp;rdquo; menu, then select &amp;ldquo;Browse Courses&amp;rdquo;&lt;/li>
&lt;li>Search for &amp;ldquo;Beam Katas - Python&amp;rdquo;&lt;/li>
&lt;li>Expand the &amp;ldquo;Advanced Settings&amp;rdquo; and modify the &amp;ldquo;Location&amp;rdquo; and &amp;ldquo;Interpreter&amp;rdquo; appropriately&lt;/li>
&lt;li>Click &amp;ldquo;Join&amp;rdquo;&lt;/li>
&lt;li>&lt;a href="https://www.jetbrains.com/help/education/learner-start-guide.html?section=Introduction%20to%20Python#explore_course">Learn more&lt;/a> about how to use the Education product&lt;/li>
&lt;/ul>
&lt;h2 id="code-examples">Code Examples&lt;/h2>
&lt;h3 id="dataflow-cookbook">Dataflow Cookbook&lt;/h3>
&lt;p>The &lt;a href="https://github.com/GoogleCloudPlatform/dataflow-cookbook">cookbook&lt;/a> provides ready-to-launch, self-contained Beam pipelines, with examples in Java, Python, and Scala (via Scio).&lt;/p>
&lt;h3 id="java-2">Java&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://github.com/apache/beam/tree/master/examples/java/src/main/java/org/apache/beam/examples/cookbook">Snippets 1&lt;/a>&lt;/strong> - Commonly used data analysis patterns, such as using &lt;a href="https://cloud.google.com/bigquery">BigQuery&lt;/a>, using a CombinePerKey transform, removing duplicate lines in files, filtering, joining PCollections, getting the maximum value of a PCollection, etc.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://github.com/apache/beam/tree/master/examples/java/src/main/java/org/apache/beam/examples/common">Snippets 2&lt;/a>&lt;/strong> - Additional examples of common tasks, such as configuring &lt;a href="https://cloud.google.com/bigquery">BigQuery&lt;/a> and &lt;a href="https://cloud.google.com/pubsub/">Pub/Sub&lt;/a>, writing one file per window, etc.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://github.com/apache/beam/tree/master/examples/java/src/main/java/org/apache/beam/examples/complete">Complete Examples&lt;/a>&lt;/strong> - End-to-end example pipelines such as an auto complete, a streaming word extract, calculating the Term Frequency-Inverse Document Frequency (&lt;a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF&lt;/a>), getting the top Wikipedia sessions, traffic max lane flow, traffic routes, etc.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://github.com/GoogleCloudPlatform/cloud-code-samples/tree/v1/java/java-dataflow-samples/read-pubsub-write-bigquery">Pub/Sub to BigQuery&lt;/a>&lt;/strong> - A complete example demonstrates using Apache Beam on Dataflow to convert JSON encoded Pub/Sub subscription message strings into structured data and write that data to a BigQuery table.&lt;/li>
&lt;/ul>
&lt;h3 id="python-2">Python&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/cookbook">Snippets&lt;/a>&lt;/strong> - Commonly-used data analysis patterns such as how to use &lt;a href="https://cloud.google.com/bigquery">BigQuery&lt;/a>, &lt;a href="https://cloud.google.com/datastore/">Datastore&lt;/a>, coders, combiners, filters, custom PTransforms, etc.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete">Complete Examples&lt;/a>&lt;/strong> - End-to-end example pipelines such as an auto complete, getting mobile gaming statistics, calculating the &lt;a href="https://en.wikipedia.org/wiki/Julia_set">Julia set&lt;/a>, solving distributing optimization tasks, estimating PI, calculating the Term Frequency-Inverse Document Frequency (&lt;a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF&lt;/a>), getting the top Wikipedia sessions, etc.&lt;/li>
&lt;/ul>
&lt;h3 id="beam-playground">Beam Playground&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="https://play.beam.apache.org">Beam Playground&lt;/a> is an interactive environment to try out Beam transforms and examples without having to install Apache Beam in your environment.
You can try the available Apache Beam examples at &lt;a href="https://play.beam.apache.org">Beam Playground&lt;/a>.&lt;/li>
&lt;li>Learn more about how to add an Apache Beam example/test/kata into Beam Playground catalog &lt;a href="/get-started/try-beam-playground/#how-to-add-new-examples">here&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2 id="api-reference">API Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="/documentation/sdks/javadoc/">Java API Reference&lt;/a>&lt;/strong> - Official API Reference for the Java SDK.&lt;/li>
&lt;li>&lt;strong>&lt;a href="/documentation/sdks/pydoc/">Python API Reference&lt;/a>&lt;/strong> - Official API Reference for the Python SDK.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go API Reference&lt;/a>&lt;/strong> - Official API Reference for the Go SDK.&lt;/li>
&lt;/ul>
&lt;h2 id="feedback-and-suggestions">Feedback and Suggestions&lt;/h2>
&lt;p>We welcome feedback and suggestions. You can find different ways to reach out to the community on the &lt;a href="/community/contact-us/">Contact Us&lt;/a> page.&lt;/p>
&lt;p>If you have a bug report or want to suggest a new feature, you can let us know by &lt;a href="https://github.com/apache/beam/issues/new/choose">submitting a new issue&lt;/a>.&lt;/p>
&lt;h2 id="how-to-contribute">How to Contribute&lt;/h2>
&lt;p>We welcome contributions from everyone! To learn how to contribute, see our &lt;a href="/contribute/">Contribution Guide&lt;/a>.&lt;/p></description></item><item><title>Get-Started: The Tour of Beam</title><link>/get-started/tour-of-beam/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/tour-of-beam/</guid><description>
&lt;h1 id="the-tour-of-beam">The Tour of Beam&lt;/h1>
&lt;p>The &amp;ldquo;Tour of Beam&amp;rdquo; is an interactive way to learn to write Beam code in a sandbox, where you can write and run pipelines while working through various concepts. To try it out, visit the &lt;a href="https://tour.beam.apache.org/">Tour of Beam&lt;/a>.&lt;/p></description></item><item><title>Get-Started: Try Apache Beam</title><link>/get-started/try-apache-beam/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/try-apache-beam/</guid><description>
&lt;h1 id="try-apache-beam">Try Apache Beam&lt;/h1>
&lt;p>You can try an Apache Beam pipeline using our interactive notebooks.&lt;/p>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;li data-value="go">Go SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="interactive-wordcount-in-colab">Interactive WordCount in Colab&lt;/h2>
&lt;p>This interactive notebook shows you what a simple, minimal version of WordCount looks like.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">package&lt;/span> &lt;span class="nn">samples.quickstart&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.Pipeline&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.io.TextIO&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.options.PipelineOptions&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.options.PipelineOptionsFactory&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.transforms.Count&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.transforms.Filter&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.transforms.FlatMapElements&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.transforms.MapElements&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.values.KV&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.values.TypeDescriptors&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">java.util.Arrays&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">WordCount&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="n">inputsDir&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s">&amp;#34;data/*&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="n">outputsPrefix&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s">&amp;#34;outputs/part&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Read lines&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">inputsDir&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Find words&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FlatMapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">line&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">line&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">split&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;[^\\p{L}]+&amp;#34;&lt;/span>&lt;span class="o">))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Filter empty words&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">by&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">!&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">isEmpty&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Count words&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Count&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">perElement&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Write results&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordCount&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">wordCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34;: &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">wordCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">outputsPrefix&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">&lt;a class="button button--primary" target="_blank"
href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-java.ipynb">
Run in Colab
&lt;/a>
&lt;a class="button button--primary" target="_blank"
href="https://github.com/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-java.ipynb">
View on GitHub
&lt;/a>&lt;/p>
&lt;p class="language-java">To learn how to install and run the Apache Beam Java SDK on your own computer, follow the instructions in the &lt;a href="/get-started/quickstart-java">Java Quickstart&lt;/a>.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">re&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">inputs_pattern&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;data/*&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">outputs_prefix&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;outputs/part&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read lines&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">inputs_pattern&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Find words&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">line&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">re&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">findall&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">r&lt;/span>&lt;span class="s2">&amp;#34;[a-zA-Z&amp;#39;]+&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">line&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Pair words with 1&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Group and sum&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombinePerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Format results&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">word_count&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">word_count&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Write results&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">outputs_prefix&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">&lt;a class="button button--primary" target="_blank"
href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-py.ipynb">
Run in Colab
&lt;/a>
&lt;a class="button button--primary" target="_blank"
href="https://github.com/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-py.ipynb">
View on GitHub
&lt;/a>&lt;/p>
&lt;p class="language-py">To learn how to install and run the Apache Beam Python SDK on your own computer, follow the instructions in the &lt;a href="/get-started/quickstart-py">Python Quickstart&lt;/a>.&lt;/p>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">package&lt;/span> &lt;span class="nx">main&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;context&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;flag&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;fmt&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;regexp&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/apache/beam/sdks/v2/go/pkg/beam&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/apache/beam/sdks/v2/go/pkg/beam/runners/direct&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/stats&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">_&lt;/span> &lt;span class="s">&amp;#34;github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/local&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">input&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;input&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;data/*&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;File(s) to read.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">output&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;output&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;outputs/wordcounts.txt&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Output filename.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">wordRE&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">regexp&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MustCompile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">`[a-zA-Z]+(&amp;#39;[a-z])?`&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Parse&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Init&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">pipeline&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipeline&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">root&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Root&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">words&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">wordRE&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">FindAllString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">counted&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Count&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">formatted&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Sprintf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;%s: %v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">counted&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">output&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">formatted&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">direct&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Execute&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Background&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">&lt;a class="button button--primary" target="_blank"
href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-go.ipynb">
Run in Colab
&lt;/a>
&lt;a class="button button--primary" target="_blank"
href="https://github.com/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-go.ipynb">
View on GitHub
&lt;/a>&lt;/p>
&lt;p class="language-go">To learn how to install and run the Apache Beam Go SDK on your own computer, follow the instructions in the &lt;a href="/get-started/quickstart-go">Go Quickstart&lt;/a>.&lt;/p>
&lt;p>For a more detailed explanation about how WordCount works, see the &lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Walk through additional WordCount examples in the &lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite &lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;li>If you&amp;rsquo;re interested in contributing to the Apache Beam codebase, see the &lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item><item><title>Get-Started: Try Beam Playground</title><link>/get-started/try-beam-playground/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/try-beam-playground/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="try-beam-playground">Try Beam Playground&lt;/h1>
&lt;p>Beam Playground is an interactive environment to try out Beam transforms and examples
without having to install Apache Beam in your environment.&lt;/p>
&lt;p>You can try the available Apache Beam examples at
&lt;a href="https://play.beam.apache.org/">Beam Playground&lt;/a>.&lt;/p>
&lt;h2 id="beam-playground-wordcount-example">Beam Playground WordCount Example&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_MinimalWordCount"
>&lt;/div>
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_WordCountWithMetrics"
>&lt;/div>
&lt;div
class="language-go playground-snippet"
data-sdk="go"
data-path="SDK_GO_MinimalWordCount"
>&lt;/div>
&lt;div
class="language-scio playground-snippet"
data-sdk="scio"
data-path="SDK_SCIO_MinimalWordCount"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_MinimalWordCount%22%2c%22sdk%22%3a%22java%22%7d%2c%7b%22path%22%3a%22SDK_PYTHON_WordCountWithMetrics%22%2c%22sdk%22%3a%22python%22%7d%2c%7b%22path%22%3a%22SDK_GO_MinimalWordCount%22%2c%22sdk%22%3a%22go%22%7d%2c%7b%22path%22%3a%22SDK_SCIO_MinimalWordCount%22%2c%22sdk%22%3a%22scio%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;p>For instructions on adding new examples to Beam Playground, see the &lt;a href="https://github.com/apache/beam/blob/master/playground/load_your_code.md">Playground guide on loading your own code&lt;/a>.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Try examples in &lt;a href="https://play.beam.apache.org/">Apache Beam Playground&lt;/a>.&lt;/li>
&lt;li>Submit feedback using &amp;ldquo;Enjoying Playground?&amp;rdquo; in
&lt;a href="https://play.beam.apache.org/">Apache Beam Playground&lt;/a> or via
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSd5_5XeOwwW2yjEVHUXmiBad8Lxk-4OtNcgG45pbyAZzd4EbA/viewform?usp=pp_url">this form&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;li>If you&amp;rsquo;re interested in contributing to the Apache Beam Playground codebase, see the &lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item><item><title>Get-Started: Videos and Podcasts</title><link>/get-started/resources/videos-and-podcasts/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/resources/videos-and-podcasts/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="videos-and-podcasts">Videos and Podcasts&lt;/h1>
&lt;p>This page provides links to some of our favorite videos and podcasts that will help you get started and learn more about Apache Beam.&lt;/p>
&lt;ul>&lt;li>&lt;a href="">Introduction&lt;/a>&lt;/li>&lt;/ul>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/videoseries?list=PLIivdWyY5sqIEiHGunZXg_yoS7unlHNJt" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#general">General&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fundamentals-of-stream-processing-with-apache-beam">Fundamentals of Stream Processing with Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="#apache-beam-a-unified-model-for-batch-and-streaming-data-processing">Apache Beam: A Unified Model for Batch and Streaming Data Processing&lt;/a>&lt;/li>
&lt;li>&lt;a href="#fundamentals-of-stream-processing-with-apache-beam-1">Fundamentals of Stream Processing with Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="#software-engineering-radio-podcast-episode-272-apache-beam">Software Engineering Radio Podcast Episode 272: Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="#how-to-run-ml-inference-with-apache-beam">How to run ML Inference with Apache Beam&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#beam--friends">Beam &amp;amp; Friends&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#distributed-processing-for-machine-learning-production-pipelines">Distributed Processing for Machine Learning Production Pipelines&lt;/a>&lt;/li>
&lt;li>&lt;a href="#tensorflow-extended-an-end-to-end-machine-learning-platform-for-tensorflow">TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow&lt;/a>&lt;/li>
&lt;li>&lt;a href="#flink-and-beam-current-state--roadmap">Flink and Beam: Current State &amp;amp; Roadmap&lt;/a>&lt;/li>
&lt;li>&lt;a href="#lessons-learned-from-developing-a-stream-processing-platform-at-scale">Lessons learned from developing a stream processing platform at scale&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#technical-details">Technical Details&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#watermarks-time-and-progress-in-apache-beam-and-beyond">Watermarks: Time and Progress in Apache Beam and Beyond&lt;/a>&lt;/li>
&lt;li>&lt;a href="#triggers-in-apache-beam">Triggers in Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="#nexmark-evaluating-big-data-systems-with-apache-beam">Nexmark Evaluating Big Data systems with Apache Beam&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="general">General&lt;/h2>
&lt;p>The following resources provide general overviews and fundamentals of Apache Beam.&lt;/p>
&lt;h3 id="fundamentals-of-stream-processing-with-apache-beam">Fundamentals of Stream Processing with Apache Beam&lt;/h3>
&lt;p>Data Science Summit, Jerusalem, 2016&lt;/p>
&lt;p>Presented by Tyler Akidau, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/V35MwYcXEX0" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h3 id="apache-beam-a-unified-model-for-batch-and-streaming-data-processing">Apache Beam: A Unified Model for Batch and Streaming Data Processing&lt;/h3>
&lt;p>Hadoop Summit, San Jose, CA, 2016&lt;/p>
&lt;p>Presented by Davor Bonaci, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/7DZ8ONmeP5A" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h3 id="fundamentals-of-stream-processing-with-apache-beam-1">Fundamentals of Stream Processing with Apache Beam&lt;/h3>
&lt;p>@Scale Conference, San Jose, CA, 2016&lt;/p>
&lt;p>Presented by Dan Halperin, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;p>&lt;a href="https://www.facebook.com/plugins/video.php?href=https%3A%2F%2Fwww.facebook.com%2Fatscaleevents%2Fvideos%2F1775945569345206%2F&amp;amp;show_text=0&amp;amp;width=560">Link to Video&lt;/a>
&lt;br>&lt;/p>
&lt;h3 id="software-engineering-radio-podcast-episode-272-apache-beam">Software Engineering Radio Podcast Episode 272: Apache Beam&lt;/h3>
&lt;p>Presented by Frances Perry, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;p>&lt;a href="https://www.se-radio.net/2016/10/se-radio-episode-272-frances-perry-on-apache-beam/" target="_blank">&lt;img src="/images/resources/se-radio-podcast.png" alt="alt text">&lt;/a>
&lt;br>&lt;/p>
&lt;h3 id="how-to-run-ml-inference-with-apache-beam">How to run ML Inference with Apache Beam&lt;/h3>
&lt;p>Video by Cassie Kozyrkov&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/ga2TNdrFRoU" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h2 id="beam--friends">Beam &amp;amp; Friends&lt;/h2>
&lt;p>The following resources cover Apache Beam alongside related projects and platforms.&lt;/p>
&lt;h3 id="distributed-processing-for-machine-learning-production-pipelines">Distributed Processing for Machine Learning Production Pipelines&lt;/h3>
&lt;p>Flink Forward, 2020&lt;/p>
&lt;p>Presented by Ahmet Altay, Robert Crowe, and Reza Rokni&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/jV1WFTmm4qg" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h3 id="tensorflow-extended-an-end-to-end-machine-learning-platform-for-tensorflow">TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow&lt;/h3>
&lt;p>Spark+AI, San Francisco, 2019&lt;/p>
&lt;p>Presented by Konstantinos Katsiapis and Ahmet Altay&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/GTibgKo7WaI" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h3 id="flink-and-beam-current-state--roadmap">Flink and Beam: Current State &amp;amp; Roadmap&lt;/h3>
&lt;p>Flink Forward, Berlin, 2016&lt;/p>
&lt;p>Presented by Maximilian Michels, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/msdjh6KRXC8" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h3 id="lessons-learned-from-developing-a-stream-processing-platform-at-scale">Lessons learned from developing a stream processing platform at scale&lt;/h3>
&lt;p>Big Things Meetup, Tel Aviv, 2016&lt;/p>
&lt;p>Presented by Amit Sela, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/fc-YigLn_gs" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h2 id="technical-details">Technical Details&lt;/h2>
&lt;p>The following resources provide detailed explanations about technical concepts in Apache Beam.&lt;/p>
&lt;h3 id="watermarks-time-and-progress-in-apache-beam-and-beyond">Watermarks: Time and Progress in Apache Beam and Beyond&lt;/h3>
&lt;p>Strata+Hadoop World, New York, 2016&lt;/p>
&lt;p>Presented by Slava Chernyak, &lt;em>Software Engineer at Google&lt;/em>&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/TWxSLmkWPm4" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;br>
&lt;h3 id="triggers-in-apache-beam">Triggers in Apache Beam&lt;/h3>
&lt;p>Strata+Hadoop World, New York, 2016&lt;/p>
&lt;p>Presented by Kenneth Knowles, &lt;em>Apache Beam PPMC member&lt;/em>&lt;/p>
&lt;iframe class="video video--medium-size" width="560" height="315" src="https://www.youtube.com/embed/E1k0B9LN46M" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;h3 id="nexmark-evaluating-big-data-systems-with-apache-beam">Nexmark Evaluating Big Data systems with Apache Beam&lt;/h3>
&lt;p>ApacheCon, Miami, 2017&lt;/p>
&lt;p>Presented by Etienne Chauchot and Ismaël Mejia, &lt;em>Apache Beam PMC members&lt;/em>&lt;/p>
&lt;p>&lt;a href="www.slideshare.net/slideshow/embed_code/key/auWXjEK7GTkiUK">Link to Slides&lt;/a>
&lt;audio controls>&lt;/p>
&lt;source src="https://feathercastapache.files.wordpress.com/2017/05/0517-04-mejia.mp3" type="audio/mpeg">
Your browser does not support the audio element.
&lt;/audio>
&lt;h3 id="universal-metrics-with-apache-beam">Universal metrics with Apache Beam&lt;/h3>
&lt;p>ApacheCon, Montreal, 2018&lt;/p>
&lt;p>Presented by Etienne Chauchot, &lt;em>Apache Beam PMC member&lt;/em>&lt;/p>
&lt;p>&lt;a href="www.slideshare.net/slideshow/embed_code/key/kKJRzR8HxkxLsR">Link to Slides&lt;/a>
&lt;audio controls>&lt;/p>
&lt;source src="//feathercastapache.files.wordpress.com/2018/09/03-universal-metrics-with-beam-etienne-chauchot.mp3" type="audio/mpeg">
Your browser does not support the audio element.
&lt;/audio>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Get-Started: WordCount quickstart for Java</title><link>/get-started/quickstart-java/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart-java/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="wordcount-quickstart-for-java">WordCount quickstart for Java&lt;/h1>
&lt;p>This quickstart shows you how to set up a Java development environment and run
an &lt;a href="/get-started/wordcount-example">example pipeline&lt;/a> written with the
&lt;a href="/documentation/sdks/java">Apache Beam Java SDK&lt;/a>, using a
&lt;a href="/documentation#runners">runner&lt;/a> of your choice.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Java codebase, see the
&lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;p>On this page:&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-development-environment">Set up your development environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#get-the-example-code">Get the example code&lt;/a>&lt;/li>
&lt;li>&lt;a href="#optional-convert-from-maven-to-gradle">Optional: Convert from Maven to Gradle&lt;/a>&lt;/li>
&lt;li>&lt;a href="#get-sample-text">Get sample text&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-a-pipeline">Run a pipeline&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#run-wordcount-using-maven">Run WordCount using Maven&lt;/a>&lt;/li>
&lt;li>&lt;a href="#run-wordcount-using-gradle">Run WordCount using Gradle&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#inspect-the-results">Inspect the results&lt;/a>&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="set-up-your-development-environment">Set up your development environment&lt;/h2>
&lt;ol>
&lt;li>Download and install the
&lt;a href="https://www.oracle.com/technetwork/java/javase/downloads/index.html">Java Development Kit (JDK)&lt;/a>
version 8, 11, or 17. Verify that the
&lt;a href="https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars001.html">JAVA_HOME&lt;/a>
environment variable is set and points to your JDK installation.&lt;/li>
&lt;li>Download and install &lt;a href="https://maven.apache.org/download.cgi">Apache Maven&lt;/a> by
following the &lt;a href="https://maven.apache.org/install.html">installation guide&lt;/a>
for your operating system.&lt;/li>
&lt;li>Optional: If you want to convert your Maven project to Gradle, install
&lt;a href="https://gradle.org/install/">Gradle&lt;/a>.&lt;/li>
&lt;/ol>
&lt;h2 id="get-the-example-code">Get the example code&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Generate a Maven example project that builds against the latest Beam release:
&lt;div class='shell-unix snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-unix" data-lang="unix">mvn archetype:generate \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=2.55.1 \
-DgroupId=org.example \
-DartifactId=word-count-beam \
-Dversion=&amp;#34;0.1&amp;#34; \
-Dpackage=org.apache.beam.examples \
-DinteractiveMode=false
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='shell-powerShell snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-powerShell" data-lang="powerShell">&lt;span class="line">&lt;span class="cl">&lt;span class="n">mvn&lt;/span> &lt;span class="n">archetype&lt;/span>&lt;span class="err">:&lt;/span>&lt;span class="n">generate&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">archetypeGroupId&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">apache&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">beam&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">archetypeArtifactId&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="nb">beam-sdks&lt;/span>&lt;span class="n">-java-maven-archetypes-examples&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">archetypeVersion&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="mf">2.55&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">1&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">groupId&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">example&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">artifactId&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="nb">word-count&lt;/span>&lt;span class="n">-beam&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">version&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="s2">&amp;#34;0.1&amp;#34;&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">package&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">apache&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="py">examples&lt;/span> &lt;span class="p">`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">-D&lt;/span> &lt;span class="n">interactiveMode&lt;/span>&lt;span class="p">=&lt;/span>&lt;span class="n">false&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>Maven creates a new project in the &lt;strong>word-count-beam&lt;/strong> directory.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Change into &lt;strong>word-count-beam&lt;/strong>:
&lt;div class='shell-unix snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-unix" data-lang="unix">cd word-count-beam/
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='shell-powerShell snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-powerShell" data-lang="powerShell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd &lt;/span>&lt;span class="p">.\&lt;/span>&lt;span class="nb">word-count&lt;/span>&lt;span class="n">-beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
The directory contains a &lt;strong>pom.xml&lt;/strong> and a &lt;strong>src&lt;/strong> directory with example
pipelines.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>List the example pipelines:
&lt;div class='shell-unix snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-unix" data-lang="unix">ls src/main/java/org/apache/beam/examples/
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='shell-powerShell snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-powerShell" data-lang="powerShell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">dir &lt;/span>&lt;span class="p">.\&lt;/span>&lt;span class="n">src&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">main&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">java&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">apache&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">examples&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
You should see the following examples:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>DebuggingWordCount.java&lt;/strong> (&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java">GitHub&lt;/a>)&lt;/li>
&lt;li>&lt;strong>MinimalWordCount.java&lt;/strong> (&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java">GitHub&lt;/a>)&lt;/li>
&lt;li>&lt;strong>WindowedWordCount.java&lt;/strong> (&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java">GitHub&lt;/a>)&lt;/li>
&lt;li>&lt;strong>WordCount.java&lt;/strong> (&lt;a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java">GitHub&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>The example used in this tutorial, &lt;strong>WordCount.java&lt;/strong>, defines a
Beam pipeline that counts words from an input file (by default, a &lt;strong>.txt&lt;/strong>
file containing Shakespeare&amp;rsquo;s &amp;ldquo;King Lear&amp;rdquo;). To learn more about the examples,
see the &lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="optional-convert-from-maven-to-gradle">Optional: Convert from Maven to Gradle&lt;/h2>
&lt;p>The steps below explain how to convert the build from Maven to Gradle for the
following runners:&lt;/p>
&lt;ul>
&lt;li>Direct runner&lt;/li>
&lt;li>Dataflow runner&lt;/li>
&lt;/ul>
&lt;p>The conversion process for other runners is similar. For additional guidance,
see
&lt;a href="https://docs.gradle.org/current/userguide/migrating_from_maven.html">Migrating Builds From Apache Maven&lt;/a>.&lt;/p>
&lt;ol>
&lt;li>In the directory with the &lt;strong>pom.xml&lt;/strong> file, run the automated Maven-to-Gradle
conversion:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>gradle init
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
You&amp;rsquo;ll be asked if you want to generate a Gradle build. Enter &lt;strong>yes&lt;/strong>. You&amp;rsquo;ll
also be prompted to choose a DSL (Groovy or Kotlin). For this tutorial, choose
&lt;strong>Kotlin&lt;/strong>. The number assigned to each option can differ between Gradle versions, so check the prompt before entering it.&lt;/li>
&lt;li>Open the generated &lt;strong>build.gradle.kts&lt;/strong> file and make the following changes:
&lt;ol>
&lt;li>In &lt;code>repositories&lt;/code>, replace &lt;code>mavenLocal()&lt;/code> with &lt;code>mavenCentral()&lt;/code>.&lt;/li>
&lt;li>In &lt;code>repositories&lt;/code>, declare a repository for Confluent Kafka dependencies:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>maven {
url = uri(&amp;#34;https://packages.confluent.io/maven/&amp;#34;)
}
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;li>At the end of the build script, add the following conditional dependency:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>if (project.hasProperty(&amp;#34;dataflow-runner&amp;#34;)) {
dependencies {
runtimeOnly(&amp;#34;org.apache.beam:beam-runners-google-cloud-dataflow-java:2.55.1&amp;#34;)
}
}
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;li>At the end of the build script, add the following task:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>task(&amp;#34;execute&amp;#34;, JavaExec::class) {
classpath = sourceSets[&amp;#34;main&amp;#34;].runtimeClasspath
mainClass.set(System.getProperty(&amp;#34;mainClass&amp;#34;))
}
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>Build your project:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>gradle build
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;/ol>
&lt;h2 id="get-sample-text">Get sample text&lt;/h2>
&lt;blockquote>
&lt;p>If you&amp;rsquo;re planning to use the DataflowRunner, you can skip this step. The
runner will pull text directly from Google Cloud Storage.&lt;/p>
&lt;/blockquote>
&lt;ol>
&lt;li>In the &lt;strong>word-count-beam&lt;/strong> directory, create a file called &lt;strong>sample.txt&lt;/strong>.&lt;/li>
&lt;li>Add some text to the file. For this example, use the text of Shakespeare&amp;rsquo;s
&lt;a href="https://storage.cloud.google.com/apache-beam-samples/shakespeare/kinglear.txt">King Lear&lt;/a>.&lt;/li>
&lt;/ol>
&lt;h2 id="run-a-pipeline">Run a pipeline&lt;/h2>
&lt;p>A single Beam pipeline can run on multiple Beam
&lt;a href="/documentation#runners">runners&lt;/a>. The
&lt;a href="/documentation/runners/direct">DirectRunner&lt;/a> is useful for getting started,
because it runs on your machine and requires no specific setup. If you&amp;rsquo;re just
trying out Beam and you&amp;rsquo;re not sure what to use, use the
&lt;a href="/documentation/runners/direct">DirectRunner&lt;/a>.&lt;/p>
&lt;p>The general process for running a pipeline goes like this:&lt;/p>
&lt;ol>
&lt;li>Complete any runner-specific setup.&lt;/li>
&lt;li>Build your command line:
&lt;ol>
&lt;li>Specify a runner with &lt;code>--runner=&amp;lt;runner&amp;gt;&lt;/code> (defaults to the
&lt;a href="/documentation/runners/direct">DirectRunner&lt;/a>).&lt;/li>
&lt;li>Add any runner-specific required options.&lt;/li>
&lt;li>Choose input files and an output location that are accessible to the
runner. (For example, you can&amp;rsquo;t access a local file if you are running
the pipeline on an external cluster.)&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>Run the command.&lt;/li>
&lt;/ol>
&lt;p>To run the WordCount pipeline:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Follow the setup steps for your runner:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runners/flink">FlinkRunner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/spark">SparkRunner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/dataflow">DataflowRunner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/samza">SamzaRunner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/nemo">NemoRunner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/jet">JetRunner&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The DirectRunner will work without additional setup.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Run the corresponding Maven or Gradle command below.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="run-wordcount-using-maven">Run WordCount using Maven&lt;/h3>
&lt;p>For Unix shells:&lt;/p>
&lt;p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--inputFile=sample.txt --output=counts&amp;#34; -Pdirect-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --inputFile=sample.txt --output=counts&amp;#34; -Pflink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=FlinkRunner --flinkMaster=&amp;lt;flink master&amp;gt; --filesToStage=target/word-count-beam-bundled-0.1.jar \
--inputFile=sample.txt --output=/tmp/counts&amp;#34; -Pflink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=SparkRunner --inputFile=sample.txt --output=counts&amp;#34; -Pspark-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--runner=DataflowRunner --project=&amp;lt;your-gcp-project&amp;gt; \
--region=&amp;lt;your-gcp-region&amp;gt; \
--gcpTempLocation=gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&amp;lt;your-gcs-bucket&amp;gt;/counts&amp;#34; \
-Pdataflow-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args=&amp;#34;--inputFile=sample.txt --output=/tmp/counts --runner=SamzaRunner&amp;#34; -Psamza-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">mvn package -Pnemo-runner &amp;amp;&amp;amp; java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
--runner=NemoRunner --inputFile=`pwd`/sample.txt --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">mvn package -Pjet-runner
java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
--runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/sample.txt --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>For Windows PowerShell:&lt;/p>
&lt;p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args=&amp;#34;--inputFile=sample.txt --output=counts&amp;#34; -P direct-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args=&amp;#34;--runner=FlinkRunner --inputFile=sample.txt --output=counts&amp;#34; -P flink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">mvn package exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args=&amp;#34;--runner=FlinkRunner --flinkMaster=&amp;lt;flink master&amp;gt; --filesToStage=.\target\word-count-beam-bundled-0.1.jar `
--inputFile=C:\path\to\quickstart\sample.txt --output=C:\tmp\counts&amp;#34; -P flink-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args=&amp;#34;--runner=SparkRunner --inputFile=sample.txt --output=counts&amp;#34; -P spark-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args=&amp;#34;--runner=DataflowRunner --project=&amp;lt;your-gcp-project&amp;gt; `
--region=&amp;lt;your-gcp-region&amp;gt; `
--gcpTempLocation=gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp `
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://&amp;lt;your-gcs-bucket&amp;gt;/counts&amp;#34; `
-P dataflow-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args=&amp;#34;--inputFile=sample.txt --output=/tmp/counts --runner=SamzaRunner&amp;#34; -P samza-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">mvn package -P nemo-runner -DskipTests
java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount `
--runner=NemoRunner --inputFile=$pwd/sample.txt --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">mvn package -P jet-runner
java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount `
--runner=JetRunner --jetLocalMode=3 --inputFile=$pwd/sample.txt --output=counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h3 id="run-wordcount-using-gradle">Run WordCount using Gradle&lt;/h3>
&lt;p>For Unix shells:&lt;/p>
&lt;p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">gradle clean execute -DmainClass=org.apache.beam.examples.WordCount \
--args=&amp;#34;--inputFile=sample.txt --output=counts&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">TODO: document Flink on Gradle: https://github.com/apache/beam/issues/21498&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">TODO: document FlinkCluster on Gradle: https://github.com/apache/beam/issues/21499&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">TODO: document Spark on Gradle: https://github.com/apache/beam/issues/21502&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">gradle clean execute -DmainClass=org.apache.beam.examples.WordCount \
--args=&amp;#34;--project=&amp;lt;your-gcp-project&amp;gt; --inputFile=gs://apache-beam-samples/shakespeare/* \
--output=gs://&amp;lt;your-gcs-bucket&amp;gt;/counts&amp;#34; -Pdataflow-runner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">TODO: document Samza on Gradle: https://github.com/apache/beam/issues/21500&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">TODO: document Nemo on Gradle: https://github.com/apache/beam/issues/21503&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">TODO: document Jet on Gradle: https://github.com/apache/beam/issues/21501&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h2 id="inspect-the-results">Inspect the results&lt;/h2>
&lt;p>After the pipeline has completed, you can view the output. There might be
multiple output files prefixed by &lt;code>counts&lt;/code>. The runner decides the number of output files,
which gives it the flexibility to execute the pipeline efficiently in a distributed way.&lt;/p>
&lt;ol>
&lt;li>View the output files in a Unix shell:
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">ls counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">ls counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">ls /tmp/counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">ls counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">gsutil ls gs://&amp;lt;your-gcs-bucket&amp;gt;/counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">ls /tmp/counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">ls counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">ls counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
The output files contain unique words and the number of occurrences of each
word.&lt;/li>
&lt;li>View the output content in a Unix shell:
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">more counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">more counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flinkCluster snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flinkCluster" data-lang="flinkCluster">more /tmp/counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">more counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">gsutil cat gs://&amp;lt;your-gcs-bucket&amp;gt;/counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-samza snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-samza" data-lang="samza">more /tmp/counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">more counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-jet snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-jet" data-lang="jet">more counts*
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
The order of elements is not guaranteed, which allows runners to optimize for
efficiency, but the output should look similar to this:
&lt;pre tabindex="0">&lt;code>...
Think: 3
slower: 1
Having: 1
revives: 1
these: 33
wipe: 1
arrives: 1
concluded: 1
begins: 3
...
&lt;/code>&lt;/pre>&lt;/li>
&lt;/ol>
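&lt;p>Each shard holds only part of the result, so it can help to combine the shards before checking the counts. The following is a minimal Python sketch and is not part of Beam. It assumes that each output line has the form &lt;code>word: count&lt;/code>, as in the sample above, and that the files are on the local file system. For the DataflowRunner, copy the files from Cloud Storage first, for example with &lt;code>gsutil cp&lt;/code>. The function names are hypothetical.&lt;/p>

```python
import glob
from collections import Counter


def parse_line(line):
    """Split a 'word: count' line into (word, count)."""
    # rpartition splits on the last ': ', so any ':' inside the word is kept.
    word, _, count = line.rstrip("\r\n").rpartition(": ")
    return word, int(count)


def merge_counts(shards):
    """Combine several shards, each given as a list of lines, into one Counter."""
    totals = Counter()
    for lines in shards:
        for line in lines:
            if line.strip():
                word, count = parse_line(line)
                # A word normally appears in only one shard; summing is a safe merge.
                totals[word] += count
    return totals


def read_word_counts(pattern="counts*"):
    """Read every local file that matches pattern and merge the counts."""
    shards = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            shards.append(f.readlines())
    return merge_counts(shards)


if __name__ == "__main__":
    # Shard order differs between runs, so rank by count instead of file order.
    for word, count in read_word_counts().most_common(10):
        print("%s: %d" % (word, count))
```

&lt;p>For the Flink cluster and Samza commands above, which write to &lt;code>/tmp/counts&lt;/code>, pass &lt;code>"/tmp/counts*"&lt;/code> as the pattern.&lt;/p>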
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/java/">Beam SDK for Java&lt;/a>
and look through the
&lt;a href="https://beam.apache.org/releases/javadoc">Java SDK API reference&lt;/a>.&lt;/li>
&lt;li>Walk through the WordCount examples in the
&lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our
&lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite
&lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any
issues!&lt;/p></description></item><item><title>Get-Started: WordCount Quickstart for Python</title><link>/get-started/quickstart-py/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/quickstart-py/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="wordcount-quickstart-for-python">WordCount quickstart for Python&lt;/h1>
&lt;p>This guide shows you how to set up your Python development environment, get the Apache Beam SDK for Python, and run an example pipeline.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in contributing to the Apache Beam Python codebase, see the &lt;a href="/contribute">Contribution Guide&lt;/a>.&lt;/p>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#set-up-your-environment">Set up your environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#get-apache-beam">Get Apache Beam&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#create-and-activate-a-virtual-environment">Create and activate a virtual environment&lt;/a>&lt;/li>
&lt;li>&lt;a href="#download-and-install">Download and install&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#extra-requirements">Extra requirements&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#execute-a-pipeline">Execute a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#next-steps">Next Steps&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>The Python SDK supports Python 3.8, 3.9, 3.10 and 3.11. Beam 2.48.0 was the last release with support for Python 3.7.&lt;/p>
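&lt;p>If you are not sure which interpreter you are using, a short check like the following can help. This sketch hard-codes the versions listed above. The supported set changes between Beam releases, so treat &lt;code>SUPPORTED_PYTHON&lt;/code> (a hypothetical name) as an assumption to update for the release you install.&lt;/p>

```python
import sys

# Versions listed on this page; check the release notes of the Beam version
# you install, because the supported set changes between releases.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the major.minor version is in SUPPORTED_PYTHON."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    current = "%d.%d" % sys.version_info[:2]
    print("Python", current, "listed as supported:", is_supported())
```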
&lt;h2 id="set-up-your-environment">Set up your environment&lt;/h2>
&lt;p>For details, see
&lt;a href="/get-started/quickstart/python#set-up-your-development-environment">Set up your development environment&lt;/a>.&lt;/p>
&lt;h2 id="get-apache-beam">Get Apache Beam&lt;/h2>
&lt;h3 id="create-and-activate-a-virtual-environment">Create and activate a virtual environment&lt;/h3>
&lt;p>A virtual environment is a directory tree that contains a Python interpreter and its own set of installed packages, separate from the system-wide Python installation. To create a virtual environment, run:&lt;/p>
&lt;div class='shell-unix snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-unix" data-lang="unix">python -m venv /path/to/directory&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='shell-powerShell snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-powerShell" data-lang="powerShell">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PS&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="n">python&lt;/span> &lt;span class="n">-m&lt;/span> &lt;span class="n">venv&lt;/span> &lt;span class="n">C:&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">directory&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
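&lt;p>The standard-library &lt;code>venv&lt;/code> module also makes this available to Python code through &lt;code>venv.create&lt;/code>. The following sketch uses a hypothetical &lt;code>create_env&lt;/code> helper that creates an environment and returns the path of its activation script. It assumes the standard layout: &lt;code>bin/activate&lt;/code> on Unix and &lt;code>Scripts\Activate.ps1&lt;/code> on Windows.&lt;/p>

```python
import os
import venv


def create_env(path, with_pip=True):
    """Create a virtual environment, like running `python -m venv path`.

    Returns the path of the activation script for the current platform.
    """
    venv.create(path, with_pip=with_pip)
    if os.name == "nt":
        # Windows: run this script from PowerShell.
        return os.path.join(path, "Scripts", "Activate.ps1")
    # Unix: source this script from Bash, for example ". path/bin/activate".
    return os.path.join(path, "bin", "activate")
```

&lt;p>Passing &lt;code>with_pip=False&lt;/code> skips bootstrapping pip. This is faster, but it leaves the environment without its own &lt;code>pip&lt;/code>.&lt;/p>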
&lt;p>You must activate a virtual environment in each shell that uses it.
Activation sets environment variables, such as &lt;code>PATH&lt;/code>, that point to the virtual
environment&amp;rsquo;s directories.&lt;/p>
&lt;p>To activate a virtual environment in Bash or PowerShell, run:&lt;/p>
&lt;div class='shell-unix snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-unix" data-lang="unix">. /path/to/directory/bin/activate&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='shell-powerShell snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-powerShell" data-lang="powerShell">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PS&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="n">C:&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">directory&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">Scripts&lt;/span>&lt;span class="p">\&lt;/span>&lt;span class="n">activate&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">ps1&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>That is, execute the &lt;code>activate&lt;/code> script under the virtual environment directory you created.&lt;/p>
&lt;p>For instructions using other shells, see the &lt;a href="https://docs.python.org/3/library/venv.html">venv documentation&lt;/a>.&lt;/p>
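&lt;p>To confirm from inside Python that you are running in a virtual environment, you can compare &lt;code>sys.prefix&lt;/code> with &lt;code>sys.base_prefix&lt;/code>. The two values differ only when the interpreter was started from a virtual environment. The helper name &lt;code>in_virtualenv&lt;/code> is hypothetical.&lt;/p>

```python
import os
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a venv-created environment."""
    # In a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    # The activate scripts also export VIRTUAL_ENV. It stays unset if you run
    # the environment's python directly without activating it.
    print("VIRTUAL_ENV =", os.environ.get("VIRTUAL_ENV"))
```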
&lt;h3 id="download-and-install">Download and install&lt;/h3>
&lt;p>Install the latest Python SDK from PyPI:&lt;/p>
&lt;div class='shell-unix snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-unix" data-lang="unix">pip install apache-beam&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='shell-powerShell snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-powerShell" data-lang="powerShell">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PS&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="n">python&lt;/span> &lt;span class="n">-m&lt;/span> &lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="nb">apache-beam&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
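&lt;p>To check which Beam version was installed in the active environment, you can query the installed package metadata. This sketch uses the standard-library &lt;code>importlib.metadata&lt;/code> module, which is available in Python 3.8 and later. &lt;code>installed_beam_version&lt;/code> is a hypothetical name.&lt;/p>

```python
from importlib import metadata


def installed_beam_version():
    """Return the installed apache-beam version string, or None if it is missing."""
    try:
        # The distribution name on PyPI is "apache-beam"; the import name is apache_beam.
        return metadata.version("apache-beam")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_beam_version()
    print(version or "apache-beam is not installed in this environment")
```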
&lt;h4 id="extra-requirements">Extra requirements&lt;/h4>
&lt;p>The installation above does not include the extra dependencies needed for features such as the Google Cloud Dataflow runner. The extra packages required for each feature are listed below. To install several extras at once, list them together, for example &lt;code>pip install 'apache-beam[feature1,feature2]'&lt;/code>.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Google Cloud Platform&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[gcp]'&lt;/code>&lt;/li>
&lt;li>Required for:
&lt;ul>
&lt;li>Google Cloud Dataflow Runner&lt;/li>
&lt;li>GCS IO&lt;/li>
&lt;li>Datastore IO&lt;/li>
&lt;li>BigQuery IO&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Amazon Web Services&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[aws]'&lt;/code>&lt;/li>
&lt;li>Required for I/O connectors interfacing with AWS&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Microsoft Azure&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[azure]'&lt;/code>&lt;/li>
&lt;li>Required for I/O connectors interfacing with Microsoft Azure&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Beam YAML API&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[yaml]'&lt;/code>&lt;/li>
&lt;li>Required for using the &lt;a href="/documentation/sdks/yaml/">Beam YAML API&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Beam DataFrame API&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[dataframe]'&lt;/code>&lt;/li>
&lt;li>Required for using the &lt;a href="/documentation/dsls/dataframes/overview/">Beam DataFrame API&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Tests&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[test]'&lt;/code>&lt;/li>
&lt;li>Required for developing Beam and running unit tests&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Docs&lt;/strong>
&lt;ul>
&lt;li>Installation Command: &lt;code>pip install 'apache-beam[docs]'&lt;/code>&lt;/li>
&lt;li>Required for generating API documentation using Sphinx&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
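&lt;p>When you script an installation, you might build the requirement string from a list of extras. The hypothetical helpers below format the string and quote it for the shell. They do not check whether an extra actually exists in the &lt;code>apache-beam&lt;/code> package.&lt;/p>

```python
import shlex


def beam_requirement(extras=()):
    """Build a pip requirement such as apache-beam[dataframe,gcp]."""
    names = sorted(set(extras))  # drop duplicates; order does not matter to pip
    if not names:
        return "apache-beam"
    return "apache-beam[" + ",".join(names) + "]"


def pip_install_line(extras=()):
    """Return a shell-safe pip command for the requested extras."""
    # shlex.quote wraps the requirement in single quotes because square
    # brackets are glob characters in shells such as zsh.
    return "pip install " + shlex.quote(beam_requirement(extras))
```

&lt;p>For example, &lt;code>pip_install_line(['gcp', 'dataframe'])&lt;/code> returns &lt;code>pip install 'apache-beam[dataframe,gcp]'&lt;/code>. This matches the quoted form used in the list above.&lt;/p>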
&lt;h2 id="execute-a-pipeline">Execute a pipeline&lt;/h2>
&lt;p>The Apache Beam &lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples">examples&lt;/a> directory has many examples. All examples can be run locally by passing the required arguments described in the example script.&lt;/p>
&lt;p>For example, run &lt;code>wordcount.py&lt;/code> with the following command:&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">python -m apache_beam.examples.wordcount --input /path/to/inputfile --output /path/to/write/counts&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">python -m apache_beam.examples.wordcount --input /path/to/inputfile \
--output /path/to/write/counts \
--runner FlinkRunner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">python -m apache_beam.examples.wordcount --input /path/to/inputfile \
--output /path/to/write/counts \
--runner SparkRunner&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow"># As part of the initial setup, install Google Cloud Platform specific extra components. Make sure you
# complete the setup steps at /documentation/runners/dataflow/#setup
pip install apache-beam[gcp]
python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://&amp;lt;your-gcs-bucket&amp;gt;/counts \
--runner DataflowRunner \
--project your-gcp-project \
--region your-gcp-region \
--temp_location gs://&amp;lt;your-gcs-bucket&amp;gt;/tmp/&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-nemo snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-nemo" data-lang="nemo">This runner is not yet available for the Python SDK.&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
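&lt;p>If you prefer to launch WordCount from a Python script instead of typing the command, the following minimal sketch assembles the same module invocation and flags shown in the commands above. The helper name &lt;code>build_wordcount_command&lt;/code> is hypothetical and not part of the Beam API. It only builds the argument list and does not check whether a runner or its flags are valid.&lt;/p>

```python
import sys
from typing import List, Optional


def build_wordcount_command(
    input_path: str,
    output_prefix: str,
    runner: Optional[str] = None,
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Build the argv that launches the WordCount example as a module.

    Hypothetical helper: it mirrors the flags from the commands above
    and performs no validation of its own.
    """
    cmd = [
        sys.executable, "-m", "apache_beam.examples.wordcount",
        "--input", input_path,
        "--output", output_prefix,
    ]
    # Without --runner, Beam uses the SDK's default local runner.
    if runner is not None:
        cmd += ["--runner", runner]
    # Runner-specific flags, e.g. --project, --region, --temp_location for Dataflow.
    if extra_args:
        cmd += list(extra_args)
    return cmd
```

&lt;p>You could then pass the returned list to &lt;code>subprocess.run(cmd, check=True)&lt;/code>. Leaving &lt;code>runner&lt;/code> unset keeps the local default, which corresponds to the first command above.&lt;/p>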
&lt;p>After the pipeline completes, you can view the output files at your specified
output path. For example, if you specify &lt;code>/dir1/counts&lt;/code> for the &lt;code>--output&lt;/code>
parameter, the pipeline writes the files to &lt;code>/dir1/&lt;/code> and names the files
sequentially in the format &lt;code>counts-00000-of-00001&lt;/code>.&lt;/p>
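&lt;p>To inspect the results programmatically, you can merge all output shards back into a single dictionary. This is a minimal sketch that assumes the output format of the Python &lt;code>wordcount&lt;/code> example (one &lt;code>word: count&lt;/code> pair per line) and a local output path. The helper names &lt;code>parse_count_line&lt;/code> and &lt;code>read_counts&lt;/code> are hypothetical. For output on Cloud Storage, you would need a filesystem-aware reader instead of &lt;code>glob&lt;/code>.&lt;/p>

```python
import glob
from typing import Dict, Tuple


def parse_count_line(line: str) -> Tuple[str, int]:
    """Split one output line of the form 'word: count' into (word, count)."""
    # Split on the last ': ' so the count is always the final field.
    word, count = line.rstrip("\n").rsplit(": ", 1)
    return word, int(count)


def read_counts(output_prefix: str) -> Dict[str, int]:
    """Merge the counts from every shard written for output_prefix."""
    counts: Dict[str, int] = {}
    # Match every file named as the prefix followed by -SHARD-of-TOTAL.
    for path in sorted(glob.glob(output_prefix + "-*-of-*")):
        with open(path, encoding="utf-8") as shard:
            for line in shard:
                if not line.strip():
                    continue  # skip blank lines
                word, count = parse_count_line(line)
                # Each word should appear in only one shard; summing is defensive.
                counts[word] = counts.get(word, 0) + count
    return counts
```

&lt;p>For example, &lt;code>read_counts("/dir1/counts")&lt;/code> reads every shard under &lt;code>/dir1/&lt;/code> that belongs to the &lt;code>counts&lt;/code> output.&lt;/p>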
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ul>
&lt;li>Learn more about the &lt;a href="/documentation/sdks/python/">Beam SDK for Python&lt;/a>
and look through the &lt;a href="https://beam.apache.org/releases/pydoc">Python SDK API reference&lt;/a>.&lt;/li>
&lt;li>Get &lt;a href="/get-started/an-interactive-overview-of-beam">An Interactive Overview of Beam&lt;/a>&lt;/li>
&lt;li>Walk through these WordCount examples in the &lt;a href="/get-started/wordcount-example">WordCount Example Walkthrough&lt;/a>.&lt;/li>
&lt;li>Take a self-paced tour through our &lt;a href="/documentation/resources/learning-resources">Learning Resources&lt;/a>.&lt;/li>
&lt;li>Dive into some of our favorite &lt;a href="/get-started/resources/videos-and-podcasts">Videos and Podcasts&lt;/a>.&lt;/li>
&lt;li>Join the Beam &lt;a href="/community/contact-us">users@&lt;/a> mailing list.&lt;/li>
&lt;/ul>
&lt;p>Please don&amp;rsquo;t hesitate to &lt;a href="/community/contact-us">reach out&lt;/a> if you encounter any issues!&lt;/p></description></item></channel></rss>