# Try Apache Beam - Java
In this notebook, we set up a Java development environment and work through a simple example using the DirectRunner. You can explore other runners with the Beam Capability Matrix.
To navigate through different sections, use the table of contents. From the **View** drop-down list, select **Table of contents**.
To run a code cell, click the **Run cell** button at the top left of the cell, or select the cell and press **`Shift+Enter`**. Try modifying a code cell and re-running it to see what happens.
To learn more about Colab, see Welcome to Colaboratory!
# Setup
First, you need to set up your environment.
## Installing development tools
Let's start by installing Java. We'll use the `default-jdk` package, which installs OpenJDK. This will take a while, so feel free to go for a walk or do some stretching.
**Note:** Alternatively, you could install the proprietary Oracle JDK instead.
Now, let's install Gradle, which we'll use to automate building and running our application.
**Note:** Alternatively, you could install and configure Maven instead.
## build.gradle
We'll also need a `build.gradle` file, which will allow us to invoke some useful commands.
## Creating the directory structure
Java and Gradle expect a specific directory structure. This convention keeps large projects organized in a standard layout.
For now, we only need a place for our quickstart code, which has to go in `./src/main/java/`.
# Minimal word count
The following example is the "Hello, World!" of data processing, a basic implementation of word count. We're creating a simple data processing pipeline that reads a text file and counts the number of occurrences of every word.
There are many scenarios where all the data does not fit in memory. Notice that the outputs of the pipeline go to the file system, which allows for large processing jobs in distributed environments.
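To make the logic concrete, here is a minimal plain-Java sketch of the steps the pipeline performs: split each line into words, drop empty tokens, count occurrences per word, and format each result as `word: count`. It is deliberately not Beam code. The class name `WordCountSketch`, its method names, and the sample input are hypothetical, and the counts are held in memory, so unlike a Beam pipeline it would not scale to data that exceeds memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java illustration of the word-count logic; this is NOT Beam code.
// The class and method names are hypothetical, chosen for this sketch.
class WordCountSketch {

    // Splits each line on runs of non-letter characters, drops empty tokens,
    // and counts how many times each remaining word occurs (case-sensitive).
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
            .flatMap(line -> Arrays.stream(line.split("[^\\p{L}]+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Formats each entry as "word: count", one string per distinct word.
    static List<String> format(Map<String, Long> counts) {
        return counts.entrySet().stream()
            .map(entry -> entry.getKey() + ": " + entry.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of a text file.
        Stream<String> lines = Stream.of("to be, or not to be", "that is the question");
        format(countWords(lines)).forEach(System.out::println);
    }
}
```

In a Beam pipeline, each of these steps would instead be a transform applied to a distributed collection, with the input read from files and the formatted lines written back to the file system.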
## Build and run
Let's first check what the final file system structure looks like. These are all the files required to build and run our application.
There are two files generated:
## Distributing your application
We can run our fat JAR file as long as we have a Java Runtime Environment installed.
To distribute, we copy the fat JAR file and run it with `java -jar`.
# Word count with comments
Below is mostly the same code as above, but with comments explaining every line in more detail.
