---
title: "Beam Quickstart for Go"
aliases:
- /get-started/quickstart/
- /use/quickstart/
- /getting-started/
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Apache Beam Go SDK quickstart
This quickstart shows you how to run an
[example pipeline](https://github.com/apache/beam-starter-go) written with the
[Apache Beam Go SDK](/documentation/sdks/go), using the
[Direct Runner](/documentation/runners/direct/). The Direct Runner executes
pipelines locally on your machine.
If you're interested in contributing to the Apache Beam Go codebase, see the
[Contribution Guide](/contribute).
On this page:
{{< toc >}}
## Set up your development environment
Make sure you have a [Go](https://go.dev/) development environment ready. If
not, follow the instructions in the
[Download and install](https://go.dev/doc/install) page.
## Clone the GitHub repository
Clone or download the
[apache/beam-starter-go](https://github.com/apache/beam-starter-go) GitHub
repository and change into the `beam-starter-go` directory.
{{< highlight >}}
git clone https://github.com/apache/beam-starter-go.git
cd beam-starter-go
{{< /highlight >}}
## Run the quickstart
Run the following command:
{{< highlight >}}
go run main.go --input-text="Greetings"
{{< /highlight >}}
The output is similar to the following:
{{< highlight >}}
Hello
World!
Greetings
{{< /highlight >}}
The lines might appear in a different order.
## Explore the code
The main code file for this quickstart is **main.go**
([GitHub](https://github.com/apache/beam-starter-go/blob/main/main.go)).
The code performs the following steps:
1. Create a Beam pipeline.
2. Create an initial `PCollection`.
3. Apply transforms.
4. Run the pipeline, using the Direct Runner.
### Create a pipeline
Before creating a pipeline, call the [`Init`][Init] function:
```go
beam.Init()
```
Then create the pipeline:
```go
pipeline, scope := beam.NewPipelineWithRoot()
```
The [`NewPipelineWithRoot`][NewPipelineWithRoot] function returns a new
`Pipeline` object, along with the pipeline's root scope. A *scope* is a
hierarchical grouping for composite transforms.
### Create an initial PCollection
The `PCollection` abstraction represents a potentially distributed,
multi-element data set. A Beam pipeline needs a source of data to populate an
initial `PCollection`. The source can be bounded (with a known, fixed size) or
unbounded (with unlimited size).
This example uses the [`Create`][Create] function to create a `PCollection`
from an in-memory array of strings. The resulting `PCollection` contains the
strings "hello", "world!", and a user-provided input string.
```go
elements := beam.Create(scope, "hello", "world!", input_text)
```
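The `input_text` value corresponds to the `--input-text` flag passed on the command line in the run step. As a minimal sketch of how such a flag could be read with Go's standard `flag` package, consider the following. The `parseInputText` helper and the default value are hypothetical and used only for illustration; see `main.go` in the starter repository for the actual definition.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseInputText is a hypothetical helper (not taken from the starter repo).
// It reads the --input-text flag from args and returns its value, or the
// assumed default when the flag is absent.
func parseInputText(args []string) (string, error) {
	fs := flag.NewFlagSet("quickstart", flag.ContinueOnError)
	inputText := fs.String("input-text", "Default input text", "Input text to add to the PCollection.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *inputText, nil
}

func main() {
	text, err := parseInputText(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// The returned string is what would be passed to beam.Create
	// alongside the fixed elements.
	fmt.Println(text)
}
```

Go's `flag` package accepts both `-input-text` and `--input-text`, so the command shown in the run step works with either form.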
### Apply transforms to the PCollection
Transforms can change, filter, group, analyze, or otherwise process the
elements in a `PCollection`.
This example adds a [ParDo](/documentation/programming-guide/#pardo) transform
to convert the input strings to title case:
```go
elements = beam.ParDo(scope, strings.Title, elements)
```
The [`ParDo`][ParDo] function takes the parent scope, a transform function that
is applied to the data, and the input `PCollection`. It returns the output
`PCollection`.
The previous example uses the built-in [`strings.Title`][Title] function for
the transform. You can also provide an application-defined function to a ParDo.
For example:
```go
func logAndEmit(ctx context.Context, element string, emit func(string)) {
beamLog.Infoln(ctx, element)
emit(element)
}
```
This function logs the input element and emits the same element unmodified.
Create a ParDo for this function as follows:
```go
beam.ParDo(scope, logAndEmit, elements)
```
At runtime, the ParDo will call the `logAndEmit` function on each element in
the input collection.
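To make the emitter pattern concrete, the following plain-Go sketch models what it means to call an emitter-style function on each element. It does not use the Beam SDK: `applyToEach`, `passThrough`, and `dropEmpty` are hypothetical names, and a real runner may process elements in parallel and in any order.

```go
package main

import "fmt"

// applyToEach is a hypothetical stand-in for what a runner does with a ParDo:
// it calls fn once per input element and collects everything fn emits.
// It is NOT part of the Beam SDK and runs sequentially for simplicity.
func applyToEach(elements []string, fn func(element string, emit func(string))) []string {
	var out []string
	emit := func(s string) { out = append(out, s) }
	for _, e := range elements {
		fn(e, emit)
	}
	return out
}

// passThrough has the same shape as logAndEmit, minus the context and
// logging: it emits exactly one output per input.
func passThrough(element string, emit func(string)) {
	emit(element)
}

// dropEmpty shows that an emitter-style function may emit zero outputs
// for a given input, which acts as a filter.
func dropEmpty(element string, emit func(string)) {
	if element != "" {
		emit(element)
	}
}

func main() {
	in := []string{"Hello", "", "World!"}
	fmt.Println(applyToEach(in, passThrough)) // all three elements, unchanged
	fmt.Println(applyToEach(in, dropEmpty))   // only the non-empty elements
}
```

Because the function receives an emitter instead of returning a value, it can produce zero, one, or several outputs per input element.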
### Run the pipeline
The code shown in the previous sections defines a pipeline, but does not
process any data yet. To process data, you run the pipeline:
```go
beamx.Run(ctx, pipeline)
```
A Beam [runner](https://beam.apache.org/documentation/basics/#runner) runs a
Beam pipeline on a specific platform. This example uses the Direct Runner,
which is the default runner if you don't specify one. The Direct Runner runs
the pipeline locally on your machine. It is intended for testing and development
and is not optimized for efficiency. For more information, see
[Using the Direct Runner](https://beam.apache.org/documentation/runners/direct/).
For production workloads, you typically use a distributed runner that runs the
pipeline on a big data processing system such as Apache Flink, Apache Spark, or
Google Cloud Dataflow. These systems support massively parallel processing.
## Next steps
* Learn more about the [Beam SDK for Go](/documentation/sdks/go/)
and look through the
[Go SDK API reference](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam).
* Take a self-paced tour through our
[Learning Resources](/documentation/resources/learning-resources).
* Dive into some of our favorite
[Videos and Podcasts](/get-started/resources/videos-and-podcasts).
* Join the Beam [users@](/community/contact-us) mailing list.
Please don't hesitate to [reach out](/community/contact-us) if you encounter any
issues!
[Init]: https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#Init
[Create]: https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#Create
[NewPipelineWithRoot]: https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#NewPipelineWithRoot
[ParDo]: https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#ParDo
[Title]: https://pkg.go.dev/strings#Title