tree: cc3fa04d65c148b70b753a1c5d842707003a56de [path history] [tgz]
  1. cmd/
  2. container/
  3. data/
  4. examples/
  5. pkg/
  6. scripts/
  7. test/
  8. build.gradle
  9. BUILD.md
  10. OWNERS
  11. prepare_go_version.sh
  12. README.md
  13. run_with_go_version.sh
sdks/go/README.md

Go SDK

The Apache Beam Go SDK is the Beam Model implemented in the Go Programming Language. It is based on the following initial design.

How to run the examples

Prerequisites: to use Google Cloud sources and sinks (default for most examples), follow the setup here. You can verify that it works by running the corresponding Java example.

The examples are normal Go programs and are most easily run directly. They are parameterized by Go flags. For example, to run wordcount on the Go direct runner do:

$ pwd
[...]/sdks/go
$ go run examples/wordcount/wordcount.go --output=/tmp/result.txt
[{6: KV<string,int>/GW/KV<bytes,int[varintz]>}]
[{10: KV<int,string>/GW/KV<int[varintz],bytes>}]
2018/03/21 09:39:03 Pipeline:
2018/03/21 09:39:03 Nodes: {1: []uint8/GW/bytes}
{2: string/GW/bytes}
{3: string/GW/bytes}
{4: string/GW/bytes}
{5: string/GW/bytes}
{6: KV<string,int>/GW/KV<bytes,int[varintz]>}
{7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}
{8: KV<string,int>/GW/KV<bytes,int[varintz]>}
{9: string/GW/bytes}
{10: KV<int,string>/GW/KV<int[varintz],bytes>}
{11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/GW/bytes}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/GW/bytes}] -> [Out: T -> {2: string/GW/bytes}]
3: ParDo [In(Main): string <- {2: string/GW/bytes}] -> [Out: string -> {3: string/GW/bytes}]
4: ParDo [In(Main): string <- {3: string/GW/bytes}] -> [Out: string -> {4: string/GW/bytes}]
5: ParDo [In(Main): string <- {4: string/GW/bytes}] -> [Out: string -> {5: string/GW/bytes}]
6: ParDo [In(Main): T <- {5: string/GW/bytes}] -> [Out: KV<T,int> -> {6: KV<string,int>/GW/KV<bytes,int[varintz]>}]
7: CoGBK [In(Main): KV<string,int> <- {6: KV<string,int>/GW/KV<bytes,int[varintz]>}] -> [Out: CoGBK<string,int> -> {7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}]
8: Combine [In(Main): int <- {7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}] -> [Out: KV<string,int> -> {8: KV<string,int>/GW/KV<bytes,int[varintz]>}]
9: ParDo [In(Main): KV<string,int> <- {8: KV<string,int>/GW/KV<bytes,int[varintz]>}] -> [Out: string -> {9: string/GW/bytes}]
10: ParDo [In(Main): T <- {9: string/GW/bytes}] -> [Out: KV<int,T> -> {10: KV<int,string>/GW/KV<int[varintz],bytes>}]
11: CoGBK [In(Main): KV<int,string> <- {10: KV<int,string>/GW/KV<int[varintz],bytes>}] -> [Out: CoGBK<int,string> -> {11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}]
12: ParDo [In(Main): CoGBK<int,string> <- {11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}] -> []
2018/03/21 09:39:03 Reading from gs://apache-beam-samples/shakespeare/kinglear.txt
2018/03/21 09:39:04 Writing to /tmp/result.txt

The debugging output is currently quite verbose and likely to change. The output is a local file in this case:

$ head /tmp/result.txt
while: 2
darkling: 1
rail'd: 1
ford: 1
bleed's: 1
hath: 52
Remain: 1
disclaim: 1
sentence: 1
purse: 6

To run wordcount on dataflow runner do:

$  go run wordcount.go --runner=dataflow --project=<YOUR_GCP_PROJECT> --region=<YOUR_GCP_REGION> --staging_location=<YOUR_GCS_LOCATION>/staging --worker_harness_container_image=<YOUR_SDK_HARNESS_IMAGE_LOCATION> --output=<YOUR_GCS_LOCATION>/output

The output is a GCS file in this case:

$ gsutil cat <YOUR_GCS_LOCATION>/output* | head
Blanket: 1
blot: 1
Kneeling: 3
cautions: 1
appears: 4
Deserved: 1
nettles: 1
OSWALD: 53
sport: 3
Crown'd: 1

See BUILD.md for how to build Go code in general. See container documentation for how to build and push the Go SDK harness container image.

Issues

Please use the sdk-go component for any bugs or feature requests.

Contributing to the Go SDK

New to developing Go?

https://tour.golang.org : The Go Tour gives you the basics of the language, interactively no installation required.

https://github.com/campoy/go-tooling-workshop is a great start on learning good (optional) development tools for Go.

Developing Go Beam SDK on Github

The Go SDK uses Go Modules for dependency management so it's as simple as cloning the repo, making necessary changes and running tests.

Executing all unit tests for the SDK is possible from the <beam root>\sdks\go directory and running go test ./....

To test your change as Jenkins would execute it from a PR, from the beam root directory, run:

  • ./gradlew :sdks:go:goTest executes the unit tests.
  • ./gradlew :sdks:go:test:ulrValidatesRunner validates the SDK against the Portable Python runner.
  • ./gradlew :sdks:go:test:flinkValidatesRunner validates the SDK against the Flink runner.

Follow the contribution guide to create branches, and submit pull requests as normal.