tree: 9958929ee95f7594d4a2f912f3ba18305b04dd2e [path history] [tgz]
  1. cmd/
  2. container/
  3. data/
  4. examples/
  5. pkg/
  6. test/
  7. build.gradle
  8. BUILD.md
  9. descriptor.xml
  10. gogradle.lock
  11. OWNERS
  12. README.md
sdks/go/README.md

Go SDK (experimental)

The Go SDK is currently an experimental feature of Apache Beam and not suitable for production use. It is based on the following initial design.

How to run the examples

Prerequisites: to use Google Cloud sources and sinks (default for most examples), follow the setup here. You can verify that it works by running the corresponding Java example.

The examples are normal Go programs and are most easily run directly. They are parameterized by Go flags. For example, to run wordcount on direct runner do:

$ pwd
[...]/sdks/go
$ go run examples/wordcount/wordcount.go --output=/tmp/result.txt
[{6: KV<string,int>/GW/KV<bytes,int[varintz]>}]
[{10: KV<int,string>/GW/KV<int[varintz],bytes>}]
2018/03/21 09:39:03 Pipeline:
2018/03/21 09:39:03 Nodes: {1: []uint8/GW/bytes}
{2: string/GW/bytes}
{3: string/GW/bytes}
{4: string/GW/bytes}
{5: string/GW/bytes}
{6: KV<string,int>/GW/KV<bytes,int[varintz]>}
{7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}
{8: KV<string,int>/GW/KV<bytes,int[varintz]>}
{9: string/GW/bytes}
{10: KV<int,string>/GW/KV<int[varintz],bytes>}
{11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/GW/bytes}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/GW/bytes}] -> [Out: T -> {2: string/GW/bytes}]
3: ParDo [In(Main): string <- {2: string/GW/bytes}] -> [Out: string -> {3: string/GW/bytes}]
4: ParDo [In(Main): string <- {3: string/GW/bytes}] -> [Out: string -> {4: string/GW/bytes}]
5: ParDo [In(Main): string <- {4: string/GW/bytes}] -> [Out: string -> {5: string/GW/bytes}]
6: ParDo [In(Main): T <- {5: string/GW/bytes}] -> [Out: KV<T,int> -> {6: KV<string,int>/GW/KV<bytes,int[varintz]>}]
7: CoGBK [In(Main): KV<string,int> <- {6: KV<string,int>/GW/KV<bytes,int[varintz]>}] -> [Out: CoGBK<string,int> -> {7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}]
8: Combine [In(Main): int <- {7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}] -> [Out: KV<string,int> -> {8: KV<string,int>/GW/KV<bytes,int[varintz]>}]
9: ParDo [In(Main): KV<string,int> <- {8: KV<string,int>/GW/KV<bytes,int[varintz]>}] -> [Out: string -> {9: string/GW/bytes}]
10: ParDo [In(Main): T <- {9: string/GW/bytes}] -> [Out: KV<int,T> -> {10: KV<int,string>/GW/KV<int[varintz],bytes>}]
11: CoGBK [In(Main): KV<int,string> <- {10: KV<int,string>/GW/KV<int[varintz],bytes>}] -> [Out: CoGBK<int,string> -> {11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}]
12: ParDo [In(Main): CoGBK<int,string> <- {11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}] -> []
2018/03/21 09:39:03 Reading from gs://apache-beam-samples/shakespeare/kinglear.txt
2018/03/21 09:39:04 Writing to /tmp/result.txt

The debugging output is currently quite verbose and likely to change. The output is a local file in this case:

$ head /tmp/result.txt 
while: 2
darkling: 1
rail'd: 1
ford: 1
bleed's: 1
hath: 52
Remain: 1
disclaim: 1
sentence: 1
purse: 6

To run wordcount on dataflow runner do:

$  go run wordcount.go --runner=dataflow --project=<YOUR_GCP_PROJECT> --staging_location=<YOUR_GCS_LOCATION>/staging --worker_harness_container_image=<YOUR_SDK_HARNESS_IMAGE_LOCATION> --output=<YOUR_GCS_LOCATION>/output

The output is a GCS file in this case:

$ gsutil cat <YOUR_GCS_LOCATION>/output* | head
Blanket: 1
blot: 1
Kneeling: 3
cautions: 1
appears: 4
Deserved: 1
nettles: 1
OSWALD: 53
sport: 3
Crown'd: 1

See BUILD.md for how to build Go code in general. See CONTAINERS.md for how to build and push the Go SDK harness container image.

Issues

Please use the sdk-go component for any bugs or feature requests.

Contributing to the Go SDK

New to developing Go?

https://tour.golang.org : The Go Tour gives you the basics of the language, interactively no installation required.

https://github.com/campoy/go-tooling-workshop is a great start on learning good (optional) development tools for Go.

Developing Go Beam SDK on Github

To make and test changes when working with Go, it's neecessary to clone your repository in a subdirectory of your GOPATH. This permits existing gradle tools to use your in progress changes.

# Create a Go compatible place for the repo, using src/github.com/apache/
# matches where Go will look for the files, or go get would put them.
$ mkdir -p $GOPATH/src/github.com/apache/
$ cd $GOPATH/src/github.com/apache/


# Clone the repo, and update your branch as normal
$ git clone https://github.com/apache/beam.git
$ cd beam
$ git remote add <GitHub_user> git@github.com:<GitHub_user>/beam.git
$ git fetch --all

# Get or Update all the Go SDK dependencies
$ go get -u ./...
# Test that the system compiles and runs.
$ go test ./...

If you don’t have a GOPATH set, create a new directory in your home directory, and use that.

Follow the contribution guide to create branches, and submit pull requests as normal.