The Beam portability effort aims to make it possible for any SDK to work with any runner. One aspect of the effort is the isolation of the SDK and user code execution environment from the runner execution environment using docker, as defined in the portability container contract.
This document describes how to build and push container images to that end. The push step generally requires an account with a public docker registry, such as bintray.io or Google Container Registry. These instructions assume familiarity with docker and a bintray account under the current username with a docker repository named “apache”.
Prerequisites: install docker on your platform. You can verify that it works by running docker images or any other docker command.
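As a quick pre-flight step before the build, the prerequisite can be checked from a script. This is a minimal sketch assuming a POSIX shell; the helper name check_tool is hypothetical and not part of Beam. It only confirms that a docker binary is on the PATH, so running docker images is still the way to confirm that the daemon is reachable.

```shell
#!/bin/sh
# check_tool NAME: succeed if NAME resolves to a command on PATH.
# (Hypothetical helper for illustration; not part of the Beam build.)
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

if check_tool docker; then
  echo "docker found on PATH"
else
  echo "docker not found on PATH; install it before building" >&2
fi
```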
Run Maven with the build-containers profile:
    $ pwd
    [...]/beam
    $ mvn clean install -DskipTests -Pbuild-containers
    [...]
    [INFO] --- dockerfile-maven-plugin:1.3.5:build (default) @ beam-sdks-python-container ---
    [INFO] Using Google application default credentials
    [INFO] loaded credentials for user account with clientId=[...].apps.googleusercontent.com
    [INFO] Building Docker context /Users/herohde/go/src/github.com/apache/beam/sdks/python/container
    [INFO]
    [INFO] Image will be built as herohde-docker-apache.bintray.io/beam/python:latest
    [INFO]
    [INFO] Step 1/4 : FROM python:2
    [INFO] Pulling from library/python
    [INFO] Digest: sha256:181ee8edfd9d44323c82dcba0b187f1ee2eb3d4a11c8398fc06952ed5f9ef32c
    [INFO] Status: Image is up to date for python:2
    [INFO] ---> b1d5c2d7dda8
    [INFO] Step 2/4 : MAINTAINER "Apache Beam <dev@beam.apache.org>"
    [INFO] ---> Running in f1bc3c4943b3
    [INFO] ---> 9867b512e47e
    [INFO] Removing intermediate container f1bc3c4943b3
    [INFO] Step 3/4 : ADD target/linux_amd64/boot /opt/apache/beam/
    [INFO] ---> 5cb81c3d2d90
    [INFO] Removing intermediate container 4a41ad80005a
    [INFO] Step 4/4 : ENTRYPOINT /opt/apache/beam/boot
    [INFO] ---> Running in 40f5b945afe7
    [INFO] ---> c8bf712741c8
    [INFO] Removing intermediate container 40f5b945afe7
    [INFO] Successfully built c8bf712741c8
    [INFO] Successfully tagged herohde-docker-apache.bintray.io/beam/python:latest
    [INFO]
    [INFO] Detected build of image with id c8bf712741c8
    [INFO] Building jar: /Users/herohde/go/src/github.com/apache/beam/sdks/python/container/target/beam-sdks-python-container-2.3.0-SNAPSHOT-docker-info.jar
    [INFO] Successfully built herohde-docker-apache.bintray.io/beam/python:latest
    [INFO]
    [...]
Note that the container images include built content, including the Go boot code, so build from the top-level directory unless you are familiar with Maven.
(Optional) Once built, you can list, inspect and run the images locally:
    $ docker images
    REPOSITORY                                     TAG      IMAGE ID       CREATED             SIZE
    herohde-docker-apache.bintray.io/beam/python   latest   c8bf712741c8   About an hour ago   690MB
    herohde-docker-apache.bintray.io/beam/java     latest   33efc0947952   About an hour ago   773MB
    [...]
Despite the names, these container images live only on your local machine. Although each build reuses the same tag “latest”, the image IDs will change.
(Optional) The default setting for docker-repository-root specifies the above bintray location. You can override it by adding:
-Ddocker-repository-root=<location>
Similarly, to use a specific tag instead of “latest”, such as the version “2.3.0”, add:
-Ddockerfile.tag=<tag>
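The two overrides can be combined on one command line. The sketch below only prints the resulting Maven invocation so it can be reviewed before running. The registry location gcr.io/my-project and the helper name maven_build_cmd are assumptions made for illustration; the Maven goals, profile and property names are the ones used in this document.

```shell
#!/bin/sh
# Hypothetical values for illustration; substitute your own registry and tag.
REPO_ROOT="gcr.io/my-project"   # assumed registry location, not a Beam default
TAG="2.3.0"

# maven_build_cmd ROOT TAG: print the Maven command line with both overrides.
# (Hypothetical helper; it prints the command and does not run Maven.)
maven_build_cmd() {
  printf 'mvn clean install -DskipTests -Pbuild-containers -Ddocker-repository-root=%s -Ddockerfile.tag=%s\n' "$1" "$2"
}

maven_build_cmd "$REPO_ROOT" "$TAG"
```

Printing rather than executing keeps the sketch safe to try; once the line looks right, run it from the top-level directory.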
Prerequisites: obtain a docker registry account and ensure docker can push images to it, usually by running docker login with the appropriate credentials. The image you want to push must also be present in the local docker image repository.
For the Python SDK harness container image, run:
    $ docker push $USER-docker-apache.bintray.io/beam/python:latest
    The push refers to a repository [herohde-docker-apache.bintray.io/beam/python]
    f2a8798331f5: Pushed
    6b200cb2b684: Layer already exists
    bf56c6510f38: Layer already exists
    7890d67efa6f: Layer already exists
    b456afdc9996: Layer already exists
    d752a0310ee4: Layer already exists
    db64edce4b5b: Layer already exists
    d5d60fc34309: Layer already exists
    c01c63c6823d: Layer already exists
    latest: digest: sha256:58da4d9173a29622f0572cfa22dfeafc45e6750dde4beab57a47a9d1d17d601b size: 2222
Push the Java SDK harness container image in the same way. To push the same image to multiple registries, retag it using docker tag and push it again.
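Retagging and pushing to several registries can be wrapped in a small loop. The following is a sketch assuming a POSIX shell; the helpers run and push_to_registries and the DRY_RUN variable are hypothetical names introduced here, while docker tag and docker push are the standard commands. With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
# (Hypothetical helper for illustration.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# push_to_registries SOURCE_IMAGE TARGET...: tag SOURCE_IMAGE under each
# TARGET name and push it. Stops at the first failing command.
push_to_registries() {
  src="$1"
  shift
  for target in "$@"; do
    run docker tag "$src" "$target" || return 1
    run docker push "$target" || return 1
  done
}
```

For example, DRY_RUN=1 push_to_registries "$USER-docker-apache.bintray.io/beam/python:latest" "gcr.io/my-project/beam/python:latest" prints one docker tag and one docker push line for the second registry, where gcr.io/my-project is an assumed location. Each target registry still needs its own docker login beforehand.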
(Optional) On any machine, you can now pull the pushed container image:
    $ docker pull $USER-docker-apache.bintray.io/beam/python:latest
    latest: Pulling from beam/python
    85b1f47fba49: Pull complete
    5409e9a7fa9e: Pull complete
    661393707836: Pull complete
    1bb98c08d57e: Pull complete
    c842a08369e2: Pull complete
    310408aa843f: Pull complete
    d6a27cfc2cf1: Pull complete
    7a24cf0c9043: Pull complete
    290b127dfe35: Pull complete
    Digest: sha256:58da4d9173a29622f0572cfa22dfeafc45e6750dde4beab57a47a9d1d17d601b
    Status: Downloaded newer image for herohde-docker-apache.bintray.io/beam/python:latest
    $ docker images
    REPOSITORY                                     TAG      IMAGE ID       CREATED       SIZE
    herohde-docker-apache.bintray.io/beam/python   latest   c8bf712741c8   2 hours ago   690MB
    [...]
Note that the image IDs and digests match their local counterparts.
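The digest comparison can also be done mechanically instead of by eye. This is a minimal sketch assuming a POSIX shell; the helper names digest_part and same_digest are hypothetical. It compares the sha256 portion of two strings, either bare digests as printed by docker push and docker pull, or repository digests of the form name@sha256:... such as those reported by docker inspect --format '{{index .RepoDigests 0}}' for an image.

```shell
#!/bin/sh
# digest_part VALUE: print the part after the last "@", i.e. the "sha256:..."
# portion of "registry/beam/python@sha256:...". A bare digest without "@"
# is printed unchanged. (Hypothetical helper for illustration.)
digest_part() {
  printf '%s\n' "${1##*@}"
}

# same_digest A B: succeed if both values carry the same digest.
same_digest() {
  [ "$(digest_part "$1")" = "$(digest_part "$2")" ]
}
```

For example, same_digest "$pushed" "$pulled" && echo "digests match", where pushed holds the digest printed by docker push and pulled holds the value obtained on the other machine. Both variable names are placeholders.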