| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Apache Beam |
| |
| ## CI Environment |
| |
| Continuous Integration is an important part of keeping Apache Beam robust and stable. |
| |
| Our main execution environment for CI is Jenkins, which is available at |
| [https://ci-beam.apache.org/](https://ci-beam.apache.org/). See |
| [.test-infra/jenkins/README](.test-infra/jenkins/README.md) |
| for the trigger phrase, status, and link of every Jenkins job. See the Apache Beam Developer Guide for |
| [Jenkins Tips](https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips). |
| |
| An additional execution environment for CI is [GitHub Actions](https://github.com/features/actions). GitHub Actions |
| (GA) is well integrated with GitHub code and workflows, and it evolved quickly in 2019/2020 into |
| a fully fledged CI environment that is easy to use and develop for, so we decided to use it for building the Python source |
| distribution and wheels. |
| |
| ## GitHub Actions |
| |
| ### GitHub Actions run types |
| |
| The following GA CI job runs are currently in use for Apache Beam, and each has a different |
| purpose and context. |
| |
| #### Pull Request Run |
| |
| These runs result from PRs opened from forks by contributors. Most builds for Apache Beam fall |
| into this category. They are executed in the context of the fork, not the main |
| Beam code repository, which means that they have only "read" permission to all the GitHub resources |
| (container registry, code repository). This is necessary because the code in those PRs (including the CI job |
| definition) might be modified by people who are not committers for the Apache Beam code repository. |
| |
| The main purpose of these jobs is to check whether the PR builds cleanly, whether the tests run properly, and whether |
| the PR is ready to review and merge. |
| |
| #### Direct Push/Merge Run |
| |
| These runs result from direct pushes by committers or from committers merging a Pull Request. |
| They execute in the context of the Apache Beam code repository and also have |
| write permission for GitHub resources (container registry, code repository). |
| The main purpose of these runs is to check that the code still holds all the assertions after the merge, for example |
| that it still builds and all tests are green. |
| |
| This is needed because some of the conflicting changes from multiple PRs might cause build and test failures |
| after merge even if they do not fail in isolation. |
| |
| #### Scheduled Run |
| |
| These runs result from a (nightly) scheduled job and run only for the `master` branch. The |
| main purpose of the job is to check that changes in external dependencies have not affected the Apache |
| Beam code (for example, newly released transitive dependencies that fail the build). Another reason for the nightly |
| build is that it tags the most recent master with `nightly-master`. |
| |
| All runs consist of the same jobs, but the jobs behave slightly differently, or are skipped, in different |
| run categories. The tables in the Workflows section below summarize which jobs run in each run category. |
| These jobs often use a matrix run strategy, which runs several variations of a job |
| (for example, with a different platform type or Python version). |
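
Inside a job, the three run types above can be told apart by the event that triggered the workflow. The following is a minimal sketch, assuming the job reads the `GITHUB_EVENT_NAME` environment variable that GitHub Actions sets for every run. The `run_category` helper and the mapping are illustrative assumptions and are not taken from the Beam workflow files.

```python
import os

# Maps GitHub Actions event names to the run categories described above.
# The mapping itself is an assumption made for illustration.
RUN_CATEGORIES = {
    "pull_request": "Pull Request Run",
    "push": "Direct Push/Merge Run",
    "schedule": "Scheduled Run",
}


def run_category(event_name: str) -> str:
    """Return the run category for a GitHub Actions event name."""
    return RUN_CATEGORIES.get(event_name, "Other")


if __name__ == "__main__":
    # GITHUB_EVENT_NAME is set by GitHub Actions; it is absent when run locally.
    event = os.environ.get("GITHUB_EVENT_NAME", "")
    print(run_category(event))
```

Events that do not match any of the three categories (for example a manual dispatch) fall through to `"Other"` in this sketch.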
| |
| ### Google Cloud Platform Credentials |
| |
| Some of the jobs require variables stored as [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets) |
| to perform operations on Google Cloud Platform. |
| These variables are: |
| * `GCP_PROJECT_ID` - ID of the Google Cloud project. For example: `apache-beam-testing`. |
| * `GCP_REGION` - Region of the bucket and Dataflow jobs. For example: `us-central1`. |
| * `GCP_TESTING_BUCKET` - Name of the bucket where temporary files for Dataflow tests will be stored. For example: `beam-github-actions-tests`. |
| * `GCP_PYTHON_WHEELS_BUCKET` - Name of the bucket where the Python source distribution and wheels will be stored. For example: `beam-wheels-staging`. |
| * `GCP_SA_EMAIL` - Service account email address. This is usually of the format `<name>@<project-id>.iam.gserviceaccount.com`. |
| * `GCP_SA_KEY` - Service account key. This key should be created and encoded as a Base64 string (e.g. `cat my-key.json | base64` on macOS). |
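
As a platform-independent alternative to the `base64` command, the key file can be encoded with Python's standard library. This is a minimal sketch: the `encode_key` and `encode_key_file` helpers are hypothetical names, and `my-key.json` is only a placeholder file name. Unlike GNU `base64`, which wraps its output at 76 columns by default, `base64.b64encode` returns the encoded value on a single line.

```python
import base64
from pathlib import Path


def encode_key(raw: bytes) -> str:
    """Return raw key bytes as a single-line Base64 string."""
    return base64.b64encode(raw).decode("ascii")


def encode_key_file(path: str) -> str:
    """Read a service account key file and return it Base64-encoded."""
    return encode_key(Path(path).read_bytes())
```

For example, `print(encode_key_file("my-key.json"))` prints the value to store in the `GCP_SA_KEY` secret.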
| |
| The service account must have the following permissions ([IAM roles](https://cloud.google.com/iam/docs/understanding-roles)): |
| * Storage Admin (roles/storage.admin) |
| * Dataflow Admin (roles/dataflow.admin) |
| * Artifact Registry writer (roles/artifactregistry.createOnPush) |
| * BigQuery Data Editor (roles/bigquery.dataEditor) |
| * Service Account User (roles/iam.serviceAccountUser) |
| |
| ### Workflows |
| |
| #### Build python source distribution and wheels - [build_wheels.yml](.github/workflows/build_wheels.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |-------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| Check GCP variables | Checks that GCP variables are set. Jobs that require them depend on the output of this job. | Yes | Yes | Yes | Yes/No |
| Build python source distribution | Builds the Python source distribution and uploads it to artifacts. Artifacts from the release branch are used in the release process ([`build_release_candidate.sh`](release/src/main/scripts/build_release_candidate.sh)). | Yes | Yes | Yes | - |
| Prepare GCS | Clears the target path on GCS if it already exists. | - | Yes | Yes | Yes |
| Upload python source distribution to GCS bucket | Uploads the Python source distribution to the GCS bucket at a path unique to the specific workflow run. | - | Yes | Yes | Yes |
| Build python wheels on linux/macos/windows | Builds Python wheels on the Linux/macOS/Windows platforms using `cibuildwheel` and uploads them to artifacts. Artifacts from the release branch are used in the release process ([`build_release_candidate.sh`](release/src/main/scripts/build_release_candidate.sh)). | Yes | Yes | Yes | - |
| Upload python wheels to GCS bucket | Uploads Python wheels to the GCS bucket at a path unique to the specific workflow run. Additionally uploads workflow run data. | - | Yes | Yes | Yes |
| List files on Google Cloud Storage Bucket | Lists files on GCS for verification purposes. | - | Yes | Yes | Yes |
| Branch repo nightly | Branches the repo with `nightly-master` if the Python source distribution and Python wheel builds finished successfully. | - | - | Yes | - |
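
The `Check GCP variables` job in the table above gates the jobs that need credentials. The following is a minimal sketch of such a check, assuming the six secrets listed under Google Cloud Platform Credentials are exposed to the job as environment variables of the same names. The `missing_gcp_variables` helper is hypothetical and is not the workflow's actual implementation.

```python
import os
from typing import List, Mapping

# Secret names taken from the Google Cloud Platform Credentials section.
REQUIRED_GCP_VARIABLES = (
    "GCP_PROJECT_ID",
    "GCP_REGION",
    "GCP_TESTING_BUCKET",
    "GCP_PYTHON_WHEELS_BUCKET",
    "GCP_SA_EMAIL",
    "GCP_SA_KEY",
)


def missing_gcp_variables(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_GCP_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_gcp_variables(os.environ)
    if missing:
        print("Missing GCP variables: " + ", ".join(missing))
    else:
        print("All GCP variables are set.")
```

Taking the environment as a parameter keeps the check easy to test without touching the real process environment.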
| |
| #### Python tests - [python_tests.yml](.github/workflows/python_tests.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |----------------------------------|-----------------------------------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| Check GCP variables | Checks that GCP variables are set. Jobs that require them depend on the output of this job. | Yes | Yes | Yes | Yes/No |
| Build python source distribution | Builds the Python source distribution and uploads it to artifacts. Artifacts are used in the `Python Wordcount Dataflow` job. | - | Yes | Yes | Yes |
| Python Unit Tests | Runs Python unit tests. | Yes | Yes | Yes | - |
| Python Wordcount Direct Runner | Runs the Python WordCount example with the Direct Runner. | Yes | Yes | Yes | - |
| Python Wordcount Dataflow | Runs the Python WordCount example with the Dataflow Runner. | - | Yes | Yes | Yes |
| |
| #### Java tests - [java_tests.yml](.github/workflows/java_tests.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |------------------------------|-----------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| Check GCP variables | Checks that GCP variables are set. Jobs that require them depend on the output of this job. | Yes | Yes | Yes | Yes/No |
| Java Unit Tests | Runs Java unit tests. | Yes | Yes | Yes | - |
| Java Wordcount Direct Runner | Runs the Java WordCount example with the Direct Runner. | Yes | Yes | Yes | - |
| Java Wordcount Dataflow | Runs the Java WordCount example with the Dataflow Runner. | - | Yes | Yes | Yes |
| |
| ### Release Preparation and Validation Workflows |
| |
| #### Start Snapshot Build - [start_snapshot_build.yml](.github/workflows/start_snapshot_build.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |-----------------------|-------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| | Start Snapshot Build | Creates PR against apache:master and triggers a job to build a snapshot | No | No | No | No | |
| |
| #### Choose RC Commit - [choose_rc_commit.yml](.github/workflows/choose_rc_commit.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |------------------|-----------------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| | Choose RC Commit | Chooses a commit to be the basis of a release candidate and pushes a new tagged commit for that RC. | No | No | No | No | |
| |
| #### Cut Release Branch - [cut_release_branch.yml](.github/workflows/cut_release_branch.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |-----------------------|------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| | Update Master | Update Apache Beam master branch with next release version | No | No | No | No | |
| | Update Release Branch | Cut release branch for current development version | No | No | No | No | |
| |
| #### Verify Release Build - [verify_release_build.yml](.github/workflows/verify_release_build.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |------------------------------|-----------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| | Verify Release Build | Verifies full life cycle of Gradle Build and all PostCommit/PreCommit tests against Release Branch on CI. | No | No | No | No | |
| |
| #### Git tag Release Version - [git_tag_released_version.yml](.github/workflows/git_tag_released_version.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |---------------------------------|----------------------------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| | Git Tag Release Version | Create and push a new tag for the released version by copying the tag for the final release candidate. | No | No | No | No | |
| |
| #### Run RC Validation - [run_rc_validation.yml](.github/workflows/run_rc_validation.yml) |
| |
| | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |
| |------------------------------|-----------------------------------------------------------------------------------------------|------------------|-----------------------|---------------|--------------------------| |
| Python Release Candidate | Comments on the PR to trigger the Python ReleaseCandidate Jenkins job. | No | No | No | No |
| Python XLang SQL Taxi | Runs Python XLang SQL Taxi with DataflowRunner. | No | No | No | Yes |
| Python XLang Kafka | Runs Python XLang Kafka Taxi with DataflowRunner. | No | No | No | Yes |
| Direct Runner Leaderboard | Runs Python Leaderboard with DirectRunner. | No | No | No | Yes |
| Direct Runner GameStats | Runs Python GameStats with DirectRunner. | No | No | No | Yes |
| Dataflow Runner Leaderboard | Runs Python Leaderboard with DataflowRunner. | No | No | No | Yes |
| Dataflow Runner GameStats | Runs Python GameStats with DataflowRunner. | No | No | No | Yes |
| |
| ### All migrated workflows run based on the following triggers |
| |
| | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Workflow Dispatch | |
| |-------------|------------------|-----------------------|---------------|-------------------| |
| | PostCommit | No | Yes | Yes | Yes | |
| | PreCommit | Yes | Yes | Yes | Yes | |
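
The trigger table above can also be expressed as a small lookup, which is handy when reasoning about whether a given workflow kind should start for an event. This is a minimal sketch: the `TRIGGERS` values mirror the rows above, while the `runs_on` helper and the choice of GitHub Actions event names as keys (`pull_request`, `push`, `schedule`, `workflow_dispatch`) are illustrative assumptions rather than the events the Beam workflows actually subscribe to.

```python
# Rows of the trigger table, keyed by an assumed GitHub Actions event name.
TRIGGERS = {
    "PostCommit": {
        "pull_request": False,
        "push": True,
        "schedule": True,
        "workflow_dispatch": True,
    },
    "PreCommit": {
        "pull_request": True,
        "push": True,
        "schedule": True,
        "workflow_dispatch": True,
    },
}


def runs_on(kind: str, event_name: str) -> bool:
    """Return True if workflows of this kind run for the given event."""
    if kind not in TRIGGERS:
        raise ValueError(f"unknown workflow kind: {kind}")
    # Events that are not in the table never trigger a run.
    return TRIGGERS[kind].get(event_name, False)
```

For example, `runs_on("PostCommit", "pull_request")` is `False`, matching the "No" in the Pull Request Run column.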
| |
| ### PreCommit Workflows |
| |
| | Workflow | Description | Requires GCP Credentials | |
| |----------------------------------------------------------------------------------|-------------------------|---------------------------| |
| | [job-precommit-placeholder.yml](.github/workflows/job-precommit-placeholder.yml) | Description placeholder | Yes/No | |
| |
| ### PostCommit Workflows |
| |
| | Workflow | Description | Requires GCP Credentials | |
| |------------------------------------------------------------------------------------|-------------------------|--------------------------| |
| | [job-postcommit-placeholder.yml](.github/workflows/job-postcommit-placeholder.yml) | Description placeholder | Yes/No | |
| |
| ### GitHub Actions Tips |
| |
| * All migrated workflows are executed on **pre-configured self-hosted** runners. For this reason, GCP credentials are needed **only** when running the workflows on a different runner. |
| * If you change a workflow, your changes might not be present in the check run triggered by the Pull Request. |
| In this case, please attach a link to the modified workflow run executed on your fork. |
| * Possible timeouts with the macOS runner - existing issue: [(X) This check failed - sometimes happens on macOS runner #841](https://github.com/actions/virtual-environments/issues/841) |
| * [GitHub Actions Documentation](https://docs.github.com/en/actions) |