The current GitHub Actions workflows are tested on multiple operating systems: Ubuntu, Windows, and macOS. Migrating these runners from GitHub to GCP requires self-hosted runners, so we implemented them for the Ubuntu and Windows environments, using Google Kubernetes Engine and Google Compute Engine VM instances respectively.
Meanwhile, we will rely on GitHub-hosted runners for macOS builds until a straightforward implementation approach becomes available.
Ubuntu self-hosted runner images are stored in Artifact Registry and deployed on Google Kubernetes Engine with the following specifications:
docker:20.10.17-dind
Windows virtual machines have the following specifications:
We initially considered implementing the Windows runners on Kubernetes; however, this was not optimal for the following reasons:
To monitor the status of the self-hosted runners, we implemented a separate GitHub Actions workflow that runs on GitHub-hosted runners. This workflow periodically calls a Cloud Function that reports the number of active and offline runners. If the check fails, the workflow sends an email alert to the dev distribution list, dev@beam.apache.org.
The Cloud Function uses the endpoints provided by the installed GitHub App to retrieve information about the runners.
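As a rough illustration of the kind of summary such a function could serve, the sketch below counts runners by status. It assumes the payload has the shape returned by GitHub's "list self-hosted runners" REST endpoint (a `runners` array whose entries carry a `status` of `online` or `offline`). The function name `summarize_runners` and the sample data are hypothetical and are not taken from the actual Cloud Function.

```python
from collections import Counter


def summarize_runners(payload: dict) -> dict:
    """Count active and offline runners in a list-runners style payload.

    Assumption: each runner entry has a "status" field set to "online" or
    "offline". "online" is reported here as "active" to match the wording
    used in this document.
    """
    counts = Counter(r.get("status") for r in payload.get("runners", []))
    return {"active": counts.get("online", 0), "offline": counts.get("offline", 0)}


# Hypothetical sample payload, for illustration only.
sample = {
    "total_count": 3,
    "runners": [
        {"id": 1, "name": "ubuntu-runner-a", "status": "online"},
        {"id": 2, "name": "ubuntu-runner-b", "status": "offline"},
        {"id": 3, "name": "windows-runner-a", "status": "online"},
    ],
}

print(summarize_runners(sample))
```

The monitoring workflow could then compare these counts against whatever threshold it treats as a failure; that threshold is not specified here.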
Depending on the termination event, the removal script for offline runners is sometimes not triggered correctly from inside the VM or Kubernetes pod. For that reason, an additional pipeline was created to clean up the list of GitHub runners in the group.
This was implemented using a GCP Cloud Function [code] subscribed to a Pub/Sub topic. A Cloud Scheduler job publishes to the topic once per day. The function calls the GitHub API to delete offline self-hosted runners from the organization, using its service account to retrieve the token from Secret Manager.
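The cleanup step might look roughly like the following sketch, which is not the project's actual function. It assumes the organization-level GitHub REST endpoints for listing and deleting self-hosted runners and a token passed in by the caller (retrieval from Secret Manager and the Pub/Sub entry point are omitted). The helper names are hypothetical, and only the first page of up to 100 runners is read.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def _request(method, url, token):
    """Send an authenticated request and return parsed JSON, or None if empty."""
    req = urllib.request.Request(
        url,
        method=method,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def list_runners(org, token):
    """Return the first page (up to 100) of the organization's runners."""
    url = f"{GITHUB_API}/orgs/{org}/actions/runners?per_page=100"
    return _request("GET", url, token)["runners"]


def delete_runner(org, token, runner_id):
    """Remove one self-hosted runner from the organization."""
    _request("DELETE", f"{GITHUB_API}/orgs/{org}/actions/runners/{runner_id}", token)


def remove_offline_runners(org, token, lister=list_runners, deleter=delete_runner):
    """Delete every runner reported as offline and return the removed names.

    `lister` and `deleter` are injectable so the selection logic can be
    exercised without network access.
    """
    removed = []
    for runner in lister(org, token):
        if runner.get("status") == "offline":
            deleter(org, token, runner["id"])
            removed.append(runner["name"])
    return removed
```

Passing stub `lister` and `deleter` callables makes it possible to check which runners would be removed before pointing the function at a real organization.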