
Helm Chart for Apache Airflow

Apache Airflow is a platform to programmatically author, schedule and monitor workflows.

Introduction

This chart will bootstrap an Airflow deployment on a Kubernetes cluster using the Helm package manager.

Prerequisites

  • Kubernetes 1.12+ cluster
  • Helm 2.11+ or Helm 3.0+
  • PV provisioner support in the underlying infrastructure

Installing the Chart

To install this chart from source (using Helm 3):

kubectl create namespace airflow
helm repo add stable https://charts.helm.sh/stable/
helm dep update
helm install airflow . --namespace airflow

The command deploys Airflow on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured during installation.

Tip: List all releases using helm list

Upgrading the Chart

To upgrade the chart with the release name airflow:

helm upgrade airflow . --namespace airflow

Uninstalling the Chart

To uninstall/delete the airflow deployment:

helm delete airflow --namespace airflow

The command removes all the Kubernetes components associated with the chart and deletes the release.

Updating DAGs

The recommended way to update your DAGs with this chart is to build a new docker image with the latest DAG code (docker build -t my-company/airflow:8a0da78 .), push it to an accessible registry (docker push my-company/airflow:8a0da78), then update the Airflow pods with that image:

helm upgrade airflow . \
  --set images.airflow.repository=my-company/airflow \
  --set images.airflow.tag=8a0da78
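For repeatability, the build, push, and upgrade steps above can be captured in one small script. This is only a sketch: my-company/airflow and the tag 8a0da78 are the placeholder image coordinates from the example, so substitute your own registry and tag before running it.

```shell
# Write the DAG update loop to a script. The image repository and tag are
# placeholders from the example above -- substitute your own before running.
cat > update-dags.sh <<'EOF'
#!/bin/sh
set -e
REPO=my-company/airflow
TAG=8a0da78
docker build -t "${REPO}:${TAG}" .
docker push "${REPO}:${TAG}"
helm upgrade airflow . \
  --namespace airflow \
  --set images.airflow.repository="${REPO}" \
  --set images.airflow.tag="${TAG}"
EOF
chmod +x update-dags.sh
```

Using a unique tag per build (e.g. the git commit SHA, as in the example) ensures the pods actually pull the new image rather than a cached one.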

For local development purposes, you can also build the image locally and use it via the deployment method described in the Breeze documentation.

Mounting DAGs using Git-Sync sidecar with Persistence enabled

This option will use a Persistent Volume Claim with an accessMode of ReadWriteMany. The scheduler pod will sync DAGs from a git repository onto the PVC every configured number of seconds. The other pods will read the synced DAGs. Note that not all volume plugins support the ReadWriteMany accessMode; refer to the Kubernetes documentation on Persistent Volume Access Modes for details.

helm upgrade airflow . \
  --set dags.persistence.enabled=true \
  --set dags.gitSync.enabled=true
  # you can also override the other persistence or gitSync values
  # by setting the  dags.persistence.* and dags.gitSync.* values
  # Please refer to values.yaml for details

Mounting DAGs using Git-Sync sidecar without Persistence

This option will run an always-on Git-Sync sidecar in every scheduler, webserver, and worker pod. The Git-Sync sidecar containers will sync DAGs from a git repository every configured number of seconds. If you are using the KubernetesExecutor, Git-Sync will run as an initContainer on your worker pods.

helm upgrade airflow . \
  --set dags.persistence.enabled=false \
  --set dags.gitSync.enabled=true
  # you can also override the other gitSync values
  # by setting the  dags.gitSync.* values
  # Refer to values.yaml for details
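Rather than repeating --set flags on every upgrade, the same configuration can be kept in a values file and passed with -f. A sketch, assuming the gitSync keys repo, branch, and subPath exist in this chart version -- verify the exact field names against values.yaml before use:

```shell
# Sketch: keep the DAG sync settings in a values file instead of --set flags.
# The gitSync keys below (repo, branch, subPath) should be verified against
# your chart's values.yaml; the repository URL is a placeholder.
cat > dags-values.yaml <<'EOF'
dags:
  persistence:
    enabled: false
  gitSync:
    enabled: true
    repo: https://github.com/my-org/my-dags.git   # placeholder repository
    branch: main
    subPath: dags
EOF
# Then apply it (requires a running cluster):
# helm upgrade airflow . --namespace airflow -f dags-values.yaml
```

A values file keeps the configuration under version control alongside your DAG code, which is easier to review than a history of ad-hoc --set flags.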

Mounting DAGs from an externally populated PVC

In this approach, Airflow will read the DAGs from a PVC which has a ReadOnlyMany or ReadWriteMany accessMode. You will have to ensure that the PVC is populated/updated with the required DAGs (this won't be handled by the chart). You can pass the name of the volume claim to the chart:

helm upgrade airflow . \
  --set dags.persistence.enabled=true \
  --set dags.persistence.existingClaim=my-volume-claim \
  --set dags.gitSync.enabled=false
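For reference, a minimal externally managed claim of this shape might look as follows. The name, size, and storageClassName are placeholders for your environment, and the storage class must be backed by a provisioner that supports the ReadWriteMany (or ReadOnlyMany) accessMode:

```shell
# Sketch of an externally managed PVC for DAGs. The name, size, and
# storageClassName are placeholders -- substitute values for your cluster.
cat > my-volume-claim.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-volume-claim
  namespace: airflow
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client   # placeholder: any RWX-capable class
  resources:
    requests:
      storage: 1Gi
EOF
# Create it (requires a running cluster), then populate it with your DAGs:
# kubectl apply -f my-volume-claim.yaml
```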

Parameters

The following table lists the configurable parameters of the Airflow chart and their default values.

| Parameter | Description | Default |
|-----------|-------------|---------|
| uid | UID to run airflow pods under | 50000 |
| gid | GID to run airflow pods under | 50000 |
| nodeSelector | Node labels for pod assignment | {} |
| affinity | Affinity labels for pod assignment | {} |
| tolerations | Toleration labels for pod assignment | [] |
| labels | Common labels to add to all objects defined in this chart | {} |
| privateRegistry.enabled | Enable usage of a private registry for the Airflow base image | false |
| privateRegistry.repository | Repository where base image lives (eg: quay.io) | ~ |
| ingress.enabled | Enable Kubernetes Ingress support | false |
| ingress.web.* | Configs for the Ingress of the web Service | Please refer to values.yaml |
| ingress.flower.* | Configs for the Ingress of the flower Service | Please refer to values.yaml |
| networkPolicies.enabled | Enable Network Policies to restrict traffic | true |
| airflowHome | Location of airflow home directory | /opt/airflow |
| rbacEnabled | Deploy pods with Kubernetes RBAC enabled | true |
| executor | Airflow executor (eg SequentialExecutor, LocalExecutor, CeleryExecutor, KubernetesExecutor) | KubernetesExecutor |
| allowPodLaunching | Allow airflow pods to talk to Kubernetes API to launch more pods | true |
| defaultAirflowRepository | Fallback docker repository to pull airflow image from | apache/airflow |
| defaultAirflowTag | Fallback docker image tag to deploy | 1.10.10.1-alpha2-python3.6 |
| images.airflow.repository | Docker repository to pull image from. Update this to deploy a custom image | ~ |
| images.airflow.tag | Docker image tag to pull image from. Update this to deploy a new custom image tag | ~ |
| images.airflow.pullPolicy | PullPolicy for airflow image | IfNotPresent |
| images.flower.repository | Docker repository to pull image from. Update this to deploy a custom image | ~ |
| images.flower.tag | Docker image tag to pull image from. Update this to deploy a new custom image tag | ~ |
| images.flower.pullPolicy | PullPolicy for flower image | IfNotPresent |
| images.statsd.repository | Docker repository to pull image from. Update this to deploy a custom image | apache/airflow |
| images.statsd.tag | Docker image tag to pull image from. Update this to deploy a new custom image tag | airflow-statsd-exporter-2020.09.05-v0.17.0 |
| images.statsd.pullPolicy | PullPolicy for statsd-exporter image | IfNotPresent |
| images.redis.repository | Docker repository to pull image from. Update this to deploy a custom image | redis |
| images.redis.tag | Docker image tag to pull image from. Update this to deploy a new custom image tag | 6-buster |
| images.redis.pullPolicy | PullPolicy for redis image | IfNotPresent |
| images.pgbouncer.repository | Docker repository to pull image from. Update this to deploy a custom image | apache/airflow |
| images.pgbouncer.tag | Docker image tag to pull image from. Update this to deploy a new custom image tag | airflow-pgbouncer-2020.09.05-1.14.0 |
| images.pgbouncer.pullPolicy | PullPolicy for pgbouncer image | IfNotPresent |
| images.pgbouncerExporter.repository | Docker repository to pull image from. Update this to deploy a custom image | apache/airflow |
| images.pgbouncerExporter.tag | Docker image tag to pull image from. Update this to deploy a new custom image tag | airflow-pgbouncer-exporter-2020.09.25-0.5.0 |
| images.pgbouncerExporter.pullPolicy | PullPolicy for pgbouncer-exporter image | IfNotPresent |
| env | Environment variable key/values to mount into Airflow pods (deprecated, prefer using extraEnv) | [] |
| secret | Secret name/key pairs to mount into Airflow pods | [] |
| extraEnv | Extra env 'items' that will be added to the definition of airflow containers | ~ |
| extraEnvFrom | Extra envFrom 'items' that will be added to the definition of airflow containers | ~ |
| extraSecrets | Extra Secrets that will be managed by the chart | {} |
| extraConfigMaps | Extra ConfigMaps that will be managed by the chart | {} |
| data.metadataSecretName | Secret name to mount Airflow connection string from | ~ |
| data.resultBackendSecretName | Secret name to mount Celery result backend connection string from | ~ |
| data.metadataConnection | Field separated connection data (alternative to secret name) | {} |
| data.resultBackendConnection | Field separated connection data (alternative to secret name) | {} |
| fernetKey | String representing an Airflow Fernet key | ~ |
| fernetKeySecretName | Secret name for Airflow Fernet key | ~ |
| kerberos.enabled | Enable kerberos support for workers | false |
| kerberos.ccacheMountPath | Location of the ccache volume | /var/kerberos-ccache |
| kerberos.ccacheFileName | Name of the ccache file | ccache |
| kerberos.configPath | Path for the Kerberos config file | /etc/krb5.conf |
| kerberos.keytabPath | Path for the Kerberos keytab file | /etc/airflow.keytab |
| kerberos.principal | Name of the Kerberos principal | airflow |
| kerberos.reinitFrequency | Frequency of reinitialization of the Kerberos token | 3600 |
| kerberos.config | Content of the configuration file for kerberos (might be templated using Helm templates) | <see values.yaml> |
| workers.replicas | Replica count for Celery workers (if applicable) | 1 |
| workers.keda.enabled | Enable KEDA autoscaling features | false |
| workers.keda.pollingInterval | How often KEDA should poll the backend database for metrics, in seconds | 5 |
| workers.keda.cooldownPeriod | How long KEDA should wait before scaling down, in seconds | 30 |
| workers.keda.maxReplicaCount | Maximum number of Celery workers KEDA can scale to | 10 |
| workers.kerberosSideCar.enabled | Enable Kerberos sidecar for the worker | false |
| workers.persistence.enabled | Enable log persistence in workers via StatefulSet | false |
| workers.persistence.size | Size of worker volumes if enabled | 100Gi |
| workers.persistence.storageClassName | StorageClass worker volumes should use if enabled | default |
| workers.resources.limits.cpu | CPU Limit of workers | ~ |
| workers.resources.limits.memory | Memory Limit of workers | ~ |
| workers.resources.requests.cpu | CPU Request of workers | ~ |
| workers.resources.requests.memory | Memory Request of workers | ~ |
| workers.terminationGracePeriodSeconds | How long Kubernetes should wait for Celery workers to gracefully drain before force killing | 600 |
| workers.safeToEvict | Allow Kubernetes to evict worker pods if needed (node downscaling) | true |
| workers.serviceAccountAnnotations | Annotations to add to the worker kubernetes service account | {} |
| workers.extraVolumes | Mount additional volumes into worker | [] |
| workers.extraVolumeMounts | Add additional volumeMounts to the worker | [] |
| scheduler.podDisruptionBudget.enabled | Enable PDB on Airflow scheduler | false |
| scheduler.podDisruptionBudget.config.maxUnavailable | MaxUnavailable pods for scheduler | 1 |
| scheduler.replicas | # of parallel schedulers (Airflow 2.0 using MySQL 8+ or Postgres only) | 1 |
| scheduler.resources.limits.cpu | CPU Limit of scheduler | ~ |
| scheduler.resources.limits.memory | Memory Limit of scheduler | ~ |
| scheduler.resources.requests.cpu | CPU Request of scheduler | ~ |
| scheduler.resources.requests.memory | Memory Request of scheduler | ~ |
| scheduler.airflowLocalSettings | Custom Airflow local settings python file | ~ |
| scheduler.safeToEvict | Allow Kubernetes to evict scheduler pods if needed (node downscaling) | true |
| scheduler.serviceAccountAnnotations | Annotations to add to the scheduler kubernetes service account | {} |
| scheduler.extraVolumes | Mount additional volumes into scheduler | [] |
| scheduler.extraVolumeMounts | Add additional volumeMounts to the scheduler | [] |
| webserver.livenessProbe.initialDelaySeconds | Webserver LivenessProbe initial delay | 15 |
| webserver.livenessProbe.timeoutSeconds | Webserver LivenessProbe timeout seconds | 30 |
| webserver.livenessProbe.failureThreshold | Webserver LivenessProbe failure threshold | 20 |
| webserver.livenessProbe.periodSeconds | Webserver LivenessProbe period seconds | 5 |
| webserver.readinessProbe.initialDelaySeconds | Webserver ReadinessProbe initial delay | 15 |
| webserver.readinessProbe.timeoutSeconds | Webserver ReadinessProbe timeout seconds | 30 |
| webserver.readinessProbe.failureThreshold | Webserver ReadinessProbe failure threshold | 20 |
| webserver.readinessProbe.periodSeconds | Webserver ReadinessProbe period seconds | 5 |
| webserver.replicas | How many Airflow webserver replicas should run | 1 |
| webserver.resources.limits.cpu | CPU Limit of webserver | ~ |
| webserver.resources.limits.memory | Memory Limit of webserver | ~ |
| webserver.resources.requests.cpu | CPU Request of webserver | ~ |
| webserver.resources.requests.memory | Memory Request of webserver | ~ |
| webserver.service.annotations | Annotations to be added to the webserver service | {} |
| webserver.defaultUser | Optional default airflow user information | {} |
| dags.persistence.* | Dag persistence configuration | Please refer to values.yaml |
| dags.gitSync.* | Git sync configuration | Please refer to values.yaml |
| multiNamespaceMode | Whether the KubernetesExecutor can launch pods in multiple namespaces | False |
| serviceAccountAnnotations.* | Map of annotations for worker, webserver, scheduler kubernetes service accounts | {} |

Specify each parameter using the --set key=value[,key=value] argument to helm install. For example,

helm install my-release . \
  --set executor=CeleryExecutor \
  --set allowPodLaunching=false
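Equivalently, parameters can be provided in a YAML file passed via the -f flag; for example, using the executor and allowPodLaunching keys from the table above:

```shell
# Keep the overrides in a values file instead of individual --set flags.
cat > my-values.yaml <<'EOF'
executor: CeleryExecutor
allowPodLaunching: false
EOF
# Then install with it (requires a running cluster):
# helm install my-release . -f my-values.yaml
```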

Autoscaling with KEDA

KEDA stands for Kubernetes Event Driven Autoscaling. KEDA is a custom controller that allows users to create custom bindings to the Kubernetes Horizontal Pod Autoscaler. We've built an experimental scaler that allows users to create scalers based on PostgreSQL queries. For the moment this exists on a separate branch, but will be merged upstream soon. To install our custom version of KEDA on your cluster, please run:

helm repo add kedacore https://kedacore.github.io/charts

helm repo update

helm install \
    --set image.keda=docker.io/kedacore/keda:1.2.0 \
    --set image.metricsAdapter=docker.io/kedacore/keda-metrics-adapter:1.2.0 \
    --namespace keda --name keda kedacore/keda

Once KEDA is installed (which should be pretty quick, since there is only one pod), you can try out KEDA autoscaling on this chart by setting workers.keda.enabled=true in your helm command or in values.yaml. (Note: KEDA does not support StatefulSets, so you need to set workers.persistence.enabled to false.)

kubectl create namespace airflow

helm install airflow . \
    --namespace airflow \
    --set executor=CeleryExecutor \
    --set workers.keda.enabled=true \
    --set workers.persistence.enabled=false

Walkthrough using kind

Install kind, and create a cluster:

We recommend testing with Kubernetes 1.15, as this image doesn't support Kubernetes 1.16+ for CeleryExecutor presently.

kind create cluster \
  --image kindest/node:v1.15.7@sha256:e2df133f80ef633c53c0200114fce2ed5e1f6947477dbc83261a6a921169488d

Confirm it's up:

kubectl cluster-info --context kind-kind

Add Astronomer's Helm repo:

helm repo add astronomer https://helm.astronomer.io
helm repo update

Create namespace + install the chart:

kubectl create namespace airflow
helm install airflow --namespace airflow astronomer/airflow

It may take a few minutes. Confirm the pods are up:

kubectl get pods --all-namespaces
helm list -n airflow

Run kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow, then open http://localhost:8080/ to confirm Airflow is working.

Build a Docker image from your DAGs:

  1. Start a project using astro-cli, which will generate a Dockerfile and load your DAGs in. You can test locally before pushing to kind with astro dev start.

    mkdir my-airflow-project && cd my-airflow-project
    astro dev init
    
  2. Then build the image:

    docker build -t my-dags:0.0.1 .
    
  3. Load the image into kind:

    kind load docker-image my-dags:0.0.1
    
  4. Upgrade Helm deployment:

    helm upgrade airflow -n airflow \
        --set images.airflow.repository=my-dags \
        --set images.airflow.tag=0.0.1 \
        astronomer/airflow
    

Contributing

Check out our contributing guide!