blob: 37ec9034553e5ba91adcef3524676fc32af7b981 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Manage DAGs files
=================
When you create new or modify existing DAG files, it is necessary to deploy them into the environment. This section will describe some basic techniques you can use.
Bake DAGs in Docker image
-------------------------
The recommended way to update your DAGs with this chart is to build a new docker image with the latest DAG code:
.. code-block:: bash
docker build --tag "my-company/airflow:8a0da78" . -f - <<EOF
FROM apache/airflow
COPY ./dags/ \${AIRFLOW_HOME}/dags/
EOF
.. note::
In airflow images prior to version 2.0.2, there was a bug that required you to use
a bit longer Dockerfile, to make sure the image remains OpenShift-compatible (i.e dag
has root group similarly as other files). In 2.0.2 this has been fixed.
.. code-block:: bash
docker build --tag "my-company/airflow:8a0da78" . -f - <<EOF
FROM apache/airflow:2.0.2
USER root
COPY --chown=airflow:root ./dags/ \${AIRFLOW_HOME}/dags/
USER airflow
EOF
Then publish it in the accessible registry:
.. code-block:: bash
docker push my-company/airflow:8a0da78
Finally, update the Airflow pods with that image:
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow \
--set images.airflow.repository=my-company/airflow \
--set images.airflow.tag=8a0da78
If you are deploying an image with a constant tag, you need to make sure that the image is pulled every time.
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow \
--set images.airflow.repository=my-company/airflow \
--set images.airflow.tag=8a0da78 \
--set images.airflow.pullPolicy=Always
Mounting DAGs using Git-Sync sidecar with Persistence enabled
-------------------------------------------------------------
This option will use a Persistent Volume Claim with an access mode of ``ReadWriteMany``.
The scheduler pod will sync DAGs from a git repository onto the PVC every configured number of
seconds. The other pods will read the synced DAGs. Not all volume plugins have support for
``ReadWriteMany`` access mode.
Refer `Persistent Volume Access Modes <https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes>`__
for details.
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow \
--set dags.persistence.enabled=true \
--set dags.gitSync.enabled=true
# you can also override the other persistence or gitSync values
# by setting the dags.persistence.* and dags.gitSync.* values
# Please refer to values.yaml for details
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow \
--set dags.persistence.enabled=true \
--set dags.gitSync.enabled=true \
# you can also override the other persistence or gitSync values
# by setting the dags.persistence.* and dags.gitSync.* values
# Please refer to values.yaml for details
Mounting DAGs using Git-Sync sidecar without Persistence
--------------------------------------------------------
This option will use an always running Git-Sync sidecar on every scheduler, webserver (if ``airflowVersion < 2.0.0``)
and worker pods.
The Git-Sync sidecar containers will sync DAGs from a git repository every configured number of
seconds. If you are using the ``KubernetesExecutor``, Git-sync will run as an init container on your worker pods.
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow \
--set dags.persistence.enabled=false \
--set dags.gitSync.enabled=true
# you can also override the other gitSync values
# by setting the dags.gitSync.* values
# Refer values.yaml for details
When using ``apache-airflow>=2.0.0``, :ref:`DAG Serialization <apache-airflow:dag-serialization>` is enabled by default,
hence Webserver does not need access to DAG files, so ``git-sync`` sidecar is not run on Webserver.
Mounting DAGs from an externally populated PVC
----------------------------------------------
In this approach, Airflow will read the DAGs from a PVC which has ``ReadOnlyMany`` or ``ReadWriteMany`` access mode. You will have to ensure that the PVC is populated/updated with the required DAGs(this won't be handled by the chart). You can pass in the name of the volume claim to the chart
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow \
--set dags.persistence.enabled=true \
--set dags.persistence.existingClaim=my-volume-claim
--set dags.gitSync.enabled=false
Mounting DAGs from a private Github repo using Git-Sync sidecar
---------------------------------------------------------------
Create a private repo on Github if you have not created one already.
Then create your ssh keys:
.. code-block:: bash
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
Add the public key to your private repo (under ``Settings > Deploy keys``).
You have to convert the private ssh key to a base64 string. You can convert the private ssh key file like so:
.. code-block:: bash
base64 <my-private-ssh-key> -w 0 > temp.txt
Then copy the string from the ``temp.txt`` file. You'll add it to your ``override-values.yaml`` next.
In this example, you will create a yaml file called ``override-values.yaml`` to override values in the
``values.yaml`` file, instead of using ``--set``:
.. code-block:: yaml
dags:
gitSync:
enabled: true
repo: ssh://git@github.com/<username>/<private-repo-name>.git
branch: <branch-name>
subPath: ""
sshKeySecret: airflow-ssh-secret
extraSecrets:
airflow-ssh-secret:
data: |
gitSshKey: '<base64-converted-ssh-private-key>'
Don't forget to copy in your private key base64 string.
Finally, from the context of your Airflow Helm chart directory, you can install Airflow:
.. code-block:: bash
helm upgrade --install airflow apache-airflow/airflow -f override-values.yaml
If you have done everything correctly, Git-Sync will pick up the changes you make to the DAGs
in your private Github repo.
You should take this a step further and set ``dags.gitSync.knownHosts`` so you are not susceptible to man-in-the-middle
attacks. This process is documented in the :ref:`production guide <production-guide:knownhosts>`.