Contributions are welcome and are greatly appreciated! Every little bit helps, and credit will always be given.
Report bugs through Apache Jira
Please report relevant information and preferably code that exhibits the problem.
Look through the Jira issues for bugs. Anything is open to whoever wants to implement it.
Look through the Apache Jira for features. Any unassigned “Improvement” issue is open to whoever wants to implement it.
We've created the operators, hooks, macros and executors we needed, but we made sure that this part of Airflow is extensible. New operators, hooks, macros and executors are very welcome!
Airflow could always use better documentation, whether as part of the official Airflow docs, in docstrings, docs/*.rst
or even on the web as blog posts or articles.
The best way to send feedback is to open an issue on Apache Jira
If you are proposing a feature:
The latest API documentation is usually available here. To generate a local version, you need to have set up an Airflow development environment (see below). Also install the `doc` extra:
pip install -e '.[doc]'
Generate and serve the documentation by running:
```bash
cd docs
./build.sh
./start_doc_server.sh
```
When you develop Airflow, you can create a local virtualenv with all the requirements that Airflow needs.
The advantage of a local installation is that everything works locally: you do not have to enter a Docker/container environment, and you can easily debug the code. You also have access to a Python virtualenv that contains all the necessary requirements and can use it in your local IDE - this aids autocompletion and running tests directly from within the IDE.
The disadvantage is that you have to keep your dependencies and local environment consistent with the other development environments on your machine.
Another disadvantage is that you cannot run tests that require external components - a MySQL or Postgres database, Hadoop, Mongo, Cassandra, Redis, etc. The tests in Airflow are a mixture of unit and integration tests, and some of them require those components to be set up. Only real unit tests can be run by default in the local environment.
If you want to run integration tests, you need to configure and install the dependencies on your own.
It's also very difficult to make sure that your local environment is consistent with other environments, which often leads to "works for me" syndrome. It is better to use the Docker Compose integration test environment if you want a reproducible environment consistent with other people's.
First, install Python (3.5 or 3.6), MySQL, and libxml using a system-level package manager such as yum or apt-get on Linux, or Homebrew on Mac OS. Refer to the Dockerfile for a comprehensive list of required packages.
To use your IDE, you can use the virtual environment. Ideally you should set up a virtualenv for every Python version that Airflow supports (2.7, 3.5, 3.6). An easy way to create virtualenvs is virtualenvwrapper - it allows you to easily switch between virtualenvs using the `workon` command and manage your virtual environments more easily. Typically, creating an environment is done with:
mkvirtualenv <ENV_NAME> --python=python<VERSION>
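If you prefer not to install virtualenvwrapper, the standard library `venv` module gives an equivalent result. A minimal sketch (the directory name is illustrative):

```shell
# A stdlib-only alternative to mkvirtualenv (directory name is illustrative)
VENV_DIR="$(mktemp -d)/airflow-venv"
python3 -m venv "$VENV_DIR"
# Activate it and confirm python now resolves inside the virtualenv
. "$VENV_DIR/bin/activate"
python -c 'import sys; print(sys.prefix)'
```

The main difference is purely in convenience: virtualenvwrapper keeps all environments in one place and adds the `workon` switcher, while `venv` leaves directory layout up to you.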
Then install the Python PIP requirements:

```bash
pip install -e ".[devel]"
```

Then run `airflow db init` to create the sqlite database.
Once initialization is done, you should select the virtualenv you initialized as the project's default virtualenv in your IDE and run tests efficiently.
After setting it up, you can use the IDE's usual "Run Test" option with autocomplete and documentation support, and you can debug and view the Airflow sources - which is very helpful during development.
Once you activate the virtualenv (or enter the docker container) as described below, you should be able to run `run-tests` at will (it is on the path in the Docker environment, but in a local virtualenv you need to prepend it with `./`, i.e. `./run-tests`).
Note that this script has several flags that can be useful for your testing.
```
Usage: run-tests [FLAGS] [TESTS_TO_RUN] -- <EXTRA_NOSETEST_ARGS>

Runs tests specified (or all tests if no tests are specified)

Flags:
-h, --help
        Shows this help message.
-i, --with-db-init
        Forces database initialization before tests
-s, --nocapture
        Don't capture stdout when running the tests. This is useful if you are
        debugging with ipdb and want to drop into console with it
        by adding this line to source code:
        import ipdb; ipdb.set_trace()
-v, --verbose
        Verbose output showing coloured output of tests being run and summary
        of the tests - in a manner similar to the tests run in the CI environment.
```
You can pass extra parameters to nose by adding nose arguments after `--`.
For example, in order to just execute the “core” unit tests and add ipdb set_trace method, you can run the following command:
./run-tests tests.core:CoreTest --nocapture --verbose
or a single test method without colors or debug logs:
./run-tests tests.core:CoreTest.test_check_operators
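The `--` separator works the way it does in many command-line tools: the wrapper stops parsing its own flags there and hands everything after it to nose verbatim. A self-contained illustration of that convention (this wrapper script is hypothetical, not part of Airflow):

```shell
# Illustration of the "--" pass-through convention used by run-tests
# (this wrapper script is hypothetical, not part of Airflow)
WRAPPER="$(mktemp)"
cat > "$WRAPPER" <<'EOF'
#!/bin/sh
OWN_ARGS=""
while [ $# -gt 0 ]; do
  if [ "$1" = "--" ]; then
    shift
    break                      # everything after -- is passed through verbatim
  fi
  OWN_ARGS="$OWN_ARGS $1"
  shift
done
echo "own:$OWN_ARGS"
echo "passthrough: $*"
EOF
chmod +x "$WRAPPER"
"$WRAPPER" tests.core --verbose -- --logging-level=DEBUG
```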
Note that the `./run-tests` script runs tests, but the first time it runs it also performs database initialisation. If you run further tests without leaving the environment, the database will not be re-initialized - but you can always force database initialization with the `--with-db-init` (`-i`) switch. The scripts will inform you what you can do when they are run.
Once you configure your tests to use the virtualenv you created, running tests from the IDE is straightforward.
Note that while most of the tests are typical "unit" tests that do not require external components, there are a number of tests that are more of "integration" or even "system" tests (depending on the convention you use). Those tests interact with external components. For those tests you need to use the complete Docker Compose base environment described below.
This is the environment used during CI builds on Travis CI. We have scripts to reproduce the Travis environment, so you can enter it and run it locally.
The scripts used by Travis CI also build the images so that they contain all the sources. You can see which scripts are used in the .travis.yml file.
Docker
You need to have Docker CE installed.
IMPORTANT!!!: Mac OS Docker default disk size settings
When you develop on Mac OS, you usually do not have enough disk space for Docker once you start using it seriously. You should increase the available disk space before starting to work with the environment. When you run out of disk space, docker containers tend to stop in odd ways, and it might not be obvious that space is the issue. If you get weird behaviour, try Cleaning Up Docker.
See Docker for Mac - Space for details of increasing disk space available for Docker on Mac.
At least 128 GB of disk space is recommended. You can get by with less, but then you should clean up the docker disk space more often.
Getopt and coreutils
If you are on MacOS:
brew install gnu-getopt coreutils
(if you use brew; use the equivalent command for MacPorts). If you use bash, you should run this command:

```bash
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile
. ~/.bash_profile
```
If you use zsh, you should run this command:
```bash
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile
. ~/.zprofile
```
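After updating your profile, you can verify that the change took effect. This snippet shows the check (`/usr/local/opt/gnu-getopt` is the usual Homebrew prefix and may differ on your machine):

```shell
# Prepend the gnu-getopt bin directory and verify it is on PATH
# (/usr/local/opt/gnu-getopt is the default Homebrew prefix; adjust if needed)
GETOPT_BIN="/usr/local/opt/gnu-getopt/bin"
export PATH="$GETOPT_BIN:$PATH"
case ":$PATH:" in
  *":$GETOPT_BIN:"*) echo "gnu-getopt bin is on PATH" ;;
  *) echo "gnu-getopt bin is NOT on PATH" ;;
esac
```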
If you are on Linux:
apt install util-linux coreutils

or the equivalent if your system is not Debian-based.

Default environment settings (python 3.6, sqlite backend, docker environment):
./scripts/ci/local_ci_enter_environment.sh
Overriding default environment settings:
PYTHON_VERSION=3.5 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_enter_environment.sh
Once you are inside the environment you can run individual tests as described in Running individual tests.
We have a number of static code checks that are run in Travis CI but you can run them locally as well. All the scripts are available in scripts/ci folder.
All these tests run in python3.6 environment. Note that the first time you run the checks it might take some time to rebuild the docker images required to run the tests, but all subsequent runs will be much faster - the build phase will just check if your code has changed and rebuild as needed.
The checks below are run in a docker environment, which means that if you run them locally, they should give the same results as the tests run in TravisCI without special environment preparation.
You can trigger the static checks from the host environment, without entering the Docker container, by running the appropriate scripts (the same is done in TravisCI).
Those scripts are optimised to minimise docker image rebuild time. The image will be automatically rebuilt when needed (for example, when dependencies change).
You can also force rebuilding of the image by deleting .build directory which keeps cached information about the images built.
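Deleting .build forces a rebuild because the scripts keep cache information there. The general idea can be sketched like this (an illustrative cache-stamp pattern, not the actual script code):

```shell
# Illustrative cache-stamp pattern (NOT the actual CI script logic):
# a stamp file records a checksum of the build inputs, and the expensive
# rebuild step is skipped while the checksum still matches
WORK="$(mktemp -d)"
cd "$WORK"
mkdir -p .build
echo "deps v1" > setup-example.txt          # stand-in for setup.py etc.
SUM="$(cksum < setup-example.txt)"
if [ -f .build/stamp ] && [ "$(cat .build/stamp)" = "$SUM" ]; then
  echo "cache hit - skipping rebuild"
else
  echo "rebuilding image"
  echo "$SUM" > .build/stamp
fi
```

Deleting the `.build` directory removes the stamp, so the next run always takes the "rebuild" branch - which is exactly why removing it forces a full rebuild.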
After it is built, the documentation is available in the docs/_build/html folder. This folder is mounted from the host, so you can access those files on your host as well.
If you are already in the Docker Compose Environment you can also run the same static checks from within container:
./scripts/ci/in_container/run_mypy.sh airflow tests
./scripts/ci/in_container/run_flake8.sh
./scripts/ci/in_container/run_check_licence.sh
./scripts/ci/in_container/run_docs_build.sh
In all static check scripts - both in container and in the host you can also pass module/file path as parameters of the scripts to only check selected modules or files. For example:
In container:
./scripts/ci/in_container/run_mypy.sh ./airflow/example_dags/
or
./scripts/ci/in_container/run_mypy.sh ./airflow/example_dags/test_utils.py
In host:
./scripts/ci/ci_mypy.sh ./airflow/example_dags/
or
./scripts/ci/ci_mypy.sh ./airflow/example_dags/test_utils.py
And similarly for other scripts.
For all development tasks related to integration tests and static code checks, we use Docker images maintained in DockerHub under the `apache/airflow` repository.
There are two images that we currently manage:
When you run tests, enter the environment, or run local static checks for the first time, the necessary local images will be pulled and built for you automatically from DockerHub. After that, the scripts will automatically check whether the image needs to be rebuilt and do so when needed.
Note that the first build pulls the pre-built images from DockerHub and might take a bit of time - but this wait does not repeat for subsequent source code changes. However, changes to sensitive files like setup.py or the Dockerfile will trigger a rebuild that might take more time (though it is highly optimised to rebuild only what's needed).
You can also Build the images or Force pull and build the images manually at any time.
See Troubleshooting section for steps you can make to clean the environment.
Once you have performed the first build, the images are rebuilt locally rather than pulled - unless you force-pull them using the scripts described below.
For your convenience, there are scripts that can be used in local development
Running all tests with default settings (python 3.6, sqlite backend, docker environment):
./scripts/ci/local_ci_run_airflow_testing.sh
Selecting python version, backend, docker environment:
PYTHON_VERSION=3.5 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_run_airflow_testing.sh
Running kubernetes tests:
KUBERNETES_VERSION=v1.13.0 KUBERNETES_MODE=persistent_mode BACKEND=postgres ENV=kubernetes \
./scripts/ci/local_ci_run_airflow_testing.sh
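The `PYTHON_VERSION=... BACKEND=...` prefix works because variables assigned on the command line are exported only to that single command. A self-contained illustration (the demo script and its default values are hypothetical):

```shell
# Demonstration of per-invocation environment overrides - the pattern the
# CI scripts use (the demo script and defaults are illustrative)
unset PYTHON_VERSION BACKEND
DEMO="$(mktemp)"
cat > "$DEMO" <<'EOF'
#!/bin/sh
echo "python=${PYTHON_VERSION:-3.6} backend=${BACKEND:-sqlite}"
EOF
chmod +x "$DEMO"
"$DEMO"                                      # defaults apply
PYTHON_VERSION=3.5 BACKEND=postgres "$DEMO"  # overridden for this call only
```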
The following environments are possible:
With the Kubernetes environment you can only enter via `local_ci_enter_environment.sh` and run tests manually; you cannot execute `local_ci_run_airflow_testing.sh` with it. Note: the Kubernetes environment requires setting up minikube/kubernetes, so it might require some host-network configuration.
The Docker Compose environment starts a number of docker containers and keeps them running. You can tear them down by running `./scripts/ci/local_ci_stop_environment.sh`.
On Linux, there is a known Docker problem with propagating ownership of created files: files and directories created in the container are owned by the root user rather than the host user. This might, for example, prevent you from switching branches if root-owned files are created within your sources. If you are on a Linux host and have some files in your sources created by the root user, you can fix their ownership by running the scripts/ci/local_ci_fix_ownership.sh script.
You can manually trigger building of the local images using scripts/ci/local_ci_build.sh.
You can also force-pull the images before building them locally, to be sure you download the latest images from the DockerHub repository. This can be done with the scripts/ci/local_ci_pull_and_build.sh script.
Note that you might need to clean up your Docker environment occasionally. The images are quite big (about 1.5GB for both images needed for static code analysis and CI tests), and if you often rebuild/update images you might end up with some unused image data.
Cleanup can be performed with the `docker system prune` command.
If you run into disk space errors, we recommend pruning your docker images with the `docker system prune --all` command. You might need to Stop the environment or restart the docker engine before running this command.
You can check whether your docker is clean by running `docker images --all` and `docker ps --all` - both should return an empty list of images and containers respectively.
If you are on Mac OS and you end up with not enough disk space for Docker you should increase disk space available for Docker. See Docker for Mac - Space for details.
If you have problems with the Docker Compose environment, try the following (after each step, check whether your problem is fixed):
If the problems are not solved, set the VERBOSE variable to "true" (`export VERBOSE="true"`), rerun the failing command, copy & paste the output from your terminal, and post it with a description of the problem in the Airflow Slack #troubleshooting channel.
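A VERBOSE-style switch typically just turns on shell tracing (`set -x`) so each command is echoed before it runs. A minimal sketch of the pattern (the demo script is illustrative, not the real CI script):

```shell
# Illustration of how a VERBOSE-style switch typically enables tracing
# (the demo script is hypothetical, not the actual CI scripts)
DEMO="$(mktemp)"
cat > "$DEMO" <<'EOF'
#!/bin/sh
if [ "${VERBOSE:-false}" = "true" ]; then
  set -x                # echo each command before it runs
fi
echo "doing work"
EOF
chmod +x "$DEMO"
"$DEMO"                       # quiet run
VERBOSE=true "$DEMO" 2>&1     # traced run: commands are echoed with a + prefix
```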
Another great way of automating linting and testing is to use Git Hooks. For example, you could create a `pre-commit` file based on the Travis CI Pipeline, so that before each commit a local pipeline is triggered; if this pipeline fails (returns an exit code other than 0), the commit does not go through. In theory this has the advantage that you cannot commit failing code, which in turn reduces errors in the Travis CI Pipelines.
Since there are a lot of tests, running them all would take very long, so you should probably only test your new feature locally.
The following example of a `pre-commit` file runs linting locally and the tests of a given feature in the Docker containers:
```bash
#!/bin/sh

GREEN='\033[0;32m'
NO_COLOR='\033[0m'

setup_python_env() {
    local venv_path=${1}

    echo -e "${GREEN}Activating python virtual environment ${venv_path}..${NO_COLOR}"
    source ${venv_path}
}

run_linting() {
    local project_dir=$(git rev-parse --show-toplevel)

    echo -e "${GREEN}Running flake8 over directory ${project_dir}..${NO_COLOR}"
    flake8 ${project_dir}
}

run_testing_in_docker() {
    local feature_path=${1}
    local airflow_py2_container=${2}
    local airflow_py3_container=${3}

    echo -e "${GREEN}Running tests in ${feature_path} in airflow python 2 docker container..${NO_COLOR}"
    docker exec -i -w /airflow/ ${airflow_py2_container} nosetests -v ${feature_path}
    echo -e "${GREEN}Running tests in ${feature_path} in airflow python 3 docker container..${NO_COLOR}"
    docker exec -i -w /airflow/ ${airflow_py3_container} nosetests -v ${feature_path}
}

set -e
# NOTE: Before running this make sure you have set the function arguments correctly.
setup_python_env /Users/feluelle/venv/bin/activate
run_linting
run_testing_in_docker tests/contrib/hooks/test_imap_hook.py dazzling_chatterjee quirky_stallman
```
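Git picks such a hook up as soon as an executable file named `pre-commit` exists in `.git/hooks/`. A sketch of installing one (the throwaway repository here is for illustration only; in real use you would write into the `.git/hooks/` directory of your Airflow clone):

```shell
# Sketch: installing a pre-commit hook into a repository
# (uses a throwaway repo here; in real use, target .git/hooks/pre-commit
# of your Airflow clone)
REPO="$(mktemp -d)"
git -C "$REPO" init -q
cat > "$REPO/.git/hooks/pre-commit" <<'EOF'
#!/bin/sh
echo "running local checks before commit"
exit 0   # a non-zero exit here would abort the commit
EOF
chmod +x "$REPO/.git/hooks/pre-commit"
```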
For more information on how to run a subset of the tests, take a look at the nosetests docs.
See also the list of test classes and methods in `tests/core.py`.
Feel free to customize based on the extras available in setup.py
Before you submit a pull request from your forked repo, check that it meets these guidelines:
We currently rely heavily on Travis CI for running the full Airflow test suite as running all of the tests locally requires significant setup. You can setup Travis CI in your fork of Airflow by following the Travis CI Getting Started guide.
There are two different options available for running Travis CI which are setup as separate components on GitHub:
Once installed, you can configure the Travis CI GitHub App at https://github.com/settings/installations -> Configure Travis CI.
For the Travis CI GitHub App, you can set repository access to either "All repositories" for convenience, or "Only select repositories" and choose `<username>/airflow` in the dropdown.
You can access Travis CI for your fork at https://travis-ci.com/<username>/airflow.
The Travis CI GitHub Services version uses an Authorized OAuth App. Note that `apache/airflow` is currently still using the legacy version.
Once installed, you can configure the Travis CI Authorized OAuth App at https://github.com/settings/connections/applications/88c5b97de2dbfc50f3ac.
If you are a GitHub admin, click the “Grant” button next to your organization; otherwise, click the “Request” button.
For the Travis CI Authorized OAuth App, you may have to grant access to the forked `<organization>/airflow` repo even though it is public.
You can access Travis CI for your fork at https://travis-ci.org/<organization>/airflow.
The travis-ci.org site for open source projects is now legacy and new projects should instead be created on travis-ci.com for both private repos and open source.
Note that there is a second Authorized OAuth App available called “Travis CI for Open Source” used for the legacy travis-ci.org service. It should not be used for new projects.
More information:
When developing features, the need may arise to persist information to the metadata database. Airflow has Alembic built in to handle all schema changes. Alembic must be installed on your development machine before continuing.
```bash
# starting at the root of the project
$ pwd
~/airflow
# change to the airflow directory
$ cd airflow
$ alembic revision -m "add new field to db"
Generating ~/airflow/airflow/migrations/versions/12341123_add_new_field_to_db.py
```
`airflow/www/` contains all npm-managed, front end assets. Flask-Appbuilder itself comes bundled with jQuery and bootstrap. While these may be phased out over time, these packages are currently not managed with npm.
Make sure you are using recent versions of node and npm. No problems have been found with node>=8.11.3 and npm>=6.1.3
First, npm must be available in your environment. If you are on Mac and it is not installed, you can run the following commands (taken from this source):
```bash
brew install node --without-npm
echo prefix=~/.npm-packages >> ~/.npmrc
curl -L https://www.npmjs.com/install.sh | sh
```
The final step is to add `~/.npm-packages/bin` to your `PATH` so commands you install globally are usable. Add something like this to your `.bashrc` file, then `source ~/.bashrc` to reflect the change.
export PATH="$HOME/.npm-packages/bin:$PATH"
You can also follow the general npm installation instructions.
To install third party libraries defined in `package.json`, run the following within the `airflow/www/` directory, which will install them in a new `node_modules/` folder within `www/`.
```bash
# from the root of the repository, move to where our JS package.json lives
cd airflow/www/
# run npm install to fetch all the dependencies
npm install
```
To parse and generate bundled files for airflow, run either of the following commands. The `dev` flag will keep the npm script running and re-run it upon any changes within the assets directory.
```bash
# Compiles the production / optimized js & css
npm run prod

# Start a web server that manages and updates your assets as you modify them
npm run dev
```
Should you add or upgrade an npm package, which involves changing `package.json`, you'll need to re-run `npm install` and push the newly generated `package-lock.json` file so we get a reproducible build.
We try to enforce a more consistent style and try to follow the JS community guidelines. Once you add or modify any javascript code in the project, please make sure it follows the guidelines defined in Airbnb JavaScript Style Guide. Apache Airflow uses ESLint as a tool for identifying and reporting on patterns in JavaScript, which can be used by running any of the following commands.
```bash
# Check JS code in .js and .html files, and report any errors/warnings
npm run lint

# Check JS code in .js and .html files, report any errors/warnings and fix them if possible
npm run lint:fix
```