.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

Timetables
==========

For DAGs with time-based schedules (as opposed to event-driven ones), each DAG's scheduling
decisions are driven by its internal "timetable". The timetable also
determines the data interval and the logical date of each run created for the DAG.

DAGs scheduled with a cron expression or ``timedelta`` object are also converted
internally to use a timetable.

If a cron expression or ``timedelta`` is sufficient for your use case, you don't need
to worry about timetables. But for more complicated scheduling requirements,
you may create your own timetable class and pass that to the DAG's ``schedule`` argument instead.

Some examples:

* Data intervals with "holes" between them, instead of the continuous intervals that both
  cron expressions and ``timedelta`` schedules produce.
* Run tasks at different times each day. For example, an astronomer may find it
  useful to run a task at dawn to process data collected from the previous
  night-time period.
* Schedules not following the Gregorian calendar. For example, create a run for
  each month in the `Traditional Chinese Calendar`_. This is conceptually
  similar to the dawn case above, but for a different time scale.
* Rolling windows, or overlapping data intervals. For example, one may want to
  have a run each day, but make each run cover the period of the previous seven
  days. It is possible to "hack" this with a cron expression, but a custom data
  interval would be a more natural representation.

.. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar

As such, Airflow allows custom timetables to be written in plugins and used by
DAGs. An example demonstrating a custom timetable can be found in the
:doc:`/howto/timetable` how-to guide.

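The rolling-window case from the list above can be made concrete without any Airflow code. The sketch below is
plain Python that lists the data intervals of three consecutive daily runs when each run covers the previous seven
days. ``rolling_intervals`` is a hypothetical helper, not an Airflow API, and the dates are arbitrary.

.. code-block:: python

    from datetime import datetime, timedelta, timezone


    def rolling_intervals(first_end, runs, window=timedelta(days=7), step=timedelta(days=1)):
        """Return (start, end) pairs for `runs` consecutive runs.

        Hypothetical helper for illustration only; it is not part of Airflow.
        Each run ends `step` after the previous one but looks back over `window`.
        """
        return [(first_end + i * step - window, first_end + i * step) for i in range(runs)]


    for start, end in rolling_intervals(datetime(2022, 2, 1, tzinfo=timezone.utc), runs=3):
        print(start.date(), "->", end.date())
    # prints:
    # 2022-01-25 -> 2022-02-01
    # 2022-01-26 -> 2022-02-02
    # 2022-01-27 -> 2022-02-03

Consecutive intervals share six of their seven days, an overlap that back-to-back cron or ``timedelta`` intervals
cannot express directly.
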
Built-in Timetables
-------------------

Airflow comes with several built-in timetables that cover the most common use cases. Additional timetables
may be available in plugins.

.. _CronTriggerTimetable:

CronTriggerTimetable
^^^^^^^^^^^^^^^^^^^^

A timetable that accepts a cron expression and triggers DAG runs according to it.

.. seealso:: `Differences between the two cron timetables`_

.. code-block:: python

    from airflow.decorators import dag
    from airflow.timetables.trigger import CronTriggerTimetable


    @dag(
        schedule=CronTriggerTimetable("0 1 * * 3", timezone="UTC"),  # At 01:00 on Wednesday
    )
    def example_dag():
        pass

It is also possible to provide a static data interval to the timetable. The optional ``interval`` argument
must be a :class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta``. If given, a triggered DAG
run's data interval spans the specified duration and *ends* at the trigger time.

.. code-block:: python

    from datetime import timedelta

    from airflow.decorators import dag
    from airflow.timetables.trigger import CronTriggerTimetable


    @dag(
        # Runs every Friday at 18:00 to cover the work week (9:00 Monday to 18:00 Friday).
        schedule=CronTriggerTimetable(
            "0 18 * * 5",
            timezone="UTC",
            interval=timedelta(days=4, hours=9),
        ),
    )
    def example_dag():
        pass

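The interval arithmetic in the example above can be checked with plain ``datetime`` objects. This sketch does not
call Airflow; it only assumes, as described above, that the data interval ends at the trigger time and starts
``interval`` earlier. The date 2022-04-08 is an arbitrary Friday chosen for illustration.

.. code-block:: python

    from datetime import datetime, timedelta, timezone

    interval = timedelta(days=4, hours=9)
    # An arbitrary Friday at 18:00 UTC, standing in for one trigger of "0 18 * * 5".
    trigger = datetime(2022, 4, 8, 18, 0, tzinfo=timezone.utc)

    # The data interval *ends* at the trigger time and reaches `interval` back in time.
    data_interval_start = trigger - interval
    data_interval_end = trigger

    print(data_interval_start.strftime("%A %H:%M"), "->", data_interval_end.strftime("%A %H:%M"))
    # prints: Monday 09:00 -> Friday 18:00
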
.. _DeltaDataIntervalTimetable:

DeltaDataIntervalTimetable
^^^^^^^^^^^^^^^^^^^^^^^^^^

Schedules data intervals with a time delta. It can be selected by providing a
:class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta`` to the ``schedule`` parameter of a DAG.

.. code-block:: python

    import datetime
    from airflow.decorators import dag

    @dag(schedule=datetime.timedelta(minutes=30))
    def example_dag():
        pass

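As a rough illustration of how a ``timedelta`` schedule tiles time into back-to-back data intervals, the sketch
below uses only the standard library and an arbitrary start date. ``delta_intervals`` is a hypothetical helper that
mirrors the behaviour described above; it is not Airflow's ``DeltaDataIntervalTimetable`` code.

.. code-block:: python

    from datetime import datetime, timedelta, timezone


    def delta_intervals(start_date, delta, count):
        """Yield `count` consecutive (start, end) data intervals of length `delta`.

        Hypothetical helper: each interval begins exactly where the previous one ended.
        """
        start = start_date
        for _ in range(count):
            end = start + delta
            yield start, end
            start = end


    start_date = datetime(2022, 1, 1, tzinfo=timezone.utc)  # arbitrary example start date
    for start, end in delta_intervals(start_date, timedelta(minutes=30), 3):
        print(start.time(), "->", end.time())
    # prints:
    # 00:00:00 -> 00:30:00
    # 00:30:00 -> 01:00:00
    # 01:00:00 -> 01:30:00
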
.. _CronDataIntervalTimetable:

CronDataIntervalTimetable
^^^^^^^^^^^^^^^^^^^^^^^^^

A timetable that accepts a cron expression, creates data intervals according to the interval between consecutive
cron trigger points, and triggers a DAG run at the end of each data interval.

.. seealso:: `Differences between the two cron timetables`_

This can be selected by providing a string that is a valid cron expression to the ``schedule``
parameter of a DAG as described in the :doc:`/concepts/dags` documentation.

.. code-block:: python

    from airflow.decorators import dag

    @dag(schedule="0 1 * * 3")  # At 01:00 on Wednesday.
    def example_dag():
        pass

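To illustrate the "interval between consecutive cron trigger points" rule for ``"0 1 * * 3"``, the sketch below
finds the next Wednesday 01:00 by hand (no ``croniter``) and derives the data interval of the run triggered at that
point. ``next_wednesday_0100`` is a hypothetical helper that handles only this one expression, and the dates are
arbitrary.

.. code-block:: python

    from datetime import datetime, timedelta, timezone

    WEDNESDAY = 2  # datetime.weekday() numbering: Monday == 0


    def next_wednesday_0100(after):
        """Return the first Wednesday 01:00 strictly after `after` (same tzinfo)."""
        candidate = after.replace(hour=1, minute=0, second=0, microsecond=0)
        candidate += timedelta(days=(WEDNESDAY - candidate.weekday()) % 7)
        if candidate <= after:
            candidate += timedelta(days=7)
        return candidate


    now = datetime(2022, 4, 7, 12, 0, tzinfo=timezone.utc)  # a Thursday
    trigger = next_wednesday_0100(now)
    # The run fires at the *end* of its interval, which reaches back to the previous cron point.
    data_interval = (trigger - timedelta(days=7), trigger)
    print(data_interval[0].isoformat(), "->", data_interval[1].isoformat())
    # prints: 2022-04-06T01:00:00+00:00 -> 2022-04-13T01:00:00+00:00
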
EventsTimetable
^^^^^^^^^^^^^^^

Simply pass a list of ``datetime``\s for the DAG to run after. This is useful for timing based on sporting
events, planned communication campaigns, and other schedules that are arbitrary and irregular but predictable.

The list of events must be finite and of reasonable size, as it must be loaded every time the DAG is parsed.
Optionally, the ``restrict_to_events`` flag can be used to force manual runs of the DAG to use the time of the most
recent (or very first) event for the data interval. Otherwise, manual runs use a ``data_interval_start`` and
``data_interval_end`` equal to the time at which the manual run began. You can also name the set of events using the
``description`` parameter, which will be displayed in the Airflow UI.

.. code-block:: python

    import pendulum

    from airflow.decorators import dag
    from airflow.timetables.events import EventsTimetable


    @dag(
        schedule=EventsTimetable(
            event_dates=[
                pendulum.datetime(2022, 4, 5, 8, 27, tz="America/Chicago"),
                pendulum.datetime(2022, 4, 17, 8, 27, tz="America/Chicago"),
                pendulum.datetime(2022, 4, 22, 20, 50, tz="America/Chicago"),
            ],
            description="My Team's Baseball Games",
            restrict_to_events=False,
        ),
    )
    def example_dag():
        pass

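The ``restrict_to_events`` rule for manual runs can be illustrated with the standard-library ``bisect`` module.
This is a minimal sketch of the behaviour described above, not Airflow's implementation: ``manual_run_interval`` is
a hypothetical helper, treating an event at exactly the run's start time as the "most recent" one is an assumption,
and the event times are the UTC equivalents of the Chicago times used in the example above.

.. code-block:: python

    from bisect import bisect_right
    from datetime import datetime, timezone


    def manual_run_interval(event_dates, run_started, restrict_to_events):
        """Return (data_interval_start, data_interval_end) for a manual run.

        Hypothetical helper; `event_dates` must be sorted. Counting an event at
        exactly `run_started` as the most recent one is an assumption.
        """
        if not restrict_to_events:
            # Without the flag, the interval collapses to the moment the run began.
            return run_started, run_started
        idx = bisect_right(event_dates, run_started)
        # Most recent event at or before the run, or the very first event if none yet.
        chosen = event_dates[idx - 1] if idx > 0 else event_dates[0]
        return chosen, chosen


    # UTC equivalents of the America/Chicago (CDT, UTC-5) times above.
    events = [
        datetime(2022, 4, 5, 13, 27, tzinfo=timezone.utc),
        datetime(2022, 4, 17, 13, 27, tzinfo=timezone.utc),
        datetime(2022, 4, 23, 1, 50, tzinfo=timezone.utc),
    ]
    started = datetime(2022, 4, 20, tzinfo=timezone.utc)
    print(manual_run_interval(events, started, restrict_to_events=True))
    print(manual_run_interval(events, started, restrict_to_events=False))
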
.. _Differences between the two cron timetables:

Differences between the two cron timetables
-------------------------------------------

There are two timetables, `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_, that accept a cron expression.
They differ in two ways:

- By default, `CronTriggerTimetable`_ does not track a *data interval*, while `CronDataIntervalTimetable`_ does.
- The time when a DAG run is triggered by `CronTriggerTimetable`_ is more intuitive and closer to how people
  expect cron to behave than that of `CronDataIntervalTimetable`_ (when ``catchup`` is ``False``).

Whether taking care of *Data Interval*
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default (when no ``interval`` is given), `CronTriggerTimetable`_ *does not* use the concept of a *data interval*.
This means the values of ``data_interval_start``, ``data_interval_end`` and the legacy ``execution_date`` are all the
same: the time when the DAG run is triggered.

On the other hand, `CronDataIntervalTimetable`_ *does* use the concept of a *data interval*. This means the values of
``data_interval_start`` and ``data_interval_end`` (and the legacy ``execution_date``) differ: they are the start
and end of the interval, respectively.

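For a concrete comparison, the sketch below writes out by hand (no Airflow calls) the values each timetable would
give the daily ``0 0 * * *`` run triggered at 12AM on February 1st; the year is arbitrary. It assumes no ``interval``
is passed to `CronTriggerTimetable`_ and that the legacy ``execution_date`` of a `CronDataIntervalTimetable`_ run
equals its ``data_interval_start``.

.. code-block:: python

    from datetime import datetime, timedelta, timezone

    # A run of the daily cron "0 0 * * *" triggered at 12AM on February 1st (UTC).
    triggered_at = datetime(2022, 2, 1, tzinfo=timezone.utc)

    # CronTriggerTimetable without `interval`: every field is the trigger time.
    cron_trigger = {
        "data_interval_start": triggered_at,
        "data_interval_end": triggered_at,
        "execution_date": triggered_at,
    }

    # CronDataIntervalTimetable: the run covers the day that just ended, and the
    # legacy execution_date is the start of that interval (assumption noted above).
    cron_data_interval = {
        "data_interval_start": triggered_at - timedelta(days=1),
        "data_interval_end": triggered_at,
        "execution_date": triggered_at - timedelta(days=1),
    }

    for name, fields in (("CronTriggerTimetable", cron_trigger), ("CronDataIntervalTimetable", cron_data_interval)):
        print(name, {key: value.date().isoformat() for key, value in fields.items()})
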
The time when a DAG run is triggered
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is no difference between the two when ``catchup`` is ``True``. :ref:`dag-catchup` describes how DAG runs are
triggered when ``catchup`` is ``True``.

When ``catchup`` is ``False``, there is a difference in how a new DAG run is triggered. `CronTriggerTimetable`_
triggers a new DAG run at the next cron time *after* the current time, while `CronDataIntervalTimetable`_ immediately
triggers the run that was due at the latest cron time *before* the current time (assuming ``start_date`` is in the
past).

Here is an example showing how the first DAG run is triggered. Suppose there is a cron expression ``@daily`` or
``0 0 * * *``, which is meant to run at 12AM every day. If you enable DAGs using the two timetables at 3PM on January
31st, `CronTriggerTimetable`_ will trigger a new DAG run at 12AM on February 1st. `CronDataIntervalTimetable`_, on the
other hand, will immediately trigger the DAG run that would have been triggered at 12AM on January 31st had the DAG
been enabled beforehand.

Another example shows the difference when DAG runs are skipped. Suppose there are two active DAGs
using the two timetables with a cron expression ``@daily`` or ``0 0 * * *``. If you pause the DAGs at 3PM on January
31st and re-enable them at 3PM on February 2nd, `CronTriggerTimetable`_ skips the DAG runs that were supposed to
trigger on February 1st and 2nd, and the next DAG run will be triggered at 12AM on February 3rd.
`CronDataIntervalTimetable`_, on the other hand, skips only the DAG run that was supposed to trigger on February 1st;
a DAG run for February 2nd is triggered immediately after you re-enable the DAG.

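The two examples above can be reproduced with a small model of the ``catchup=False`` rule for the daily cron
``0 0 * * *``. ``runs_after_enabling`` is a hypothetical helper written for illustration, not Airflow's scheduler
logic; it assumes ``start_date`` is long past, ignores runs that already exist, and uses 2022 as an arbitrary year.

.. code-block:: python

    from datetime import datetime, timedelta, timezone

    DAY = timedelta(days=1)


    def runs_after_enabling(enabled_at):
        """Model both timetables for the daily cron "0 0 * * *" with catchup=False.

        Hypothetical helper: returns (when CronTriggerTimetable fires next,
        data interval of the run CronDataIntervalTimetable starts immediately).
        Assumes start_date is long past and `enabled_at` is not exactly midnight.
        """
        last_midnight = enabled_at.replace(hour=0, minute=0, second=0, microsecond=0)
        # CronTriggerTimetable waits for the next cron point *after* now.
        cron_trigger_next = last_midnight + DAY
        # CronDataIntervalTimetable runs right away for the latest interval that has
        # already ended, i.e. the one ending at the latest cron point before now.
        cron_data_interval_now = (last_midnight - DAY, last_midnight)
        return cron_trigger_next, cron_data_interval_now


    scenarios = {
        "first enabled at 3PM Jan 31": datetime(2022, 1, 31, 15, tzinfo=timezone.utc),
        "re-enabled at 3PM Feb 2": datetime(2022, 2, 2, 15, tzinfo=timezone.utc),
    }
    for label, enabled_at in scenarios.items():
        trigger_next, (start, end) = runs_after_enabling(enabled_at)
        print(f"{label}: CronTriggerTimetable next at {trigger_next:%b %d %H:%M}; "
              f"CronDataIntervalTimetable runs now for {start:%b %d} -> {end:%b %d}")

In the second scenario the model's immediate run is the one whose interval ends at 12AM on February 2nd, matching
the DAG run for February 2nd described above.
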
These examples show that the way `CronTriggerTimetable`_ triggers DAG runs is more intuitive and closer to what
people expect from cron than the way `CronDataIntervalTimetable`_ does.