.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

Metrics
=======

Airflow can be set up to send metrics to `StatsD <https://github.com/etsy/statsd>`__.

Setup
-----

First you must install the StatsD requirement:

.. code-block:: bash

    pip install 'apache-airflow[statsd]'

Add the following lines to your configuration file, e.g. ``airflow.cfg``:

.. code-block:: ini

    [metrics]
    statsd_on = True
    statsd_host = localhost
    statsd_port = 8125
    statsd_prefix = airflow

If you want to avoid sending all the available metrics to StatsD, you can configure an allow list of prefixes so that
only metrics whose names start with one of the listed elements are sent. For example, the configuration below emits
``scheduler.tasks.running`` but drops ``ti_successes``:

.. code-block:: ini

    [metrics]
    statsd_allow_list = scheduler,executor,dagrun

If you want to redirect metrics to a different name, you can configure the ``stat_name_handler`` option
in the ``[scheduler]`` section. It should point to a function that validates the StatsD stat name, applies changes
to the stat name if necessary, and returns the transformed stat name. The function may look like the following:

.. code-block:: python

    def my_custom_stat_name_handler(stat_name: str) -> str:
        return stat_name.lower()[:32]
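
To wire the handler up, point the ``stat_name_handler`` option at its dotted import path. A minimal sketch, assuming
the handler above lives in a hypothetical module ``my_plugin`` that is importable on your :envvar:`PYTHONPATH`:

.. code-block:: ini

    [scheduler]
    stat_name_handler = my_plugin.my_custom_stat_name_handler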

If you want to use a custom StatsD client instead of the default one provided by Airflow, the following key must be added
to the configuration file alongside the module path of your custom StatsD client. This module must be available on
your :envvar:`PYTHONPATH`.

.. code-block:: ini

    [metrics]
    statsd_custom_client_path = x.y.customclient
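
A minimal sketch of such a client, assuming it is constructed with the same ``host``, ``port`` and ``prefix`` arguments
as the default ``statsd.StatsClient``; the class name and the filtering logic below are illustrative only:

.. code-block:: python

    from statsd import StatsClient


    class CustomStatsClient(StatsClient):
        """Hypothetical StatsD client that drops DAG-parsing counters before they are sent."""

        def incr(self, stat, count=1, rate=1):
            # Suppress one noisy family of metrics; everything else goes through
            # the regular StatsD UDP path unchanged.
            if stat.startswith("dag_processing"):
                return
            super().incr(stat, count=count, rate=rate)

In the configuration above, ``x.y.customclient`` would then be the import path of this class.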

See :doc:`../modules_management` for details on how Python and Airflow manage modules.

.. note::

    For a detailed listing of configuration options regarding metrics,
    see the configuration reference documentation - :ref:`config:metrics`.
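
Once StatsD is enabled, each metric arrives as a plain StatsD UDP datagram, with ``statsd_prefix`` prepended to the
metric name: counters use the ``|c`` type, gauges ``|g``, and timers ``|ms``. A sketch of what this looks like on the
wire with the default ``airflow`` prefix (the values and the DAG and task names are illustrative):

.. code-block:: text

    airflow.ti_successes:1|c
    airflow.executor.open_slots:32|g
    airflow.dag.my_dag.my_task.duration:1500|ms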

Counters
--------

======================================= ================================================================
Name                                    Description
======================================= ================================================================
``<job_name>_start``                    Number of started ``<job_name>`` jobs, e.g. ``SchedulerJob``, ``LocalTaskJob``
``<job_name>_end``                      Number of ended ``<job_name>`` jobs, e.g. ``SchedulerJob``, ``LocalTaskJob``
``operator_failures_<operator_name>``   Operator ``<operator_name>`` failures
``operator_successes_<operator_name>``  Operator ``<operator_name>`` successes
``ti_failures``                         Overall task instance failures
``ti_successes``                        Overall task instance successes
``zombies_killed``                      Zombie tasks killed
``scheduler_heartbeat``                 Scheduler heartbeats
``dag_processing.processes``            Number of currently running DAG parsing processes
``scheduler.tasks.killed_externally``   Number of tasks killed externally
``scheduler.tasks.running``             Number of tasks running in executor
``scheduler.tasks.starving``            Number of tasks that cannot be scheduled because of no open slot in pool
``scheduler.orphaned_tasks.cleared``    Number of orphaned tasks cleared by the scheduler
``scheduler.orphaned_tasks.adopted``    Number of orphaned tasks adopted by the scheduler
``scheduler.critical_section_busy``     Count of times a scheduler process tried to get a lock on the critical
                                        section (needed to send tasks to the executor) and found it locked by
                                        another process
``sla_email_notification_failure``      Number of failed SLA miss email notification attempts
``ti.start.<dagid>.<taskid>``           Number of started tasks in a given DAG. Similar to ``<job_name>_start`` but for tasks
``ti.finish.<dagid>.<taskid>.<state>``  Number of completed tasks in a given DAG. Similar to ``<job_name>_end`` but for tasks
``dag.callback_exceptions``             Number of exceptions raised from DAG callbacks. When this happens, it means a DAG callback is not working
``celery.task_timeout_error``           Number of ``AirflowTaskTimeout`` errors raised when publishing a task to the Celery broker
======================================= ================================================================

Gauges
------

=================================================== ========================================================================
Name                                                Description
=================================================== ========================================================================
``dagbag_size``                                     DAG bag size
``dag_processing.import_errors``                    Number of errors from trying to parse DAG files
``dag_processing.total_parse_time``                 Seconds taken to scan and import all DAG files once
``dag_processing.last_runtime.<dag_file>``          Seconds spent processing ``<dag_file>`` (in most recent iteration)
``dag_processing.last_run.seconds_ago.<dag_file>``  Seconds since ``<dag_file>`` was last processed
``dag_processing.processor_timeouts``               Number of file processors that have been killed due to taking too long
``executor.open_slots``                             Number of open slots on executor
``executor.queued_tasks``                           Number of queued tasks on executor
``executor.running_tasks``                          Number of running tasks on executor
``pool.open_slots.<pool_name>``                     Number of open slots in the pool
``pool.queued_slots.<pool_name>``                   Number of queued slots in the pool
``pool.running_slots.<pool_name>``                  Number of running slots in the pool
``pool.starving_tasks.<pool_name>``                 Number of starving tasks in the pool
``smart_sensor_operator.poked_tasks``               Number of tasks poked by the smart sensor in the previous poking loop
``smart_sensor_operator.poked_success``             Number of newly succeeded tasks poked by the smart sensor in the previous poking loop
``smart_sensor_operator.poked_exception``           Number of exceptions in the previous smart sensor poking loop
``smart_sensor_operator.exception_failures``        Number of failures caused by an exception in the previous smart sensor poking loop
``smart_sensor_operator.infra_failures``            Number of infrastructure failures in the previous smart sensor poking loop
=================================================== ========================================================================

Timers
------

=================================================== ========================================================================
Name                                                Description
=================================================== ========================================================================
``dagrun.dependency-check.<dag_id>``                Milliseconds taken to check DAG dependencies
``dag.<dag_id>.<task_id>.duration``                 Milliseconds taken to finish a task
``dag_processing.last_duration.<dag_file>``         Milliseconds taken to load the given DAG file
``dagrun.duration.success.<dag_id>``                Milliseconds taken for a DagRun to reach success state
``dagrun.duration.failed.<dag_id>``                 Milliseconds taken for a DagRun to reach failed state
``dagrun.schedule_delay.<dag_id>``                  Milliseconds of delay between the scheduled DagRun
                                                    start date and the actual DagRun start date
``scheduler.critical_section_duration``             Milliseconds spent in the critical section of scheduler loop --
                                                    only a single scheduler can enter this loop at a time
``dagrun.<dag_id>.first_task_scheduling_delay``     Milliseconds elapsed between the first task's ``start_date`` and the expected DagRun start
=================================================== ========================================================================
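
All of the names above are emitted through Airflow's internal ``airflow.stats.Stats`` facade, which is a no-op unless
``statsd_on`` is enabled. A minimal sketch of emitting through the same facade; note that ``Stats`` is an internal
helper rather than a documented public API, and the metric names below are illustrative:

.. code-block:: python

    import time

    from airflow.stats import Stats


    def do_sync():
        # Hypothetical unit of work to be timed.
        time.sleep(0.1)


    # Counter: arrives as <statsd_prefix>.my_plugin.events, e.g. airflow.my_plugin.events:1|c
    Stats.incr("my_plugin.events")

    # Timer: arrives as <statsd_prefix>.my_plugin.sync_duration with a |ms value
    with Stats.timer("my_plugin.sync_duration"):
        do_sync()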