| :py:mod:`airflow.providers.docker.operators.docker` |
| =================================================== |
| |
| .. py:module:: airflow.providers.docker.operators.docker |
| |
| .. autoapi-nested-parse:: |
| |
| Implements Docker operator |
| |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.docker.operators.docker.DockerOperator |
| |
| |
| |
| Functions |
| ~~~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.docker.operators.docker.stringify |
| |
| |
| |
| .. py:function:: stringify(line) |
| |
| Make sure string is returned even if bytes are passed. Docker stream can return bytes. |
| |
| |
| .. py:class:: DockerOperator(*, image, api_version = None, command = None, container_name = None, cpus = 1.0, docker_url = 'unix://var/run/docker.sock', environment = None, private_environment = None, force_pull = False, mem_limit = None, host_tmp_dir = None, network_mode = None, tls_ca_cert = None, tls_client_cert = None, tls_client_key = None, tls_hostname = None, tls_ssl_version = None, mount_tmp_dir = True, tmp_dir = '/tmp/airflow', user = None, mounts = None, entrypoint = None, working_dir = None, xcom_all = False, docker_conn_id = None, dns = None, dns_search = None, auto_remove = False, shm_size = None, tty = False, privileged = False, cap_add = None, extra_hosts = None, retrieve_output = False, retrieve_output_path = None, timeout = DEFAULT_TIMEOUT_SECONDS, device_requests = None, **kwargs) |
| |
| Bases: :py:obj:`airflow.models.BaseOperator` |
| |
| Execute a command inside a docker container. |
| |
| By default, a temporary directory is |
| created on the host and mounted into a container to allow storing files |
| that together exceed the default disk size of 10GB in a container. |
| In this case The path to the mounted directory can be accessed |
| via the environment variable ``AIRFLOW_TMP_DIR``. |
| |
| If the volume cannot be mounted, warning is printed and an attempt is made to execute the docker |
| command without the temporary folder mounted. This is to make it works by default with remote docker |
| engine or when you run docker-in-docker solution and temporary directory is not shared with the |
| docker engine. Warning is printed in logs in this case. |
| |
| If you know you run DockerOperator with remote engine or via docker-in-docker |
| you should set ``mount_tmp_dir`` parameter to False. In this case, you can still use |
| ``mounts`` parameter to mount already existing named volumes in your Docker Engine |
| to achieve similar capability where you can store files exceeding default disk size |
| of the container, |
| |
| If a login to a private registry is required prior to pulling the image, a |
| Docker connection needs to be configured in Airflow and the connection ID |
| be provided with the parameter ``docker_conn_id``. |
| |
| :param image: Docker image from which to create the container. |
| If image tag is omitted, "latest" will be used. (templated) |
| :param api_version: Remote API version. Set to ``auto`` to automatically |
| detect the server's version. |
| :param command: Command to be run in the container. (templated) |
| :param container_name: Name of the container. Optional (templated) |
| :param cpus: Number of CPUs to assign to the container. |
| This value gets multiplied with 1024. See |
| https://docs.docker.com/engine/reference/run/#cpu-share-constraint |
| :param docker_url: URL of the host running the docker daemon. |
| Default is unix://var/run/docker.sock |
| :param environment: Environment variables to set in the container. (templated) |
| :param private_environment: Private environment variables to set in the container. |
| These are not templated, and hidden from the website. |
| :param force_pull: Pull the docker image on every run. Default is False. |
| :param mem_limit: Maximum amount of memory the container can use. |
| Either a float value, which represents the limit in bytes, |
| or a string like ``128m`` or ``1g``. |
| :param host_tmp_dir: Specify the location of the temporary directory on the host which will |
| be mapped to tmp_dir. If not provided defaults to using the standard system temp directory. |
| :param network_mode: Network mode for the container. |
| It can be one of the following: |
| bridge - Create new network stack for the container with default docker bridge network |
| None - No networking for this container |
| container:<name|id> - Use the network stack of another container specified via <name|id> |
| host - Use the host network stack. Incompatible with `port_bindings` |
| '<network-name>|<network-id>' - Connects the container to user created network |
| (using `docker network create` command) |
| :param tls_ca_cert: Path to a PEM-encoded certificate authority |
| to secure the docker connection. |
| :param tls_client_cert: Path to the PEM-encoded certificate |
| used to authenticate docker client. |
| :param tls_client_key: Path to the PEM-encoded key used to authenticate docker client. |
| :param tls_hostname: Hostname to match against |
| the docker server certificate or False to disable the check. |
| :param tls_ssl_version: Version of SSL to use when communicating with docker daemon. |
| :param mount_tmp_dir: Specify whether the temporary directory should be bind-mounted |
| from the host to the container. Defaults to True |
| :param tmp_dir: Mount point inside the container to |
| a temporary directory created on the host by the operator. |
| The path is also made available via the environment variable |
| ``AIRFLOW_TMP_DIR`` inside the container. |
| :param user: Default user inside the docker container. |
| :param mounts: List of volumes to mount into the container. Each item should |
| be a :py:class:`docker.types.Mount` instance. |
| :param entrypoint: Overwrite the default ENTRYPOINT of the image |
| :param working_dir: Working directory to |
| set on the container (equivalent to the -w switch the docker client) |
| :param xcom_all: Push all the stdout or just the last line. |
| The default is False (last line). |
| :param docker_conn_id: The :ref:`Docker connection id <howto/connection:docker>` |
| :param dns: Docker custom DNS servers |
| :param dns_search: Docker custom DNS search domain |
| :param auto_remove: Auto-removal of the container on daemon side when the |
| container's process exits. |
| The default is False. |
| :param shm_size: Size of ``/dev/shm`` in bytes. The size must be |
| greater than 0. If omitted uses system default. |
| :param tty: Allocate pseudo-TTY to the container |
| This needs to be set see logs of the Docker container. |
| :param privileged: Give extended privileges to this container. |
| :param cap_add: Include container capabilities |
| :param retrieve_output: Should this docker image consistently attempt to pull from and output |
| file before manually shutting down the image. Useful for cases where users want a pickle serialized |
| output that is not posted to logs |
| :param retrieve_output_path: path for output file that will be retrieved and passed to xcom |
| :param device_requests: Expose host resources such as GPUs to the container. |
| |
| .. py:attribute:: template_fields |
| :annotation: :Sequence[str] = ['image', 'command', 'environment', 'container_name'] |
| |
| |
| |
| .. py:attribute:: template_ext |
| :annotation: :Sequence[str] = ['.sh', '.bash'] |
| |
| |
| |
| .. py:method:: get_hook(self) |
| |
| Retrieves hook for the operator. |
| |
| :return: The Docker Hook |
| |
| |
| .. py:method:: execute(self, context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| .. py:method:: format_command(command) |
| :staticmethod: |
| |
| Retrieve command(s). if command string starts with [, it returns the command list) |
| |
| :param command: Docker command or entrypoint |
| |
| :return: the command (or commands) |
| :rtype: str | List[str] |
| |
| |
| .. py:method:: on_kill(self) |
| |
| Override this method to cleanup subprocesses when a task instance |
| gets killed. Any use of the threading, subprocess or multiprocessing |
| module within an operator needs to be cleaned up or it will leave |
| ghost processes behind. |
| |
| |
| |