blob: 5380aac5fa5942ccb449612043e3770f7b40064e [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
============
AWS DataSync
============
`AWS DataSync <https://aws.amazon.com/datasync/>`__ is a data-transfer service that simplifies, automates,
and accelerates moving and replicating data between on-premises storage systems and AWS storage services over
the internet or AWS Direct Connect.
Prerequisite Tasks
------------------
.. include:: _partials/prerequisite_tasks.rst
Operators
---------
.. _howto/operator:DataSyncOperator:
Interact with AWS DataSync Tasks
================================
You can use :class:`~airflow.providers.amazon.aws.operators.datasync.DataSyncOperator` to
find, create, update, execute and delete AWS DataSync tasks.
Once the :class:`~airflow.providers.amazon.aws.operators.datasync.DataSyncOperator` has identified
the correct TaskArn to run (either because you specified it, or because it was found), it will then be
executed. Whenever an AWS DataSync Task is executed it creates an AWS DataSync TaskExecution, identified
by a TaskExecutionArn.
The TaskExecutionArn will be monitored until completion (success / failure), and its status will be
periodically written to the Airflow task log.
The :class:`~airflow.providers.amazon.aws.operators.datasync.DataSyncOperator` supports
optional passing of additional kwargs to the underlying ``boto3.start_task_execution()`` API.
This is done with the ``task_execution_kwargs`` parameter.
This is useful for example to limit bandwidth or filter included files, see the `boto3 Datasync
documentation <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/datasync.html>`__
for more details.
Execute a task
""""""""""""""
To execute a specific task, you can pass the ``task_arn`` to the operator.
.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_datasync.py
:language: python
:dedent: 4
:start-after: [START howto_operator_datasync_specific_task]
:end-before: [END howto_operator_datasync_specific_task]
Search and execute a task
"""""""""""""""""""""""""
To search for a task, you can specify the ``source_location_uri`` and ``destination_location_uri`` to the operator.
If one task is found, this one will be executed.
If more than one task is found, the operator will raise an Exception. To avoid this, you can set
``allow_random_task_choice`` to ``True`` to randomly choose from candidate tasks.
.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_datasync.py
:language: python
:dedent: 4
:start-after: [START howto_operator_datasync_search_task]
:end-before: [END howto_operator_datasync_search_task]
Create and execute a task
"""""""""""""""""""""""""
When searching for a task, if no task is found you have the option to create one before executing it.
In order to do that, you need to provide the extra parameters ``create_task_kwargs``, ``create_source_location_kwargs``
and ``create_destination_location_kwargs``.
These extra parameters provide a way for the operator to automatically create a Task and/or Locations if no suitable
existing Task was found. If these are left to their default value (None) then no create will be attempted.
Also, because ``delete_task_after_execution`` is set to ``True``, the task will be deleted
from AWS DataSync after it completes successfully.
.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_datasync.py
:language: python
:dedent: 4
:start-after: [START howto_operator_datasync_create_task]
:end-before: [END howto_operator_datasync_create_task]
When creating a Task, the
:class:`~airflow.providers.amazon.aws.operators.datasync.DataSyncOperator` will try to find
and use existing LocationArns rather than creating new ones. If multiple LocationArns match the
specified URIs then we need to choose one to use. In this scenario, the operator behaves similarly
to how it chooses a single Task from many Tasks:
The operator will raise an Exception. To avoid this, you can set ``allow_random_location_choice`` to ``True``
to randomly choose from candidate Locations.
Reference
---------
* `AWS boto3 library documentation for DataSync <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/datasync.html>`__