:mod:`airflow.contrib.operators.gcp_transfer_operator`
======================================================
.. py:module:: airflow.contrib.operators.gcp_transfer_operator
Module Contents
---------------
.. data:: AwsHook
.. py:class:: TransferJobPreprocessor(body, aws_conn_id='aws_default', default_schedule=False)
.. method:: _inject_aws_credentials(self)
.. method:: _reformat_date(self, field_key)
.. method:: _reformat_time(self, field_key)
.. method:: _reformat_schedule(self)
.. method:: process_body(self)
.. staticmethod:: _convert_date_to_dict(field_date)
Convert a native Python ``datetime.date`` object to the format supported by the API
.. staticmethod:: _convert_time_to_dict(time)
Convert a native Python ``datetime.time`` object to the format supported by the API
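For illustration, the conversion produces plain dicts in the ``Date`` and ``TimeOfDay`` shapes used by the Storage Transfer API (a sketch of the expected output, not a verbatim trace):

.. code-block:: python

    import datetime

    # Dates and times are sent to the API as plain dicts; the helpers
    # above produce structures along these lines (values illustrative).
    datetime.date(2019, 2, 22)  # -> {'year': 2019, 'month': 2, 'day': 22}
    datetime.time(11, 30, 0)    # -> {'hours': 11, 'minutes': 30, 'seconds': 0}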
.. py:class:: TransferJobValidator(body)
.. method:: _verify_data_source(self)
.. method:: _restrict_aws_credentials(self)
.. method:: _restrict_empty_body(self)
.. method:: validate_body(self)
.. py:class:: GcpTransferServiceJobCreateOperator(body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Creates a transfer job that runs periodically.
.. warning::
This operator is NOT idempotent. If you run it many times, many transfer
jobs will be created in the Google Cloud Platform.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceJobCreateOperator`
:param body: (Required) The request body, as described in
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/create#request-body
With three additional improvements:
* dates can be given in the form :class:`datetime.date`
* times can be given in the form :class:`datetime.time`
* credentials to Amazon Web Services (AWS) should be stored in the connection
and indicated by the ``aws_conn_id`` parameter
:type body: dict
:param aws_conn_id: The connection ID used to retrieve credentials to
Amazon Web Services.
:type aws_conn_id: str
:param gcp_conn_id: The connection ID used to connect to Google Cloud
Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['body', 'gcp_conn_id', 'aws_conn_id']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
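**Example** (a minimal sketch; the ``body`` follows the ``transferJobs`` resource
format linked above, and all project, bucket, and schedule values are placeholders):

.. code-block:: python

    import datetime

    create_transfer_job = GcpTransferServiceJobCreateOperator(
        task_id='create_transfer_job',
        body={
            'description': 'Daily S3 to GCS sync',
            'status': 'ENABLED',
            'projectId': 'my-gcp-project',
            'schedule': {
                # native datetime objects are accepted here and converted
                # to the API's Date/TimeOfDay dicts by the operator
                'scheduleStartDate': datetime.date(2019, 2, 22),
                'startTimeOfDay': datetime.time(11, 30),
            },
            'transferSpec': {
                'awsS3DataSource': {'bucketName': 'my-s3-bucket'},
                'gcsDataSink': {'bucketName': 'my-gcs-bucket'},
            },
        },
        aws_conn_id='aws_default',
        dag=my_dag)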
.. py:class:: GcpTransferServiceJobUpdateOperator(job_name, body, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Updates a transfer job that runs periodically.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceJobUpdateOperator`
:param job_name: (Required) Name of the job to be updated
:type job_name: str
:param body: (Required) The request body, as described in
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/patch#request-body
With three additional improvements:
* dates can be given in the form :class:`datetime.date`
* times can be given in the form :class:`datetime.time`
* credentials to Amazon Web Services (AWS) should be stored in the connection
and indicated by the ``aws_conn_id`` parameter
:type body: dict
:param aws_conn_id: The connection ID used to retrieve credentials to
Amazon Web Services.
:type aws_conn_id: str
:param gcp_conn_id: The connection ID used to connect to Google Cloud
Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['job_name', 'body', 'gcp_conn_id', 'aws_conn_id']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
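**Example** (a sketch; the ``body`` follows the ``transferJobs/patch`` request
format linked above, and the job name and project ID are placeholders):

.. code-block:: python

    update_transfer_job = GcpTransferServiceJobUpdateOperator(
        task_id='update_transfer_job',
        job_name='transferJobs/123456789',
        body={
            'projectId': 'my-gcp-project',
            'transferJob': {'description': 'Updated description'},
            'updateTransferJobFieldMask': 'description',
        },
        dag=my_dag)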
.. py:class:: GcpTransferServiceJobDeleteOperator(job_name, gcp_conn_id='google_cloud_default', api_version='v1', project_id=None, *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Deletes a transfer job. This is a soft delete: after a transfer job is
deleted, the job and all its transfer executions are subject to garbage
collection. Transfer jobs become eligible for garbage collection
30 days after soft deletion.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceJobDeleteOperator`
:param job_name: (Required) Name of the transfer job to delete
:type job_name: str
:param project_id: (Optional) The ID of the project that owns the transfer
job. If set to ``None`` or missing, the default project_id from the GCP
connection is used.
:type project_id: str
:param gcp_conn_id: The connection ID used to connect to Google Cloud
Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['job_name', 'project_id', 'gcp_conn_id', 'api_version']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
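**Example** (a sketch; the job name and project ID are placeholders):

.. code-block:: python

    delete_transfer_job = GcpTransferServiceJobDeleteOperator(
        task_id='delete_transfer_job',
        job_name='transferJobs/123456789',
        project_id='my-gcp-project',
        dag=my_dag)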
.. py:class:: GcpTransferServiceOperationGetOperator(operation_name, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Gets the latest state of a long-running operation in Google Storage Transfer
Service.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceOperationGetOperator`
:param operation_name: (Required) Name of the transfer operation.
:type operation_name: str
:param gcp_conn_id: The connection ID used to connect to Google
Cloud Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['operation_name', 'gcp_conn_id']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
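**Example** (a sketch; the operation name is a placeholder in the
``transferOperations/...`` naming scheme used by the service):

.. code-block:: python

    get_transfer_operation = GcpTransferServiceOperationGetOperator(
        task_id='get_transfer_operation',
        operation_name='transferOperations/123456789',
        dag=my_dag)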
.. py:class:: GcpTransferServiceOperationsListOperator(filter, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Lists long-running operations in Google Storage Transfer
Service that match the specified filter.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceOperationsListOperator`
:param filter: (Required) A request filter, as described in
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs/list#body.QUERY_PARAMETERS.filter
:type filter: dict
:param gcp_conn_id: The connection ID used to connect to Google
Cloud Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['filter', 'gcp_conn_id']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
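**Example** (a sketch, assuming the filter dict accepts ``project_id`` and
``job_names`` keys as handled by the underlying transfer hook; all values are
placeholders):

.. code-block:: python

    list_transfer_operations = GcpTransferServiceOperationsListOperator(
        task_id='list_transfer_operations',
        filter={
            'project_id': 'my-gcp-project',
            'job_names': ['transferJobs/123456789'],
        },
        dag=my_dag)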
.. py:class:: GcpTransferServiceOperationPauseOperator(operation_name, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Pauses a transfer operation in Google Storage Transfer Service.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceOperationPauseOperator`
:param operation_name: (Required) Name of the transfer operation.
:type operation_name: str
:param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['operation_name', 'gcp_conn_id', 'api_version']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
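**Example** (a sketch; the operation name is a placeholder):

.. code-block:: python

    pause_transfer_operation = GcpTransferServiceOperationPauseOperator(
        task_id='pause_transfer_operation',
        operation_name='transferOperations/123456789',
        dag=my_dag)

The resume and cancel operators below take the same ``operation_name`` argument.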
.. py:class:: GcpTransferServiceOperationResumeOperator(operation_name, gcp_conn_id='google_cloud_default', api_version='v1', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Resumes a transfer operation in Google Storage Transfer Service.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceOperationResumeOperator`
:param operation_name: (Required) Name of the transfer operation.
:type operation_name: str
:param gcp_conn_id: The connection ID used to connect to Google Cloud Platform.
:type gcp_conn_id: str
:param api_version: API version used (e.g. v1).
:type api_version: str
.. attribute:: template_fields
:annotation: = ['operation_name', 'gcp_conn_id', 'api_version']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
.. py:class:: GcpTransferServiceOperationCancelOperator(operation_name, api_version='v1', gcp_conn_id='google_cloud_default', *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Cancels a transfer operation in Google Storage Transfer Service.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:GcpTransferServiceOperationCancelOperator`
:param operation_name: (Required) Name of the transfer operation.
:type operation_name: str
:param api_version: API version used (e.g. v1).
:type api_version: str
:param gcp_conn_id: The connection ID used to connect to Google
Cloud Platform.
:type gcp_conn_id: str
.. attribute:: template_fields
:annotation: = ['operation_name', 'gcp_conn_id', 'api_version']
.. method:: _validate_inputs(self)
.. method:: execute(self, context)
.. py:class:: S3ToGoogleCloudStorageTransferOperator(s3_bucket, gcs_bucket, project_id=None, aws_conn_id='aws_default', gcp_conn_id='google_cloud_default', delegate_to=None, description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Synchronizes an S3 bucket with a Google Cloud Storage bucket using the
GCP Storage Transfer Service.
.. warning::
This operator is NOT idempotent. If you run it many times, many transfer
jobs will be created in the Google Cloud Platform.
**Example**:
.. code-block:: python

    s3_to_gcs_transfer_op = S3ToGoogleCloudStorageTransferOperator(
        task_id='s3_to_gcs_transfer_example',
        s3_bucket='my-s3-bucket',
        project_id='my-gcp-project',
        gcs_bucket='my-gcs-bucket',
        dag=my_dag)
:param s3_bucket: The S3 bucket containing the objects to transfer. (templated)
:type s3_bucket: str
:param gcs_bucket: The destination Google Cloud Storage bucket
where you want to store the files. (templated)
:type gcs_bucket: str
:param project_id: Optional ID of the Google Cloud Platform Console project that
owns the job
:type project_id: str
:param aws_conn_id: The connection ID for the source S3 bucket
:type aws_conn_id: str
:param gcp_conn_id: The destination connection ID to use
when connecting to Google Cloud Storage.
:type gcp_conn_id: str
:param delegate_to: The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.
:type delegate_to: str
:param description: Optional transfer service job description
:type description: str
:param schedule: Optional transfer service schedule; if not set, the transfer
job runs once, as soon as the operator executes. The format is described at
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs.
With two additional improvements:
* dates can be passed as :class:`datetime.date`
* times can be passed as :class:`datetime.time`
:type schedule: dict
:param object_conditions: Optional transfer service object conditions; see
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
:type object_conditions: dict
:param transfer_options: Optional transfer service transfer options; see
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
:type transfer_options: dict
:param wait: Wait for the transfer to finish; defaults to ``True``
:type wait: bool
:param timeout: Time to wait, in seconds, for the operation to end
:type timeout: int
.. attribute:: template_fields
:annotation: = ['gcp_conn_id', 's3_bucket', 'gcs_bucket', 'description', 'object_conditions']
.. attribute:: ui_color
:annotation: = #e09411
.. method:: execute(self, context)
.. method:: _create_body(self)
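Beyond the minimal example above, a sketch that also sets a schedule and
object conditions (field names follow the ``Schedule`` and ``TransferSpec``
formats linked above; all values are placeholders):

.. code-block:: python

    import datetime

    s3_to_gcs_nightly_op = S3ToGoogleCloudStorageTransferOperator(
        task_id='s3_to_gcs_nightly',
        s3_bucket='my-s3-bucket',
        gcs_bucket='my-gcs-bucket',
        project_id='my-gcp-project',
        description='Nightly S3 to GCS sync',
        schedule={
            # native datetime objects are converted by the operator
            'scheduleStartDate': datetime.date(2019, 2, 22),
            'startTimeOfDay': datetime.time(2, 0),
        },
        object_conditions={'includePrefixes': ['reports/']},
        transfer_options={'overwriteObjectsAlreadyExistingInSink': True},
        wait=False,
        dag=my_dag)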
.. py:class:: GoogleCloudStorageToGoogleCloudStorageTransferOperator(source_bucket, destination_bucket, project_id=None, gcp_conn_id='google_cloud_default', delegate_to=None, description=None, schedule=None, object_conditions=None, transfer_options=None, wait=True, timeout=None, *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Copies objects from one bucket to another using the GCP Storage Transfer
Service.
.. warning::
This operator is NOT idempotent. If you run it many times, many transfer
jobs will be created in the Google Cloud Platform.
**Example**:
.. code-block:: python

    gcs_to_gcs_transfer_op = GoogleCloudStorageToGoogleCloudStorageTransferOperator(
        task_id='gcs_to_gcs_transfer_example',
        source_bucket='my-source-bucket',
        destination_bucket='my-destination-bucket',
        project_id='my-gcp-project',
        dag=my_dag)
:param source_bucket: The source Google Cloud Storage bucket containing the
objects. (templated)
:type source_bucket: str
:param destination_bucket: The destination Google Cloud Storage bucket to
which the objects should be copied. (templated)
:type destination_bucket: str
:param project_id: The ID of the Google Cloud Platform Console project that
owns the job
:type project_id: str
:param gcp_conn_id: Optional connection ID to use when connecting to Google Cloud
Storage.
:type gcp_conn_id: str
:param delegate_to: The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.
:type delegate_to: str
:param description: Optional transfer service job description
:type description: str
:param schedule: Optional transfer service schedule; if not set, the transfer
job runs once, as soon as the operator executes. See:
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs.
With two additional improvements:
* dates can be passed as :class:`datetime.date`
* times can be passed as :class:`datetime.time`
:type schedule: dict
:param object_conditions: Optional transfer service object conditions; see
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#ObjectConditions
:type object_conditions: dict
:param transfer_options: Optional transfer service transfer options; see
https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec#TransferOptions
:type transfer_options: dict
:param wait: Wait for the transfer to finish; defaults to ``True``
:type wait: bool
:param timeout: Time to wait for the operation to end in seconds
:type timeout: int
.. attribute:: template_fields
:annotation: = ['gcp_conn_id', 'source_bucket', 'destination_bucket', 'description', 'object_conditions']
.. attribute:: ui_color
:annotation: = #e09411
.. method:: execute(self, context)
.. method:: _create_body(self)
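Beyond the minimal example above, a sketch that restricts the copy with object
conditions and waits for completion (the duration-string field follows the
``ObjectConditions`` format linked above; all values are placeholders):

.. code-block:: python

    gcs_to_gcs_sync_op = GoogleCloudStorageToGoogleCloudStorageTransferOperator(
        task_id='gcs_to_gcs_sync',
        source_bucket='my-source-bucket',
        destination_bucket='my-destination-bucket',
        project_id='my-gcp-project',
        object_conditions={
            # only copy objects last modified at least an hour ago
            'minTimeElapsedSinceLastModification': '3600s',
        },
        wait=True,
        timeout=60 * 60,
        dag=my_dag)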