blob: ccd9c25c9d8ba9aa2387feb092483155af685487 [file] [log] [blame]
:mod:`airflow.contrib.operators.adls_to_gcs`
============================================
.. py:module:: airflow.contrib.operators.adls_to_gcs
Module Contents
---------------
.. py:class:: AdlsToGoogleCloudStorageOperator(src_adls, dest_gcs, azure_data_lake_conn_id, google_cloud_storage_conn_id, delegate_to=None, replace=False, gzip=False, *args, **kwargs)
Bases: :class:`airflow.contrib.operators.adls_list_operator.AzureDataLakeStorageListOperator`
Synchronizes an Azure Data Lake Storage path with a GCS bucket
:param src_adls: The Azure Data Lake path to find the objects (templated)
:type src_adls: str
:param dest_gcs: The Google Cloud Storage bucket and prefix to
store the objects. (templated)
:type dest_gcs: str
:param replace: If true, replaces same-named files in GCS
:type replace: bool
:param gzip: Option to compress file for upload
:type gzip: bool
:param azure_data_lake_conn_id: The connection ID to use when
connecting to Azure Data Lake Storage.
:type azure_data_lake_conn_id: str
:param google_cloud_storage_conn_id: The connection ID to use when
connecting to Google Cloud Storage.
:type google_cloud_storage_conn_id: str
:param delegate_to: The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.
:type delegate_to: str
**Examples**:
The following Operator would copy a single file named
``hello/world.avro`` from ADLS to the GCS bucket ``mybucket``. Its full
resulting gcs path will be ``gs://mybucket/hello/world.avro`` ::
copy_single_file = AdlsToGoogleCloudStorageOperator(
task_id='copy_single_file',
src_adls='hello/world.avro',
dest_gcs='gs://mybucket',
replace=False,
azure_data_lake_conn_id='azure_data_lake_default',
google_cloud_storage_conn_id='google_cloud_default'
)
The following Operator would copy all parquet files from ADLS
to the GCS bucket ``mybucket``. ::
copy_all_files = AdlsToGoogleCloudStorageOperator(
task_id='copy_all_files',
src_adls='*.parquet',
dest_gcs='gs://mybucket',
replace=False,
azure_data_lake_conn_id='azure_data_lake_default',
google_cloud_storage_conn_id='google_cloud_default'
)
The following Operator would copy all parquet files from ADLS
path ``/hello/world``to the GCS bucket ``mybucket``. ::
copy_world_files = AdlsToGoogleCloudStorageOperator(
task_id='copy_world_files',
src_adls='hello/world/*.parquet',
dest_gcs='gs://mybucket',
replace=False,
azure_data_lake_conn_id='azure_data_lake_default',
google_cloud_storage_conn_id='google_cloud_default'
)
.. attribute:: template_fields
:annotation: = ['src_adls', 'dest_gcs']
.. attribute:: ui_color
:annotation: = #f0eee4
.. method:: execute(self, context)