:mod:`airflow.contrib.hooks.azure_data_lake_hook`
=================================================
.. py:module:: airflow.contrib.hooks.azure_data_lake_hook
Module Contents
---------------
.. py:class:: AzureDataLakeHook(azure_data_lake_conn_id='azure_data_lake_default')
Bases: :class:`airflow.hooks.base_hook.BaseHook`
Interacts with Azure Data Lake.
The client ID and client secret should be supplied in the connection's user and password parameters.
The tenant and account name should be supplied in the extra field as
{"tenant": "<TENANT>", "account_name": "<ACCOUNT_NAME>"}.
:param azure_data_lake_conn_id: Reference to the Azure Data Lake connection.
:type azure_data_lake_conn_id: str
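The extra field described above can be sketched as follows. This is a minimal illustration that only builds the JSON string; the helper name ``build_adl_extra`` and the placeholder values are assumptions made for the example and are not part of the hook.

```python
import json


def build_adl_extra(tenant, account_name):
    """Return a JSON string for the connection's extra field.

    Hypothetical helper for illustration only. It emits the two keys
    named in the class description: "tenant" and "account_name".
    """
    if not tenant or not account_name:
        raise ValueError("both tenant and account_name are required")
    return json.dumps({"tenant": tenant, "account_name": account_name})


# Placeholder values; substitute a real tenant and account name.
extra = build_adl_extra("<TENANT>", "<ACCOUNT_NAME>")
print(extra)
```

The resulting string would be pasted into the extra field of the connection referenced by ``azure_data_lake_conn_id``.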
.. method:: get_conn(self)
Return an AzureDLFileSystem object.
.. method:: check_for_file(self, file_path)
Check if a file exists on Azure Data Lake.
:param file_path: Path and name of the file.
:type file_path: str
:return: True if the file exists, False otherwise.
:rtype: bool
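A minimal sketch of how ``check_for_file`` might be used to find missing inputs before a task runs. The function ``missing_files`` and the stand-in class ``FakeHook`` are hypothetical; the stand-in exists only so the example runs without Azure credentials, and the paths are made up.

```python
def missing_files(hook, file_paths):
    """Return the paths for which hook.check_for_file() reports False."""
    return [path for path in file_paths if not hook.check_for_file(path)]


class FakeHook:
    """Hypothetical stand-in for AzureDataLakeHook (no network access)."""

    def __init__(self, existing):
        self._existing = set(existing)

    def check_for_file(self, file_path):
        # Mirrors the documented contract: True if the file exists.
        return file_path in self._existing


hook = FakeHook(existing={"raw/2020/a.csv"})
missing = missing_files(hook, ["raw/2020/a.csv", "raw/2020/b.csv"])
print(missing)
```

With a configured connection, ``AzureDataLakeHook(azure_data_lake_conn_id='azure_data_lake_default')`` would take the place of ``FakeHook``.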
.. method:: upload_file(self, local_path, remote_path, nthreads=64, overwrite=True, buffersize=4194304, blocksize=4194304)
Upload a file to Azure Data Lake.
:param local_path: Local path. Can be a single file, a directory (in which case
it is uploaded recursively), or a glob pattern. Recursive glob patterns using ``**``
are not supported.
:type local_path: str
:param remote_path: Remote path to upload to; if multiple files, this is the
directory root to write within.
:type remote_path: str
:param nthreads: Number of threads to use. If None, uses the number of cores.
:type nthreads: int
:param overwrite: Whether to forcibly overwrite existing files/directories.
If False and the remote path is a directory, the upload aborts regardless of
whether any files would actually be overwritten. If True, only files with
matching names are overwritten.
:type overwrite: bool
:param buffersize: Number of bytes for the internal buffer (default 2**22).
The buffer cannot be bigger than a chunk and cannot be smaller than a block.
:type buffersize: int
:param blocksize: Number of bytes for a block (default 2**22). Within each
chunk, a smaller block is written for each API call. A block cannot be bigger
than a chunk.
:type blocksize: int
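The documented constraints on ``upload_file`` arguments can be checked before the call, as in the sketch below. The wrapper ``upload`` and the stand-in ``RecordingHook`` are hypothetical and only record the call; the hook may well perform its own validation, so treat this as an illustration of the parameter rules rather than required code.

```python
DEFAULT_SIZE = 2 ** 22  # 4194304 bytes, the documented default for both sizes


def upload(hook, local_path, remote_path, nthreads=64, overwrite=True,
           buffersize=DEFAULT_SIZE, blocksize=DEFAULT_SIZE):
    """Check arguments against the documented rules, then call upload_file."""
    if "**" in local_path:
        raise ValueError("recursive glob patterns using ** are not supported")
    if buffersize < blocksize:
        raise ValueError("buffersize cannot be smaller than blocksize")
    hook.upload_file(local_path=local_path, remote_path=remote_path,
                     nthreads=nthreads, overwrite=overwrite,
                     buffersize=buffersize, blocksize=blocksize)


class RecordingHook:
    """Hypothetical stand-in that records calls instead of contacting Azure."""

    def __init__(self):
        self.calls = []

    def upload_file(self, **kwargs):
        self.calls.append(kwargs)


hook = RecordingHook()
# Made-up paths: upload every CSV in ./data to the remote directory landing/csv.
upload(hook, "data/*.csv", "landing/csv")
print(hook.calls[0]["remote_path"])
```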
.. method:: download_file(self, local_path, remote_path, nthreads=64, overwrite=True, buffersize=4194304, blocksize=4194304)
Download a file from Azure Data Lake.
:param local_path: Local path. When downloading a single file, the file is written
to this exact path, unless the path is an existing directory, in which case the
file is created inside it. When downloading multiple files, this is the root
directory to write within. Directories are created as required.
:type local_path: str
:param remote_path: Remote path or glob string used to find remote files.
Recursive glob patterns using ``**`` are not supported.
:type remote_path: str
:param nthreads: Number of threads to use. If None, uses the number of cores.
:type nthreads: int
:param overwrite: Whether to forcibly overwrite existing files/directories.
If False and the local path is a directory, the download aborts regardless of
whether any files would actually be overwritten. If True, only files with
matching names are overwritten.
:type overwrite: bool
:param buffersize: Number of bytes for the internal buffer (default 2**22).
The buffer cannot be bigger than a chunk and cannot be smaller than a block.
:type buffersize: int
:param blocksize: Number of bytes for a block (default 2**22). Within each
chunk, a smaller block is written for each API call. A block cannot be bigger
than a chunk.
:type blocksize: int
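The single-file rule for ``local_path`` can be illustrated with a small sketch. The function ``single_file_target`` is hypothetical and is not the library's code. It also assumes that a file created inside an existing directory keeps the remote file's base name, which the description above does not state.

```python
import os
import posixpath
import tempfile


def single_file_target(local_path, remote_path):
    """Predict where a single downloaded file lands, per the rule above.

    Assumption: inside an existing directory the file keeps the remote
    base name.
    """
    if os.path.isdir(local_path):
        return os.path.join(local_path, posixpath.basename(remote_path))
    return local_path


with tempfile.TemporaryDirectory() as existing_dir:
    # local_path is an existing directory: the file is created inside it.
    inside_dir = single_file_target(existing_dir, "raw/2020/a.csv")
    # local_path is not a directory: the file is written to that exact path.
    explicit = single_file_target(
        os.path.join(existing_dir, "copy.csv"), "raw/2020/a.csv")
    print(os.path.basename(inside_dir), os.path.basename(explicit))
```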
.. method:: list(self, path)
List files in Azure Data Lake Storage.
:param path: Full path or glob string used to list files in ADLS.
:type path: str
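A short sketch of post-processing the result of ``list``. It assumes the method returns an iterable of remote path strings, which the description above does not specify. The helper ``group_by_directory`` and the sample paths are hypothetical.

```python
import posixpath
from collections import defaultdict


def group_by_directory(paths):
    """Group remote paths by their parent directory."""
    grouped = defaultdict(list)
    for path in paths:
        grouped[posixpath.dirname(path)].append(posixpath.basename(path))
    return dict(grouped)


# Sample value standing in for the result of hook.list("raw/*/*.csv").
listed = ["raw/2019/a.csv", "raw/2020/a.csv", "raw/2020/b.csv"]
by_dir = group_by_directory(listed)
print(by_dir)
```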