docs-archive/apache-airflow-providers-microsoft-azure/4.0.0/_sources/_api/airflow/providers/microsoft/azure/hooks/data_lake/index.rst.txt - airflow-site - Git at Google

 :py:mod:`airflow.providers.microsoft.azure.hooks.data_lake`
 ===========================================================

 .. py:module:: airflow.providers.microsoft.azure.hooks.data_lake

 .. autoapi-nested-parse::

    This module contains integration with Azure Data Lake.

    AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that a
    Airflow connection of type `azure_data_lake` exists. Authorization can be done by supplying a
    login (=Client ID), password (=Client Secret) and extra fields tenant (Tenant) and account_name (Account Name)
    (see connection `azure_data_lake_default` for an example).


 Module Contents
 ---------------

 Classes
 ~~~~~~~

 .. autoapisummary::

    airflow.providers.microsoft.azure.hooks.data_lake.AzureDataLakeHook


 .. py:class:: AzureDataLakeHook(azure_data_lake_conn_id = default_conn_name)

    Bases: :py:obj:`airflow.hooks.base.BaseHook`

    Interacts with Azure Data Lake.

    Client ID and client secret should be in user and password parameters.
    Tenant and account name should be extra field as
    {"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.

    :param azure_data_lake_conn_id: Reference to the :ref:`Azure Data Lake connection<howto/connection:adl>`.

    .. py:attribute:: conn_name_attr
       :annotation: = azure_data_lake_conn_id


    .. py:attribute:: default_conn_name
       :annotation: = azure_data_lake_default


    .. py:attribute:: conn_type
       :annotation: = azure_data_lake


    .. py:attribute:: hook_name
       :annotation: = Azure Data Lake


    .. py:method:: get_connection_form_widgets()
       :staticmethod:

       Returns connection widgets to add to connection form


    .. py:method:: get_ui_field_behaviour()
       :staticmethod:

       Returns custom field behaviour


    .. py:method:: get_conn(self)

       Return a AzureDLFileSystem object.


    .. py:method:: check_for_file(self, file_path)

       Check if a file exists on Azure Data Lake.

       :param file_path: Path and name of the file.
       :return: True if the file exists, False otherwise.
       :rtype: bool


    .. py:method:: upload_file(self, local_path, remote_path, nthreads = 64, overwrite = True, buffersize = 4194304, blocksize = 4194304, **kwargs)

       Upload a file to Azure Data Lake.

       :param local_path: local path. Can be single file, directory (in which case,
           upload recursively) or glob pattern. Recursive glob patterns using `**`
           are not supported.
       :param remote_path: Remote path to upload to; if multiple files, this is the
           directory root to write within.
       :param nthreads: Number of threads to use. If None, uses the number of cores.
       :param overwrite: Whether to forcibly overwrite existing files/directories.
           If False and remote path is a directory, will quit regardless if any files
           would be overwritten or not. If True, only matching filenames are actually
           overwritten.
       :param buffersize: int [2**22]
           Number of bytes for internal buffer. This block cannot be bigger than
           a chunk and cannot be smaller than a block.
       :param blocksize: int [2**22]
           Number of bytes for a block. Within each chunk, we write a smaller
           block for each API call. This block cannot be bigger than a chunk.


    .. py:method:: download_file(self, local_path, remote_path, nthreads = 64, overwrite = True, buffersize = 4194304, blocksize = 4194304, **kwargs)

       Download a file from Azure Blob Storage.

       :param local_path: local path. If downloading a single file, will write to this
           specific file, unless it is an existing directory, in which case a file is
           created within it. If downloading multiple files, this is the root
           directory to write within. Will create directories as required.
       :param remote_path: remote path/globstring to use to find remote files.
           Recursive glob patterns using `**` are not supported.
       :param nthreads: Number of threads to use. If None, uses the number of cores.
       :param overwrite: Whether to forcibly overwrite existing files/directories.
           If False and remote path is a directory, will quit regardless if any files
           would be overwritten or not. If True, only matching filenames are actually
           overwritten.
       :param buffersize: int [2**22]
           Number of bytes for internal buffer. This block cannot be bigger than
           a chunk and cannot be smaller than a block.
       :param blocksize: int [2**22]
           Number of bytes for a block. Within each chunk, we write a smaller
           block for each API call. This block cannot be bigger than a chunk.


    .. py:method:: list(self, path)

       List files in Azure Data Lake Storage

       :param path: full path/globstring to use to list files in ADLS


    .. py:method:: remove(self, path, recursive = False, ignore_not_found = True)

       Remove files in Azure Data Lake Storage

       :param path: A directory or file to remove in ADLS
       :param recursive: Whether to loop into directories in the location and remove the files
       :param ignore_not_found: Whether to raise error if file to delete is not found
	:py:mod:`airflow.providers.microsoft.azure.hooks.data_lake`
	===========================================================

	.. py:module:: airflow.providers.microsoft.azure.hooks.data_lake

	.. autoapi-nested-parse::

	This module contains integration with Azure Data Lake.

	AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that a
	Airflow connection of type `azure_data_lake` exists. Authorization can be done by supplying a
	login (=Client ID), password (=Client Secret) and extra fields tenant (Tenant) and account_name (Account Name)
	(see connection `azure_data_lake_default` for an example).



	Module Contents
	---------------

	Classes
	~~~~~~~

	.. autoapisummary::

	airflow.providers.microsoft.azure.hooks.data_lake.AzureDataLakeHook




	.. py:class:: AzureDataLakeHook(azure_data_lake_conn_id = default_conn_name)

	Bases: :py:obj:`airflow.hooks.base.BaseHook`

	Interacts with Azure Data Lake.

	Client ID and client secret should be in user and password parameters.
	Tenant and account name should be extra field as
	{"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}.

	:param azure_data_lake_conn_id: Reference to the :ref:`Azure Data Lake connection<howto/connection:adl>`.

	.. py:attribute:: conn_name_attr
	:annotation: = azure_data_lake_conn_id



	.. py:attribute:: default_conn_name
	:annotation: = azure_data_lake_default



	.. py:attribute:: conn_type
	:annotation: = azure_data_lake



	.. py:attribute:: hook_name
	:annotation: = Azure Data Lake



	.. py:method:: get_connection_form_widgets()
	:staticmethod:

	Returns connection widgets to add to connection form


	.. py:method:: get_ui_field_behaviour()
	:staticmethod:

	Returns custom field behaviour


	.. py:method:: get_conn(self)

	Return a AzureDLFileSystem object.


	.. py:method:: check_for_file(self, file_path)

	Check if a file exists on Azure Data Lake.

	:param file_path: Path and name of the file.
	:return: True if the file exists, False otherwise.
	:rtype: bool


	.. py:method:: upload_file(self, local_path, remote_path, nthreads = 64, overwrite = True, buffersize = 4194304, blocksize = 4194304, **kwargs)

	Upload a file to Azure Data Lake.

	:param local_path: local path. Can be single file, directory (in which case,
	upload recursively) or glob pattern. Recursive glob patterns using `**`
	are not supported.
	:param remote_path: Remote path to upload to; if multiple files, this is the
	directory root to write within.
	:param nthreads: Number of threads to use. If None, uses the number of cores.
	:param overwrite: Whether to forcibly overwrite existing files/directories.
	If False and remote path is a directory, will quit regardless if any files
	would be overwritten or not. If True, only matching filenames are actually
	overwritten.
	:param buffersize: int [2**22]
	Number of bytes for internal buffer. This block cannot be bigger than
	a chunk and cannot be smaller than a block.
	:param blocksize: int [2**22]
	Number of bytes for a block. Within each chunk, we write a smaller
	block for each API call. This block cannot be bigger than a chunk.


	.. py:method:: download_file(self, local_path, remote_path, nthreads = 64, overwrite = True, buffersize = 4194304, blocksize = 4194304, **kwargs)

	Download a file from Azure Blob Storage.

	:param local_path: local path. If downloading a single file, will write to this
	specific file, unless it is an existing directory, in which case a file is
	created within it. If downloading multiple files, this is the root
	directory to write within. Will create directories as required.
	:param remote_path: remote path/globstring to use to find remote files.
	Recursive glob patterns using `**` are not supported.
	:param nthreads: Number of threads to use. If None, uses the number of cores.
	:param overwrite: Whether to forcibly overwrite existing files/directories.
	If False and remote path is a directory, will quit regardless if any files
	would be overwritten or not. If True, only matching filenames are actually
	overwritten.
	:param buffersize: int [2**22]
	Number of bytes for internal buffer. This block cannot be bigger than
	a chunk and cannot be smaller than a block.
	:param blocksize: int [2**22]
	Number of bytes for a block. Within each chunk, we write a smaller
	block for each API call. This block cannot be bigger than a chunk.


	.. py:method:: list(self, path)

	List files in Azure Data Lake Storage

	:param path: full path/globstring to use to list files in ADLS


	.. py:method:: remove(self, path, recursive = False, ignore_not_found = True)

	Remove files in Azure Data Lake Storage

	:param path: A directory or file to remove in ADLS
	:param recursive: Whether to loop into directories in the location and remove the files
	:param ignore_not_found: Whether to raise error if file to delete is not found