:mod:`airflow.providers.microsoft.azure.hooks.wasb`
===================================================
.. py:module:: airflow.providers.microsoft.azure.hooks.wasb
.. autoapi-nested-parse::
This module contains integration with Azure Blob Storage.
It communicates via the Windows Azure Storage Blob protocol. Make sure that an
Airflow connection of type ``wasb`` exists. Authorization can be done by supplying a
login (the storage account name) and password (the account key), or a login and a SAS
token in the extra field (see the connection ``wasb_default`` for an example).
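The two authorization styles described above can be sketched as a small helper that assembles the connection fields as a plain dictionary. The helper name ``build_wasb_connection_fields``, the dictionary layout, and the rule that exactly one credential is supplied are assumptions made for this illustration; they are not part of the provider.

```python
import json
from typing import Optional


def build_wasb_connection_fields(
    account_name: str,
    account_key: Optional[str] = None,
    sas_token: Optional[str] = None,
) -> dict:
    """Hypothetical helper: collect the fields of a ``wasb`` connection."""
    # Assumption for this sketch: exactly one credential is supplied.
    if (account_key is None) == (sas_token is None):
        raise ValueError("Provide exactly one of account_key or sas_token")
    fields = {"conn_type": "wasb", "login": account_name}
    if account_key is not None:
        # Key-based auth: the storage account key goes in the password field.
        fields["password"] = account_key
    else:
        # SAS-based auth: the token goes in the JSON-encoded extra field.
        fields["extra"] = json.dumps({"sas_token": sas_token})
    return fields
```

The resulting values correspond to the login, password, and extra fields of the connection form.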
Module Contents
---------------
.. py:class:: WasbHook(wasb_conn_id: str = default_conn_name, public_read: bool = False)
Bases: :class:`airflow.hooks.base.BaseHook`
Interacts with Azure Blob Storage through the ``wasb://`` protocol.
These parameters must be stored in the Airflow database: ``account_name`` and ``account_key``.
Additional options passed in the ``extra`` field of the connection will be
passed to the ``BlockBlobService()`` constructor. For example, authenticate
using a SAS token by adding ``{"sas_token": "YOUR_TOKEN"}``.
:param wasb_conn_id: Reference to the wasb connection.
:type wasb_conn_id: str
:param public_read: Whether anonymous public read access should be used. Defaults to False.
:type public_read: bool
.. attribute:: conn_name_attr
:annotation: = wasb_conn_id
.. attribute:: default_conn_name
:annotation: = wasb_default
.. attribute:: conn_type
:annotation: = wasb
.. attribute:: hook_name
:annotation: = Azure Blob Storage
.. method:: get_conn(self)
Return the BlobServiceClient object.
.. method:: _get_container_client(self, container_name: str)
Instantiates a container client
:param container_name: The name of the container
:type container_name: str
:return: ContainerClient
.. method:: _get_blob_client(self, container_name: str, blob_name: str)
Instantiates a blob client
:param container_name: The name of the blob container
:type container_name: str
:param blob_name: The name of the blob. The blob need not already exist.
:type blob_name: str
.. method:: check_for_blob(self, container_name: str, blob_name: str, **kwargs)
Check if a blob exists on Azure Blob Storage.
:param container_name: Name of the container.
:type container_name: str
:param blob_name: Name of the blob.
:type blob_name: str
:param kwargs: Optional keyword arguments that ``BlobClient.get_blob_properties`` takes.
:type kwargs: object
:return: True if the blob exists, False otherwise.
:rtype: bool
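As an illustration of how ``check_for_blob`` might be used, the sketch below polls for a blob a fixed number of times. The helper ``wait_for_blob`` and its ``attempts`` and ``delay_seconds`` parameters are hypothetical; the hook is passed in so the function works with any object that exposes ``check_for_blob``.

```python
import time


def wait_for_blob(hook, container_name: str, blob_name: str,
                  attempts: int = 5, delay_seconds: float = 1.0) -> bool:
    """Hypothetical helper: poll until the blob exists or attempts run out."""
    for attempt in range(attempts):
        if hook.check_for_blob(container_name, blob_name):
            return True
        # Sleep only between attempts, not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Possible use inside an Airflow task (requires the Azure provider package;
# the container and blob names here are placeholders):
# from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
# found = wait_for_blob(WasbHook(wasb_conn_id="wasb_default"), "my-container", "ready.flag")
```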
.. method:: check_for_prefix(self, container_name: str, prefix: str, **kwargs)
Check if a prefix exists on Azure Blob Storage.
:param container_name: Name of the container.
:type container_name: str
:param prefix: Prefix of the blob.
:type prefix: str
:param kwargs: Optional keyword arguments that ``ContainerClient.walk_blobs`` takes
:type kwargs: object
:return: True if blobs matching the prefix exist, False otherwise.
:rtype: bool
.. method:: get_blobs_list(self, container_name: str, prefix: Optional[str] = None, include: Optional[List[str]] = None, delimiter: Optional[str] = '/', **kwargs)
List blobs in a given container
:param container_name: The name of the container
:type container_name: str
:param prefix: Filters the results to return only blobs whose names
begin with the specified prefix.
:type prefix: str
:param include: Specifies one or more additional datasets to include in the
response. Options include: ``snapshots``, ``metadata``, ``uncommittedblobs``,
``copy``, ``deleted``.
:type include: List[str]
:param delimiter: Filters objects based on the delimiter (e.g. ``.csv``).
:type delimiter: str
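A minimal sketch of narrowing a listing on the client side, assuming ``get_blobs_list`` returns an iterable of blob names as strings (the return type is not stated above). The helper ``list_blobs_with_suffix`` is hypothetical.

```python
from typing import List


def list_blobs_with_suffix(hook, container_name: str, prefix: str, suffix: str) -> List[str]:
    """Hypothetical helper: list blobs under ``prefix`` whose names end with ``suffix``."""
    # The prefix filter is applied by the hook; the suffix filter is applied locally.
    names = hook.get_blobs_list(container_name, prefix=prefix)
    return [name for name in names if name.endswith(suffix)]
```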
.. method:: load_file(self, file_path: str, container_name: str, blob_name: str, **kwargs)
Upload a file to Azure Blob Storage.
:param file_path: Path to the file to load.
:type file_path: str
:param container_name: Name of the container.
:type container_name: str
:param blob_name: Name of the blob.
:type blob_name: str
:param kwargs: Optional keyword arguments that ``BlobClient.upload_blob()`` takes.
:type kwargs: object
.. method:: load_string(self, string_data: str, container_name: str, blob_name: str, **kwargs)
Upload a string to Azure Blob Storage.
:param string_data: String to load.
:type string_data: str
:param container_name: Name of the container.
:type container_name: str
:param blob_name: Name of the blob.
:type blob_name: str
:param kwargs: Optional keyword arguments that ``BlobClient.upload_blob()`` takes.
:type kwargs: object
.. method:: get_file(self, file_path: str, container_name: str, blob_name: str, **kwargs)
Download a file from Azure Blob Storage.
:param file_path: Path to the file to download.
:type file_path: str
:param container_name: Name of the container.
:type container_name: str
:param blob_name: Name of the blob.
:type blob_name: str
:param kwargs: Optional keyword arguments that ``BlobClient.download_blob()`` takes.
:type kwargs: object
.. method:: read_file(self, container_name: str, blob_name: str, **kwargs)
Read a file from Azure Blob Storage and return as a string.
:param container_name: Name of the container.
:type container_name: str
:param blob_name: Name of the blob.
:type blob_name: str
:param kwargs: Optional keyword arguments that ``BlobClient.download_blob()`` takes.
:type kwargs: object
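The string-oriented methods ``load_string`` and ``read_file`` pair naturally for small text payloads. The sketch below stores and reloads a JSON document; the helper names ``save_json`` and ``load_json`` are hypothetical, and the hook is passed in as a parameter.

```python
import json


def save_json(hook, container_name: str, blob_name: str, payload) -> None:
    """Hypothetical helper: serialize ``payload`` and upload it as a blob."""
    hook.load_string(json.dumps(payload), container_name, blob_name)


def load_json(hook, container_name: str, blob_name: str):
    """Hypothetical helper: download a blob and parse it as JSON."""
    return json.loads(hook.read_file(container_name, blob_name))
```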
.. method:: upload(self, container_name, blob_name, data, blob_type: str = 'BlockBlob', length: Optional[int] = None, **kwargs)
Creates a new blob from a data source with automatic chunking.
:param container_name: The name of the container to upload data to
:type container_name: str
:param blob_name: The name of the blob to upload. This need not exist in the container
:type blob_name: str
:param data: The blob data to upload
:param blob_type: The type of the blob. This can be either ``BlockBlob``,
``PageBlob`` or ``AppendBlob``. The default value is ``BlockBlob``.
:type blob_type: storage.BlobType
:param length: Number of bytes to read from the stream. This is optional,
but should be supplied for optimal performance.
:type length: int
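As a sketch of the ``length`` recommendation above, the hypothetical helper ``upload_bytes`` passes the size of an in-memory payload along with the data. It relies only on the ``upload`` signature documented here.

```python
def upload_bytes(hook, container_name: str, blob_name: str, data: bytes) -> None:
    """Hypothetical helper: upload in-memory bytes as a block blob."""
    # The payload size is known up front, so pass it as ``length``.
    hook.upload(container_name, blob_name, data, blob_type="BlockBlob", length=len(data))
```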
.. method:: download(self, container_name, blob_name, offset: Optional[int] = None, length: Optional[int] = None, **kwargs)
Downloads a blob and returns it as a ``StorageStreamDownloader``.
:param container_name: The name of the container containing the blob
:type container_name: str
:param blob_name: The name of the blob to download
:type blob_name: str
:param offset: Start of byte range to use for downloading a section of the blob.
Must be set if length is provided.
:type offset: int
:param length: Number of bytes to read from the stream.
:type length: int
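A minimal sketch of reading a byte range with ``download``, assuming the returned ``StorageStreamDownloader`` is consumed with its ``readall()`` method. The helper ``read_blob_range`` is hypothetical.

```python
def read_blob_range(hook, container_name: str, blob_name: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: return ``length`` bytes of a blob starting at ``offset``."""
    # ``offset`` must be set whenever ``length`` is provided.
    downloader = hook.download(container_name, blob_name, offset=offset, length=length)
    return downloader.readall()
```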
.. method:: create_container(self, container_name: str)
Create the container if it does not already exist.
:param container_name: The name of the container to create
:type container_name: str
.. method:: delete_container(self, container_name: str)
Delete a container object
:param container_name: The name of the container
:type container_name: str
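``create_container`` and ``delete_container`` can be paired for scratch storage, for example in tests. The sketch below wraps them in a context manager; ``temporary_container`` is a hypothetical name, and the hook is passed in as a parameter.

```python
from contextlib import contextmanager


@contextmanager
def temporary_container(hook, container_name: str):
    """Hypothetical helper: create a container and delete it on exit."""
    hook.create_container(container_name)
    try:
        yield container_name
    finally:
        # Runs even if the body raises, so the container is not left behind.
        hook.delete_container(container_name)
```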
.. method:: delete_blobs(self, container_name: str, *blobs, **kwargs)
Marks the specified blobs or snapshots for deletion.
:param container_name: The name of the container containing the blobs
:type container_name: str
:param blobs: The blobs to delete. This can be a single blob, or multiple values
can be supplied, where each value is either the name of the blob (str) or BlobProperties.
:type blobs: Union[str, BlobProperties]
.. method:: delete_file(self, container_name: str, blob_name: str, is_prefix: bool = False, ignore_if_missing: bool = False, **kwargs)
Delete a file from Azure Blob Storage.
:param container_name: Name of the container.
:type container_name: str
:param blob_name: Name of the blob.
:type blob_name: str
:param is_prefix: If blob_name is a prefix, delete all matching files
:type is_prefix: bool
:param ignore_if_missing: If True, return success even if the
blob does not exist.
:type ignore_if_missing: bool
:param kwargs: Optional keyword arguments that ``ContainerClient.delete_blobs()`` takes.
:type kwargs: object
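The ``is_prefix`` and ``ignore_if_missing`` flags combine into an idempotent cleanup step. The sketch below is one way to do that; ``purge_prefix`` is a hypothetical helper that relies only on the ``check_for_prefix`` and ``delete_file`` signatures documented here.

```python
def purge_prefix(hook, container_name: str, prefix: str) -> bool:
    """Hypothetical helper: delete all blobs under ``prefix``.

    Returns True if any matching blob existed before the call.
    """
    existed = hook.check_for_prefix(container_name, prefix)
    # ignore_if_missing=True keeps the call safe to repeat.
    hook.delete_file(container_name, prefix, is_prefix=True, ignore_if_missing=True)
    return existed
```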