| :py:mod:`airflow.providers.apache.hdfs.hooks.webhdfs` |
| ===================================================== |
| |
| .. py:module:: airflow.providers.apache.hdfs.hooks.webhdfs |
| |
| .. autoapi-nested-parse:: |
| |
| Hook for Web HDFS |
| |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook |
| |
| |
| |
| |
| Attributes |
| ~~~~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.apache.hdfs.hooks.webhdfs.log |
| |
| |
| .. py:data:: log |
| |
| |
| |
| |
| .. py:exception:: AirflowWebHDFSHookException |
| |
| Bases: :py:obj:`airflow.exceptions.AirflowException` |
| |
| Exception specific for WebHDFS hook |
| |
| |
| .. py:class:: WebHDFSHook(webhdfs_conn_id = 'webhdfs_default', proxy_user = None) |
| |
| Bases: :py:obj:`airflow.hooks.base.BaseHook` |
| |
| Interact with HDFS. This class is a wrapper around the hdfscli library. |
| |
| :param webhdfs_conn_id: The connection id for the webhdfs client to connect to. |
| :param proxy_user: The user used to authenticate. |
| |
| .. py:method:: get_conn(self) |
| |
| Establishes a connection depending on the security mode set via config or environment variable. |
| :return: a hdfscli InsecureClient or KerberosClient object. |
| :rtype: hdfs.InsecureClient or hdfs.ext.kerberos.KerberosClient |
| |
| |
| .. py:method:: check_for_path(self, hdfs_path) |
| |
| Check for the existence of a path in HDFS by querying FileStatus. |
| |
| :param hdfs_path: The path to check. |
| :return: True if the path exists and False if not. |
| :rtype: bool |
| |
| |
| .. py:method:: load_file(self, source, destination, overwrite = True, parallelism = 1, **kwargs) |
| |
| Uploads a file to HDFS. |
| |
| :param source: Local path to file or folder. |
| If it's a folder, all the files inside of it will be uploaded. |
| .. note:: This implies that folders empty of files will not be created remotely. |
| |
| :param destination: PTarget HDFS path. |
| If it already exists and is a directory, files will be uploaded inside. |
| :param overwrite: Overwrite any existing file or directory. |
| :param parallelism: Number of threads to use for parallelization. |
| A value of `0` (or negative) uses as many threads as there are files. |
| :param kwargs: Keyword arguments forwarded to :meth:`hdfs.client.Client.upload`. |
| |
| |
| |