:mod:`airflow.hooks.webhdfs_hook`
=================================

.. py:module:: airflow.hooks.webhdfs_hook


Module Contents
---------------

.. data:: _kerberos_security_mode


.. data:: log


.. py:exception:: AirflowWebHDFSHookException

   Bases: :class:`airflow.exceptions.AirflowException`


.. py:class:: WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None)

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   Interact with HDFS over WebHDFS. This class is a wrapper around the HdfsCLI library (the ``hdfs`` Python package).

   :param webhdfs_conn_id: The connection ID that the WebHDFS client uses to connect.
   :type webhdfs_conn_id: str
   :param proxy_user: The user to authenticate as.
   :type proxy_user: str

   .. method:: get_conn(self)

      Establish a connection to HDFS, using Kerberos or insecure authentication depending on the
      security mode set in the Airflow configuration or through an environment variable.

      :return: An HdfsCLI ``InsecureClient`` or ``KerberosClient`` object.
      :rtype: hdfs.InsecureClient or hdfs.ext.kerberos.KerberosClient

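      The client choice described above can be sketched without Airflow or a cluster. This is a
      minimal sketch under stated assumptions: it assumes the security mode is read from the
      ``[core] security`` option, that the ``AIRFLOW__CORE__SECURITY`` environment variable
      overrides it, and that only the value ``kerberos`` selects Kerberos. ``choose_client_class``
      is a hypothetical helper that returns class names as strings instead of building real
      ``hdfs`` clients.

      .. code-block:: python

         import os
         from typing import Optional


         def choose_client_class(config_value: Optional[str] = None) -> str:
             """Return the name of the hdfs client class to build (hypothetical helper).

             Assumption: the environment variable, when set, takes precedence over
             the value read from the configuration file.
             """
             mode = os.environ.get("AIRFLOW__CORE__SECURITY", config_value or "")
             if mode == "kerberos":
                 # Kerberos security mode: authenticate with a Kerberos ticket.
                 return "hdfs.ext.kerberos.KerberosClient"
             # Any other mode: plain WebHDFS client, optionally acting as a proxy user.
             return "hdfs.InsecureClient"
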

   .. method:: _get_client(self, connection)


   .. method:: check_for_path(self, hdfs_path)

      Check for the existence of a path in HDFS by querying FileStatus.

      :param hdfs_path: The path to check.
      :type hdfs_path: str
      :return: True if the path exists and False if not.
      :rtype: bool

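      One way to implement such a check is to request the path's status non-strictly and treat a
      missing status as "does not exist". The sketch below assumes a client method
      ``status(hdfs_path, strict=False)`` that returns ``None`` for a missing path, as the ``hdfs``
      library's ``Client.status`` does. ``FakeClient`` is a hypothetical in-memory stand-in so the
      example runs without a cluster; it raises ``FileNotFoundError`` where the real client raises
      its own error type.

      .. code-block:: python

         from typing import Dict, Optional


         class FakeClient:
             """Hypothetical stand-in for an hdfs client, backed by a dict of statuses."""

             def __init__(self, statuses: Dict[str, dict]) -> None:
                 self._statuses = statuses

             def status(self, hdfs_path: str, strict: bool = True) -> Optional[dict]:
                 if hdfs_path in self._statuses:
                     return self._statuses[hdfs_path]
                 # strict=True raises for a missing path; strict=False returns None.
                 if strict:
                     raise FileNotFoundError(hdfs_path)
                 return None


         def check_for_path(client: FakeClient, hdfs_path: str) -> bool:
             # A non-strict lookup turns "path not found" into None instead of an error.
             return client.status(hdfs_path, strict=False) is not None
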

   .. method:: load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs)

      Upload a file or folder to HDFS.

      :param source: Local path to a file or folder.
          If it is a folder, all the files inside it will be uploaded.

          .. note:: This implies that folders containing no files will not be created remotely.

      :type source: str
      :param destination: Target HDFS path.
          If it already exists and is a directory, files will be uploaded inside it.
      :type destination: str
      :param overwrite: Overwrite any existing file or directory.
      :type overwrite: bool
      :param parallelism: Number of threads to use for parallelization.
          A value of ``0`` (or negative) uses as many threads as there are files.
      :type parallelism: int
      :param \**kwargs: Keyword arguments forwarded to :meth:`hdfs.client.Client.upload`.

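      The folder and ``parallelism`` semantics above can be illustrated locally: only files are
      transferred, so a folder with no files has no remote counterpart, and a ``parallelism`` of
      ``0`` or less means one thread per file. This is a sketch under assumptions, not the ``hdfs``
      library's implementation: ``plan_upload`` and ``upload_all`` are hypothetical helpers, remote
      paths are built with ``posixpath.join``, the ``upload_one`` callback stands in for the real
      per-file transfer, and neither ``overwrite`` nor the existing-directory case is modeled.

      .. code-block:: python

         import os
         import posixpath
         from concurrent.futures import ThreadPoolExecutor
         from typing import Callable, List, Tuple


         def plan_upload(source: str, destination: str) -> List[Tuple[str, str]]:
             """Pair every local file under ``source`` with a remote HDFS path."""
             if os.path.isfile(source):
                 return [(source, destination)]
             pairs = []
             for root, _dirs, files in os.walk(source):
                 for name in files:
                     local = os.path.join(root, name)
                     rel = os.path.relpath(local, source).replace(os.sep, "/")
                     pairs.append((local, posixpath.join(destination, rel)))
             # Folders without files contribute no pairs, so they are never created remotely.
             return sorted(pairs)


         def upload_all(pairs: List[Tuple[str, str]], parallelism: int,
                        upload_one: Callable[[str, str], None]) -> int:
             """Run ``upload_one`` for each pair and return the thread count used."""
             # 0 or a negative value means one thread per file (but at least one thread).
             n_threads = parallelism if parallelism > 0 else max(len(pairs), 1)
             with ThreadPoolExecutor(max_workers=n_threads) as pool:
                 # list() drains the iterator so any upload error is raised here.
                 list(pool.map(lambda pair: upload_one(*pair), pairs))
             return n_threads

      Keeping planning separate from transfer makes the empty-folder behavior testable without a
      cluster.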