| |
| |
| :mod:`airflow.hooks.webhdfs_hook` |
| ================================= |
| |
| .. py:module:: airflow.hooks.webhdfs_hook |
| |
| |
| |
| |
| |
| |
| |
| Module Contents |
| --------------- |
| |
| |
| |
| |
| |
| |
| .. data:: _kerberos_security_mode |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| .. data:: log |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| .. py:exception:: AirflowWebHDFSHookException |
| |
| Bases: :class:`airflow.exceptions.AirflowException` |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| .. py:class:: WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None) |
| |
| Bases: :class:`airflow.hooks.base_hook.BaseHook` |
| |
| |
| |
| Interact with HDFS. This class is a wrapper around the hdfscli library. |
| |
| |
| |
| |
| |
| |
| |
| |
| .. method:: get_conn(self) |
| |
| |
| Returns an hdfscli ``InsecureClient`` object. |
| |
| |
| |
| |
| |
| |
| |
| .. method:: check_for_path(self, hdfs_path) |
| |
| |
| Check for the existence of a path in HDFS by querying FileStatus. |
| |
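A minimal sketch of how this check might guard a downstream step. The helper ``require_path`` and the stand-in ``FakeHook`` class are hypothetical and exist only so the snippet runs without a cluster; with Airflow installed, a real ``WebHDFSHook`` instance would be passed instead. The sketch assumes ``check_for_path`` returns a truthy value when the path exists and a falsy value otherwise.

```python
class FakeHook:
    """Hypothetical stand-in that mimics WebHDFSHook.check_for_path."""

    def __init__(self, existing_paths):
        self._existing = set(existing_paths)

    def check_for_path(self, hdfs_path):
        # Assumed behaviour: truthy when the path exists, falsy otherwise.
        return hdfs_path in self._existing


def require_path(hook, hdfs_path):
    """Raise FileNotFoundError unless the hook reports that hdfs_path exists."""
    if not hook.check_for_path(hdfs_path):
        raise FileNotFoundError("HDFS path not found: " + hdfs_path)
    return hdfs_path


# With Airflow installed, the real hook would be built roughly like this:
#   from airflow.hooks.webhdfs_hook import WebHDFSHook
#   hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
hook = FakeHook(["/data/input"])  # example path, not from the source
checked = require_path(hook, "/data/input")
```

Passing the hook in as an argument keeps the guard easy to exercise in unit tests, since any object with a ``check_for_path`` method can be substituted.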
| |
| |
| |
| |
| |
| |
| .. method:: load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs) |
| |
| |
| Uploads a file to HDFS. |
| |
| :param source: Local path to a file or folder. If a folder, all the files |
| inside it will be uploaded (so folders that contain no files will not be |
| created remotely). |
| :type source: str |
| :param destination: Target HDFS path. If it already exists and is a |
| directory, files will be uploaded inside. |
| :type destination: str |
| :param overwrite: Overwrite any existing file or directory. |
| :type overwrite: bool |
| :param parallelism: Number of threads to use for parallelization. A value of |
| ``0`` (or negative) uses as many threads as there are files. |
| :type parallelism: int |
| :param \*\*kwargs: Keyword arguments forwarded to :meth:`hdfs.client.Client.upload`. |
| |
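The ``source`` and ``parallelism`` semantics described above can be illustrated with a small standard-library sketch. It is not the hdfscli implementation; ``files_to_upload`` and ``effective_threads`` are hypothetical helpers that only model the documented behaviour: folders are walked for files (so empty folders contribute nothing), and a ``parallelism`` of ``0`` or less means one thread per file.

```python
import os
import tempfile


def files_to_upload(source):
    """Return the local files under source, or [source] if it is a file."""
    if os.path.isfile(source):
        return [source]
    found = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def effective_threads(parallelism, n_files):
    """Model the documented rule: 0 or negative means one thread per file."""
    return n_files if parallelism <= 0 else parallelism


with tempfile.TemporaryDirectory() as root:
    # One folder with no files and one folder with a single file.
    os.makedirs(os.path.join(root, "empty_dir"))
    os.makedirs(os.path.join(root, "logs"))
    with open(os.path.join(root, "logs", "a.txt"), "w") as handle:
        handle.write("hello")

    # Only logs/a.txt is collected; empty_dir never appears in the list.
    pending = [os.path.relpath(p, root) for p in files_to_upload(root)]
    threads = effective_threads(0, len(pending))

# A real call would look roughly like this (the paths are examples only):
#   hook.load_file(source="/tmp/logs", destination="/data/logs",
#                  overwrite=True, parallelism=0)
```

If an empty remote directory is needed, it would have to be created separately, since only files drive the upload.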
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |