blob: c7b859db50b9731f960048c610a1b4c19583af01 [file] [log] [blame]
:mod:`airflow.hooks.webhdfs_hook`
=================================
.. py:module:: airflow.hooks.webhdfs_hook
Module Contents
---------------
.. data:: _kerberos_security_mode
.. data:: log
.. py:exception:: AirflowWebHDFSHookException
Bases::class:`airflow.exceptions.AirflowException`
.. py:class:: WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None)
Bases::class:`airflow.hooks.base_hook.BaseHook`
Interact with HDFS. This class is a wrapper around the hdfscli library.
.. method:: get_conn(self)
Returns a hdfscli InsecureClient object.
.. method:: check_for_path(self, hdfs_path)
Check for the existence of a path in HDFS by querying FileStatus.
.. method:: load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs)
Uploads a file to HDFS
:param source: Local path to file or folder. If a folder, all the files
inside of it will be uploaded (note that this implies that folders empty
of files will not be created remotely).
:type source: str
:param destination: PTarget HDFS path. If it already exists and is a
directory, files will be uploaded inside.
:type destination: str
:param overwrite: Overwrite any existing file or directory.
:type overwrite: bool
:param parallelism: Number of threads to use for parallelization. A value of
`0` (or negative) uses as many threads as there are files.
:type parallelism: int
:param \*\*kwargs: Keyword arguments forwarded to :meth:`upload`.