:mod:`airflow.contrib.hooks.gcp_dataproc_hook`
==============================================
.. py:module:: airflow.contrib.hooks.gcp_dataproc_hook
Module Contents
---------------
.. py:class:: _DataProcJob(dataproc_api, project_id, job, region='global', job_error_states=None, num_retries=5)
Bases: :class:`airflow.utils.log.logging_mixin.LoggingMixin`
.. method:: wait_for_done(self)
.. method:: raise_error(self, message=None)
.. method:: get(self)
.. py:class:: _DataProcJobBuilder(project_id, task_id, cluster_name, job_type, properties)
.. method:: add_labels(self, labels)
Set labels for Dataproc job.
:param labels: Labels for the job query.
:type labels: dict
.. method:: add_variables(self, variables)
.. method:: add_args(self, args)
.. method:: add_query(self, query)
.. method:: add_query_uri(self, query_uri)
.. method:: add_jar_file_uris(self, jars)
.. method:: add_archive_uris(self, archives)
.. method:: add_file_uris(self, files)
.. method:: add_python_file_uris(self, pyfiles)
.. method:: set_main(self, main_jar, main_class)
.. method:: set_python_main(self, main)
.. method:: set_job_name(self, name)
.. method:: build(self)
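The builder methods above accumulate settings and ``build`` returns the finished job. A minimal sketch of that pattern follows. ``JobBuilderSketch`` is a hypothetical stand-in and not the Airflow class; the dictionary layout is an assumption modeled on the Dataproc ``Job`` resource, and only three of the methods are illustrated.

```python
class JobBuilderSketch:
    """Hypothetical stand-in illustrating the builder pattern; not the Airflow class."""

    def __init__(self, project_id, cluster_name, job_type):
        self.job_type = job_type
        # Assumed layout, modeled on the Dataproc Job resource.
        self.job = {
            "job": {
                "reference": {"projectId": project_id},
                "placement": {"clusterName": cluster_name},
                job_type: {},
            }
        }

    def add_labels(self, labels):
        # Merge labels into the job, creating the "labels" key on first use.
        if labels:
            self.job["job"].setdefault("labels", {}).update(labels)

    def add_args(self, args):
        # Attach command-line arguments to the job-type section.
        if args is not None:
            self.job["job"][self.job_type]["args"] = args

    def build(self):
        # Return the accumulated job dictionary.
        return self.job


builder = JobBuilderSketch("my-project", "my-cluster", "pysparkJob")
builder.add_labels({"team": "data"})
builder.add_args(["--date", "2020-01-01"])
job = builder.build()
```

Each ``add_*`` call mutates the builder in place, so the calls can be made in any order before ``build``.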
.. py:class:: _DataProcOperation(dataproc_api, operation, num_retries)
Bases: :class:`airflow.utils.log.logging_mixin.LoggingMixin`
Continuously polls Dataproc Operation until it completes.
.. method:: wait_for_done(self)
.. method:: get(self)
.. method:: _check_done(self)
.. method:: _raise_error(self)
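The polling behaviour described for this class can be sketched roughly as below. This is an illustration under stated assumptions, not the module's implementation: ``wait_for_operation``, ``fetch_operation``, ``poll_interval`` and ``max_polls`` are hypothetical names, and the operation is assumed to be a dictionary that carries ``done`` once finished and ``error`` on failure, as Google long-running operations do.

```python
import time


def wait_for_operation(fetch_operation, poll_interval=0.0, max_polls=100):
    """Poll until the operation reports done; raise if it finished with an error.

    fetch_operation is a hypothetical callable returning the latest operation dict.
    """
    for _ in range(max_polls):
        operation = fetch_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError("Operation failed: {}".format(operation["error"]))
            return operation
        time.sleep(poll_interval)
    raise TimeoutError("Operation did not finish after {} polls".format(max_polls))


# Stand-in for an API call: reports "not done" twice, then "done".
_responses = iter([
    {"name": "op-1"},
    {"name": "op-1"},
    {"name": "op-1", "done": True},
])
result = wait_for_operation(lambda: next(_responses))
```

Bounding the loop with ``max_polls`` is a choice made for this sketch so that it always terminates; the source does not state whether the real class applies such a limit.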
.. py:class:: DataProcHook(gcp_conn_id='google_cloud_default', delegate_to=None, api_version='v1beta2')
Bases: :class:`airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook`
Hook for Google Cloud Dataproc APIs.
.. method:: get_conn(self)
Returns a Google Cloud Dataproc service object.
.. method:: get_cluster(self, project_id, region, cluster_name)
.. method:: submit(self, project_id, job, region='global', job_error_states=None)
.. method:: create_job_template(self, task_id, cluster_name, job_type, properties)
.. method:: wait(self, operation)
Waits for a Google Cloud Dataproc operation to complete.
.. method:: cancel(self, project_id, job_id, region='global')
Cancel a Google Cloud Dataproc job.
:param project_id: Name of the project the job belongs to
:type project_id: str
:param job_id: Identifier of the job to cancel
:type job_id: int
:param region: Region used for the job
:type region: str
:returns: A Job JSON dictionary representing the canceled job