| :mod:`airflow.contrib.hooks.gcp_dataproc_hook` |
| ============================================== |
| |
| .. py:module:: airflow.contrib.hooks.gcp_dataproc_hook |
| |
| |
| Module Contents |
| --------------- |
| |
| .. py:class:: _DataProcJob(dataproc_api, project_id, job, region='global', job_error_states=None, num_retries=5) |
| |
| Bases: :class:`airflow.utils.log.logging_mixin.LoggingMixin` |
| |
| |
| .. method:: wait_for_done(self) |
| |
| |
| |
| |
| .. method:: raise_error(self, message=None) |
| |
| |
| |
| |
| .. method:: get(self) |
| |
| |
| |
| |
| .. py:class:: _DataProcJobBuilder(project_id, task_id, cluster_name, job_type, properties) |
| |
| |
| .. method:: add_labels(self, labels) |
| |
| Set labels for the Dataproc job. |
| |
| :param labels: Labels for the job query. |
| :type labels: dict |
| |
| |
| |
| |
| .. method:: add_variables(self, variables) |
| |
| |
| |
| |
| .. method:: add_args(self, args) |
| |
| |
| |
| |
| .. method:: add_query(self, query) |
| |
| |
| |
| |
| .. method:: add_query_uri(self, query_uri) |
| |
| |
| |
| |
| .. method:: add_jar_file_uris(self, jars) |
| |
| |
| |
| |
| .. method:: add_archive_uris(self, archives) |
| |
| |
| |
| |
| .. method:: add_file_uris(self, files) |
| |
| |
| |
| |
| .. method:: add_python_file_uris(self, pyfiles) |
| |
| |
| |
| |
| .. method:: set_main(self, main_jar, main_class) |
| |
| |
| |
| |
| .. method:: set_python_main(self, main) |
| |
| |
| |
| |
| .. method:: set_job_name(self, name) |
| |
| |
| |
| |
| .. method:: build(self) |
| |
| |
| |
| |
| .. py:class:: _DataProcOperation(dataproc_api, operation, num_retries) |
| |
| Bases: :class:`airflow.utils.log.logging_mixin.LoggingMixin` |
| |
| Continuously polls a Dataproc operation until it completes. |
| |
| |
| .. method:: wait_for_done(self) |
| |
| |
| |
| |
| .. method:: get(self) |
| |
| |
| |
| |
| .. method:: _check_done(self) |
| |
| |
| |
| |
| .. method:: _raise_error(self) |
| |
| |
| |
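The polling behaviour described for ``_DataProcOperation`` can be sketched roughly as follows. This is a minimal, self-contained illustration and not the module's actual implementation: ``wait_for_done`` here is a free function, and ``fetch_operation``, ``poll_interval`` and ``sleep`` are hypothetical parameters introduced for the example. It assumes the operation resource is a dictionary that carries a ``done`` flag and, on failure, an ``error`` entry, as Google long-running operations generally do.

```python
import time


def wait_for_done(fetch_operation, poll_interval=10.0, sleep=time.sleep):
    """Poll an operation until it reports completion.

    fetch_operation is a hypothetical callable returning the latest
    operation resource as a dict. It stands in for the API request.
    """
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            # A finished operation may still have failed.
            if "error" in operation:
                raise RuntimeError(
                    "Operation failed: {}".format(operation["error"])
                )
            return operation
        # Not finished yet: pause before asking again.
        sleep(poll_interval)


# Usage with canned responses instead of a real Dataproc service.
responses = iter([
    {"name": "op-1"},
    {"name": "op-1"},
    {"name": "op-1", "done": True},
])
result = wait_for_done(lambda: next(responses), sleep=lambda seconds: None)
```

Passing ``sleep`` as a parameter keeps the sketch testable without real delays; a production loop would more likely also bound the number of attempts.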
| |
| .. py:class:: DataProcHook(gcp_conn_id='google_cloud_default', delegate_to=None, api_version='v1beta2') |
| |
| Bases: :class:`airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook` |
| |
| Hook for Google Cloud Dataproc APIs. |
| |
| |
| .. method:: get_conn(self) |
| |
| Returns a Google Cloud Dataproc service object. |
| |
| |
| |
| |
| .. method:: get_cluster(self, project_id, region, cluster_name) |
| |
| |
| |
| |
| .. method:: submit(self, project_id, job, region='global', job_error_states=None) |
| |
| |
| |
| |
| .. method:: create_job_template(self, task_id, cluster_name, job_type, properties) |
| |
| |
| |
| |
| .. method:: wait(self, operation) |
| |
| Waits for the Google Cloud Dataproc operation to complete. |
| |
| |
| |
| |
| .. method:: cancel(self, project_id, job_id, region='global') |
| |
| Cancel a Google Cloud Dataproc job. |
| |
| :param project_id: Name of the project the job belongs to. |
| :type project_id: str |
| :param job_id: Identifier of the job to cancel. |
| :type job_id: int |
| :param region: Region used for the job. |
| :type region: str |
| :returns: A Job JSON dictionary representing the canceled job. |
| |
| |
| |
| |