blob: 30261012075147c1c5d3497641b6089722ca3be4 [file] [log] [blame]
:py:mod:`airflow.providers.amazon.aws.hooks.emr`
================================================
.. py:module:: airflow.providers.amazon.aws.hooks.emr
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.amazon.aws.hooks.emr.EmrHook
airflow.providers.amazon.aws.hooks.emr.EmrContainerHook
.. py:class:: EmrHook(emr_conn_id = default_conn_name, *args, **kwargs)
Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
Interact with AWS EMR. emr_conn_id is only necessary for using the
create_job_flow method.
Additional arguments (such as ``aws_conn_id``) may be specified and
are passed down to the underlying AwsBaseHook.
.. seealso::
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
.. py:attribute:: conn_name_attr
:annotation: = emr_conn_id
.. py:attribute:: default_conn_name
:annotation: = emr_default
.. py:attribute:: conn_type
:annotation: = emr
.. py:attribute:: hook_name
:annotation: = Amazon Elastic MapReduce
.. py:method:: get_cluster_id_by_name(self, emr_cluster_name, cluster_states)
Fetch id of EMR cluster with given name and (optional) states.
Will return only if single id is found.
:param emr_cluster_name: Name of a cluster to find
:param cluster_states: State(s) of cluster to find
:return: id of the EMR cluster
.. py:method:: create_job_flow(self, job_flow_overrides)
Creates a job flow using the config from the EMR connection.
Keys of the json extra hash may have the arguments of the boto3
run_job_flow method.
Overrides for this config may be passed as the job_flow_overrides.
.. py:class:: EmrContainerHook(*args, virtual_cluster_id = None, **kwargs)
Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
Interact with AWS EMR Virtual Cluster to run, poll jobs and return job status
Additional arguments (such as ``aws_conn_id``) may be specified and
are passed down to the underlying AwsBaseHook.
.. seealso::
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
:param virtual_cluster_id: Cluster ID of the EMR on EKS virtual cluster
.. py:attribute:: INTERMEDIATE_STATES
:annotation: = ['PENDING', 'SUBMITTED', 'RUNNING']
.. py:attribute:: FAILURE_STATES
:annotation: = ['FAILED', 'CANCELLED', 'CANCEL_PENDING']
.. py:attribute:: SUCCESS_STATES
:annotation: = ['COMPLETED']
.. py:attribute:: TERMINAL_STATES
:annotation: = ['COMPLETED', 'FAILED', 'CANCELLED', 'CANCEL_PENDING']
.. py:method:: submit_job(self, name, execution_role_arn, release_label, job_driver, configuration_overrides = None, client_request_token = None, tags = None)
Submit a job to the EMR Containers API and return the job ID.
A job run is a unit of work, such as a Spark jar, PySpark script,
or SparkSQL query, that you submit to Amazon EMR on EKS.
See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.start_job_run # noqa: E501
:param name: The name of the job run.
:param execution_role_arn: The IAM role ARN associated with the job run.
:param release_label: The Amazon EMR release version to use for the job run.
:param job_driver: Job configuration details, e.g. the Spark job parameters.
:param configuration_overrides: The configuration overrides for the job run,
specifically either application configuration or monitoring configuration.
:param client_request_token: The client idempotency token of the job run request.
Use this if you want to specify a unique ID to prevent two jobs from getting started.
:param tags: The tags assigned to job runs.
:return: Job ID
.. py:method:: get_job_failure_reason(self, job_id)
Fetch the reason for a job failure (e.g. error message). Returns None or reason string.
:param job_id: Id of submitted job run
:return: str
.. py:method:: check_query_status(self, job_id)
Fetch the status of submitted job run. Returns None or one of valid query states.
See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.describe_job_run # noqa: E501
:param job_id: Id of submitted job run
:return: str
.. py:method:: poll_query_status(self, job_id, max_tries = None, poll_interval = 30)
Poll the status of submitted job run until query state reaches final state.
Returns one of the final states.
:param job_id: Id of submitted job run
:param max_tries: Number of times to poll for query state before function exits
:param poll_interval: Time (in seconds) to wait between calls to check query status on EMR
:return: str
.. py:method:: stop_query(self, job_id)
Cancel the submitted job_run
:param job_id: Id of submitted job_run
:return: dict