docs-archive/apache-airflow-providers-amazon/4.0.0/_sources/_api/airflow/providers/amazon/aws/hooks/emr/index.rst.txt - airflow-site - Git at Google

 :py:mod:`airflow.providers.amazon.aws.hooks.emr`
 ================================================

 .. py:module:: airflow.providers.amazon.aws.hooks.emr


 Module Contents
 ---------------

 Classes
 ~~~~~~~

 .. autoapisummary::

    airflow.providers.amazon.aws.hooks.emr.EmrHook
    airflow.providers.amazon.aws.hooks.emr.EmrContainerHook


 .. py:class:: EmrHook(emr_conn_id = default_conn_name, *args, **kwargs)

    Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

    Interact with AWS EMR. emr_conn_id is only necessary for using the
    create_job_flow method.

    Additional arguments (such as ``aws_conn_id``) may be specified and
    are passed down to the underlying AwsBaseHook.

    .. seealso::
        :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

    .. py:attribute:: conn_name_attr
       :annotation: = emr_conn_id


    .. py:attribute:: default_conn_name
       :annotation: = emr_default


    .. py:attribute:: conn_type
       :annotation: = emr


    .. py:attribute:: hook_name
       :annotation: = Amazon Elastic MapReduce


    .. py:method:: get_cluster_id_by_name(self, emr_cluster_name, cluster_states)

       Fetch id of EMR cluster with given name and (optional) states.
       Will return only if single id is found.

       :param emr_cluster_name: Name of a cluster to find
       :param cluster_states: State(s) of cluster to find
       :return: id of the EMR cluster


    .. py:method:: create_job_flow(self, job_flow_overrides)

       Creates a job flow using the config from the EMR connection.
       Keys of the json extra hash may have the arguments of the boto3
       run_job_flow method.
       Overrides for this config may be passed as the job_flow_overrides.


 .. py:class:: EmrContainerHook(*args, virtual_cluster_id = None, **kwargs)

    Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

    Interact with AWS EMR Virtual Cluster to run, poll jobs and return job status
    Additional arguments (such as ``aws_conn_id``) may be specified and
    are passed down to the underlying AwsBaseHook.

    .. seealso::
        :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

    :param virtual_cluster_id: Cluster ID of the EMR on EKS virtual cluster

    .. py:attribute:: INTERMEDIATE_STATES
       :annotation: = ['PENDING', 'SUBMITTED', 'RUNNING']


    .. py:attribute:: FAILURE_STATES
       :annotation: = ['FAILED', 'CANCELLED', 'CANCEL_PENDING']


    .. py:attribute:: SUCCESS_STATES
       :annotation: = ['COMPLETED']


    .. py:attribute:: TERMINAL_STATES
       :annotation: = ['COMPLETED', 'FAILED', 'CANCELLED', 'CANCEL_PENDING']


    .. py:method:: submit_job(self, name, execution_role_arn, release_label, job_driver, configuration_overrides = None, client_request_token = None, tags = None)

       Submit a job to the EMR Containers API and return the job ID.
       A job run is a unit of work, such as a Spark jar, PySpark script,
       or SparkSQL query, that you submit to Amazon EMR on EKS.
       See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.start_job_run  # noqa: E501

       :param name: The name of the job run.
       :param execution_role_arn: The IAM role ARN associated with the job run.
       :param release_label: The Amazon EMR release version to use for the job run.
       :param job_driver: Job configuration details, e.g. the Spark job parameters.
       :param configuration_overrides: The configuration overrides for the job run,
           specifically either application configuration or monitoring configuration.
       :param client_request_token: The client idempotency token of the job run request.
           Use this if you want to specify a unique ID to prevent two jobs from getting started.
       :param tags: The tags assigned to job runs.
       :return: Job ID


    .. py:method:: get_job_failure_reason(self, job_id)

       Fetch the reason for a job failure (e.g. error message). Returns None or reason string.

       :param job_id: Id of submitted job run
       :return: str


    .. py:method:: check_query_status(self, job_id)

       Fetch the status of submitted job run. Returns None or one of valid query states.
       See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.describe_job_run  # noqa: E501
       :param job_id: Id of submitted job run
       :return: str


    .. py:method:: poll_query_status(self, job_id, max_tries = None, poll_interval = 30)

       Poll the status of submitted job run until query state reaches final state.
       Returns one of the final states.

       :param job_id: Id of submitted job run
       :param max_tries: Number of times to poll for query state before function exits
       :param poll_interval: Time (in seconds) to wait between calls to check query status on EMR
       :return: str


    .. py:method:: stop_query(self, job_id)

       Cancel the submitted job_run

       :param job_id: Id of submitted job_run
       :return: dict
	:py:mod:`airflow.providers.amazon.aws.hooks.emr`
	================================================

	.. py:module:: airflow.providers.amazon.aws.hooks.emr


	Module Contents
	---------------

	Classes
	~~~~~~~

	.. autoapisummary::

	airflow.providers.amazon.aws.hooks.emr.EmrHook
	airflow.providers.amazon.aws.hooks.emr.EmrContainerHook




	.. py:class:: EmrHook(emr_conn_id = default_conn_name, args, *kwargs)

	Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

	Interact with AWS EMR. emr_conn_id is only necessary for using the
	create_job_flow method.

	Additional arguments (such as ``aws_conn_id``) may be specified and
	are passed down to the underlying AwsBaseHook.

	.. seealso::
	:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

	.. py:attribute:: conn_name_attr
	:annotation: = emr_conn_id



	.. py:attribute:: default_conn_name
	:annotation: = emr_default



	.. py:attribute:: conn_type
	:annotation: = emr



	.. py:attribute:: hook_name
	:annotation: = Amazon Elastic MapReduce



	.. py:method:: get_cluster_id_by_name(self, emr_cluster_name, cluster_states)

	Fetch id of EMR cluster with given name and (optional) states.
	Will return only if single id is found.

	:param emr_cluster_name: Name of a cluster to find
	:param cluster_states: State(s) of cluster to find
	:return: id of the EMR cluster


	.. py:method:: create_job_flow(self, job_flow_overrides)

	Creates a job flow using the config from the EMR connection.
	Keys of the json extra hash may have the arguments of the boto3
	run_job_flow method.
	Overrides for this config may be passed as the job_flow_overrides.



	.. py:class:: EmrContainerHook(args, virtual_cluster_id = None, *kwargs)

	Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

	Interact with AWS EMR Virtual Cluster to run, poll jobs and return job status
	Additional arguments (such as ``aws_conn_id``) may be specified and
	are passed down to the underlying AwsBaseHook.

	.. seealso::
	:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`

	:param virtual_cluster_id: Cluster ID of the EMR on EKS virtual cluster

	.. py:attribute:: INTERMEDIATE_STATES
	:annotation: = ['PENDING', 'SUBMITTED', 'RUNNING']



	.. py:attribute:: FAILURE_STATES
	:annotation: = ['FAILED', 'CANCELLED', 'CANCEL_PENDING']



	.. py:attribute:: SUCCESS_STATES
	:annotation: = ['COMPLETED']



	.. py:attribute:: TERMINAL_STATES
	:annotation: = ['COMPLETED', 'FAILED', 'CANCELLED', 'CANCEL_PENDING']



	.. py:method:: submit_job(self, name, execution_role_arn, release_label, job_driver, configuration_overrides = None, client_request_token = None, tags = None)

	Submit a job to the EMR Containers API and return the job ID.
	A job run is a unit of work, such as a Spark jar, PySpark script,
	or SparkSQL query, that you submit to Amazon EMR on EKS.
	See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.start_job_run # noqa: E501

	:param name: The name of the job run.
	:param execution_role_arn: The IAM role ARN associated with the job run.
	:param release_label: The Amazon EMR release version to use for the job run.
	:param job_driver: Job configuration details, e.g. the Spark job parameters.
	:param configuration_overrides: The configuration overrides for the job run,
	specifically either application configuration or monitoring configuration.
	:param client_request_token: The client idempotency token of the job run request.
	Use this if you want to specify a unique ID to prevent two jobs from getting started.
	:param tags: The tags assigned to job runs.
	:return: Job ID


	.. py:method:: get_job_failure_reason(self, job_id)

	Fetch the reason for a job failure (e.g. error message). Returns None or reason string.

	:param job_id: Id of submitted job run
	:return: str


	.. py:method:: check_query_status(self, job_id)

	Fetch the status of submitted job run. Returns None or one of valid query states.
	See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.describe_job_run # noqa: E501
	:param job_id: Id of submitted job run
	:return: str


	.. py:method:: poll_query_status(self, job_id, max_tries = None, poll_interval = 30)

	Poll the status of submitted job run until query state reaches final state.
	Returns one of the final states.

	:param job_id: Id of submitted job run
	:param max_tries: Number of times to poll for query state before function exits
	:param poll_interval: Time (in seconds) to wait between calls to check query status on EMR
	:return: str


	.. py:method:: stop_query(self, job_id)

	Cancel the submitted job_run

	:param job_id: Id of submitted job_run
	:return: dict