:py:mod:`airflow.providers.amazon.aws.operators.emr`
====================================================
.. py:module:: airflow.providers.amazon.aws.operators.emr
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator
airflow.providers.amazon.aws.operators.emr.EmrContainerOperator
airflow.providers.amazon.aws.operators.emr.EmrClusterLink
airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator
airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator
airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator
.. py:class:: EmrAddStepsOperator(*, job_flow_id = None, job_flow_name = None, cluster_states = None, aws_conn_id = 'aws_default', steps = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
An operator that adds steps to an existing EMR job_flow.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrAddStepsOperator`
:param job_flow_id: id of the JobFlow to add steps to. (templated)
:param job_flow_name: name of the JobFlow to add steps to. Use as an alternative to passing
job_flow_id. The operator will search for the id of the JobFlow whose name matches and whose
state is one of the states given in cluster_states; exactly one such cluster must exist, or
the task fails. (templated)
:param cluster_states: Acceptable cluster states when searching for JobFlow id by job_flow_name.
(templated)
:param aws_conn_id: aws connection to use
:param steps: boto3 style steps or reference to a steps file (must be '.json') to
be added to the JobFlow. (templated)
:param do_xcom_push: if True, job_flow_id is pushed to XCom with key job_flow_id.
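A minimal usage sketch is shown below; the cluster id and the step definition are illustrative placeholders, and the step follows the boto3 ``add_job_flow_steps`` shape:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

    # boto3-style step definition; the name, jar, and args are placeholders
    SPARK_STEP = {
        "Name": "calculate_pi",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-example", "SparkPi", "10"],
        },
    }

    add_steps = EmrAddStepsOperator(
        task_id="add_steps",
        job_flow_id="j-XXXXXXXXXXXXX",  # placeholder; often pulled from XCom instead
        steps=[SPARK_STEP],
    )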
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['job_flow_id', 'job_flow_name', 'cluster_states', 'steps']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = ['.json']
.. py:attribute:: template_fields_renderers
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:class:: EmrContainerOperator(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides = None, client_request_token = None, aws_conn_id = 'aws_default', poll_interval = 30, max_tries = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
An operator that submits jobs to EMR on EKS virtual clusters.
:param name: The name of the job run.
:param virtual_cluster_id: The EMR on EKS virtual cluster ID
:param execution_role_arn: The IAM role ARN associated with the job run.
:param release_label: The Amazon EMR release version to use for the job run.
:param job_driver: Job configuration details, e.g. the Spark job parameters.
:param configuration_overrides: The configuration overrides for the job run,
specifically either application configuration or monitoring configuration.
:param client_request_token: The client idempotency token of the job run request.
Use this if you want to specify a unique ID that prevents the same job from being started twice.
If no token is provided, a UUIDv4 token will be generated for you.
:param aws_conn_id: The Airflow connection used for AWS credentials.
:param poll_interval: Time (in seconds) to wait between two consecutive calls to check query status on EMR
:param max_tries: Maximum number of status polls to wait for the job run to finish.
Defaults to None, which polls until the job is no longer in a pending, submitted, or running state.
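A minimal usage sketch is shown below; the virtual cluster id, role ARN, and script path are illustrative placeholders, and ``job_driver`` follows the EMR on EKS ``StartJobRun`` API shape:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrContainerOperator

    # EMR on EKS job driver; entryPoint and parameters are placeholders
    JOB_DRIVER = {
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/scripts/pi.py",
            "sparkSubmitParameters": "--conf spark.executor.instances=2",
        }
    }

    job_starter = EmrContainerOperator(
        task_id="start_job",
        name="pi.py",
        virtual_cluster_id="vc-XXXXXXXXXXXX",  # placeholder virtual cluster id
        execution_role_arn="arn:aws:iam::123456789012:role/emr-eks-job-role",
        release_label="emr-6.3.0-latest",
        job_driver=JOB_DRIVER,
    )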
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['name', 'virtual_cluster_id', 'execution_role_arn', 'release_label', 'job_driver']
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: hook(self)
Create and return an EmrContainerHook.
.. py:method:: execute(self, context)
Run the job on EMR Containers.
.. py:method:: on_kill(self)
Cancel the submitted job run.
.. py:class:: EmrClusterLink
Bases: :py:obj:`airflow.models.BaseOperatorLink`
Operator link for EmrCreateJobFlowOperator. It allows users to access the EMR cluster.
.. py:attribute:: name
:annotation: = EMR Cluster
.. py:method:: get_link(self, operator, dttm = None, ti_key = None)
Get link to EMR cluster.
:param operator: the operator instance the link is attached to
:param dttm: execution datetime of the task instance
:return: URL of the EMR cluster page
.. py:class:: EmrCreateJobFlowOperator(*, aws_conn_id = 'aws_default', emr_conn_id = 'emr_default', job_flow_overrides = None, region_name = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
Creates an EMR JobFlow, reading the config from the EMR connection.
A dictionary of JobFlow overrides can be passed that override
the config from the connection.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrCreateJobFlowOperator`
:param aws_conn_id: aws connection to use
:param emr_conn_id: emr connection to use
:param job_flow_overrides: boto3 style arguments or reference to an arguments file
(must be '.json') to override emr_connection extra. (templated)
:param region_name: Region name passed to EmrHook
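A minimal usage sketch is shown below; the overrides are illustrative boto3 ``run_job_flow`` arguments, and anything not overridden is read from the ``emr_default`` connection extras:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

    # boto3 run_job_flow-style overrides; names, versions, and sizes are placeholders
    JOB_FLOW_OVERRIDES = {
        "Name": "PiCalc",
        "ReleaseLabel": "emr-6.4.0",
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary node",
                    "Market": "ON_DEMAND",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                }
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }

    create_job_flow = EmrCreateJobFlowOperator(
        task_id="create_job_flow",
        job_flow_overrides=JOB_FLOW_OVERRIDES,
    )

The created job flow id is pushed to XCom, so downstream tasks can reference it via a templated ``xcom_pull`` or the task's ``output``.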
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['job_flow_overrides']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = ['.json']
.. py:attribute:: template_fields_renderers
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:attribute:: operator_extra_links
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:class:: EmrModifyClusterOperator(*, cluster_id, step_concurrency_level, aws_conn_id = 'aws_default', **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
An operator that modifies an existing EMR cluster.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrModifyClusterOperator`
:param cluster_id: cluster identifier
:param step_concurrency_level: the number of steps that can be executed concurrently on the cluster
:param aws_conn_id: aws connection to use
:param do_xcom_push: if True, cluster_id is pushed to XCom with key cluster_id.
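A minimal usage sketch is shown below; the cluster id is an illustrative placeholder and could equally be pulled from the XCom of an EmrCreateJobFlowOperator task:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrModifyClusterOperator

    modify_cluster = EmrModifyClusterOperator(
        task_id="modify_cluster",
        cluster_id="j-XXXXXXXXXXXXX",  # placeholder cluster id
        step_concurrency_level=2,  # allow two steps to run at once
    )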
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['cluster_id', 'step_concurrency_level']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = []
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:class:: EmrTerminateJobFlowOperator(*, job_flow_id, aws_conn_id = 'aws_default', **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
Operator to terminate EMR JobFlows.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrTerminateJobFlowOperator`
:param job_flow_id: id of the JobFlow to terminate. (templated)
:param aws_conn_id: aws connection to use
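A minimal usage sketch is shown below, terminating a cluster whose id is pulled from XCom; the ``create_job_flow`` task id is an assumed upstream EmrCreateJobFlowOperator task:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrTerminateJobFlowOperator

    terminate_cluster = EmrTerminateJobFlowOperator(
        task_id="terminate_cluster",
        # templated pull of the job flow id pushed by the create task
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_job_flow', key='return_value') }}",
    )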
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['job_flow_id']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = []
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.