:py:mod:`airflow.providers.amazon.aws.operators.emr`
====================================================
.. py:module:: airflow.providers.amazon.aws.operators.emr
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator
airflow.providers.amazon.aws.operators.emr.EmrContainerOperator
airflow.providers.amazon.aws.operators.emr.EmrClusterLink
airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator
airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator
airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator
.. py:class:: EmrAddStepsOperator(*, job_flow_id = None, job_flow_name = None, cluster_states = None, aws_conn_id = 'aws_default', steps = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
An operator that adds steps to an existing EMR job_flow.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrAddStepsOperator`
:param job_flow_id: id of the JobFlow to add steps to. (templated)
:param job_flow_name: name of the JobFlow to add steps to. Use as an alternative to passing
job_flow_id. The operator will search for the id of the JobFlow whose name matches and whose
state is one of the states given in cluster_states; exactly one such cluster must exist, or
the task fails. (templated)
:param cluster_states: Acceptable cluster states when searching for JobFlow id by job_flow_name.
(templated)
:param aws_conn_id: aws connection to use
:param steps: boto3 style steps or reference to a steps file (must be '.json') to
be added to the JobFlow. (templated)
:param do_xcom_push: if True, job_flow_id is pushed to XCom with key job_flow_id.
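A minimal usage sketch is shown below; the cluster id and the step definition are illustrative placeholders, and the step follows the boto3 ``add_job_flow_steps`` shape:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

    # boto3-style step definition; the name, jar, and args are placeholders
    SPARK_STEP = {
        "Name": "calculate_pi",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-example", "SparkPi", "10"],
        },
    }

    add_steps = EmrAddStepsOperator(
        task_id="add_steps",
        job_flow_id="j-XXXXXXXXXXXXX",  # placeholder; often pulled from XCom instead
        steps=[SPARK_STEP],
    )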
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['job_flow_id', 'job_flow_name', 'cluster_states', 'steps']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = ['.json']
.. py:attribute:: template_fields_renderers
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:class:: EmrContainerOperator(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides = None, client_request_token = None, aws_conn_id = 'aws_default', poll_interval = 30, max_tries = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
An operator that submits jobs to EMR on EKS virtual clusters.
:param name: The name of the job run.
:param virtual_cluster_id: The EMR on EKS virtual cluster ID
:param execution_role_arn: The IAM role ARN associated with the job run.
:param release_label: The Amazon EMR release version to use for the job run.
:param job_driver: Job configuration details, e.g. the Spark job parameters.
:param configuration_overrides: The configuration overrides for the job run,
specifically either application configuration or monitoring configuration.
:param client_request_token: The client idempotency token of the job run request.
Use this if you want to specify a unique ID that prevents the same job from being started twice.
If no token is provided, a UUIDv4 token will be generated for you.
:param aws_conn_id: The Airflow connection used for AWS credentials.
:param poll_interval: Time (in seconds) to wait between two consecutive calls to check query status on EMR
:param max_tries: Maximum number of status polls to wait for the job run to finish.
Defaults to None, which polls until the job is no longer in a pending, submitted, or running state.
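A minimal usage sketch is shown below; the virtual cluster id, role ARN, and script path are illustrative placeholders, and ``job_driver`` follows the EMR on EKS ``StartJobRun`` API shape:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrContainerOperator

    # EMR on EKS job driver; entryPoint and parameters are placeholders
    JOB_DRIVER = {
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/scripts/pi.py",
            "sparkSubmitParameters": "--conf spark.executor.instances=2",
        }
    }

    job_starter = EmrContainerOperator(
        task_id="start_job",
        name="pi.py",
        virtual_cluster_id="vc-XXXXXXXXXXXX",  # placeholder virtual cluster id
        execution_role_arn="arn:aws:iam::123456789012:role/emr-eks-job-role",
        release_label="emr-6.3.0-latest",
        job_driver=JOB_DRIVER,
    )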
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['name', 'virtual_cluster_id', 'execution_role_arn', 'release_label', 'job_driver']
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: hook(self)
Create and return an EmrContainerHook.
.. py:method:: execute(self, context)
Run the job on EMR Containers.
.. py:method:: on_kill(self)
Cancel the submitted job run.
.. py:class:: EmrClusterLink
Bases: :py:obj:`airflow.models.BaseOperatorLink`
Operator link for EmrCreateJobFlowOperator. It allows users to access the EMR cluster.
.. py:attribute:: name
:annotation: = EMR Cluster
.. py:method:: get_link(self, operator, dttm = None, ti_key = None)
Get link to EMR cluster.
:param operator: the operator instance the link is attached to
:param dttm: execution datetime of the task instance
:return: URL of the EMR cluster page
.. py:class:: EmrCreateJobFlowOperator(*, aws_conn_id = 'aws_default', emr_conn_id = 'emr_default', job_flow_overrides = None, region_name = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
Creates an EMR JobFlow, reading the config from the EMR connection.
A dictionary of JobFlow overrides can be passed that override
the config from the connection.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrCreateJobFlowOperator`
:param aws_conn_id: aws connection to use
:param emr_conn_id: emr connection to use
:param job_flow_overrides: boto3 style arguments or reference to an arguments file
(must be '.json') to override emr_connection extra. (templated)
:param region_name: Region name passed to EmrHook
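A minimal usage sketch is shown below; the overrides are illustrative boto3 ``run_job_flow`` arguments, and anything not overridden is read from the ``emr_default`` connection extras:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

    # boto3 run_job_flow-style overrides; names, versions, and sizes are placeholders
    JOB_FLOW_OVERRIDES = {
        "Name": "PiCalc",
        "ReleaseLabel": "emr-6.4.0",
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary node",
                    "Market": "ON_DEMAND",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                }
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }

    create_job_flow = EmrCreateJobFlowOperator(
        task_id="create_job_flow",
        job_flow_overrides=JOB_FLOW_OVERRIDES,
    )

The created job flow id is pushed to XCom, so downstream tasks can reference it via a templated ``xcom_pull`` or the task's ``output``.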
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['job_flow_overrides']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = ['.json']
.. py:attribute:: template_fields_renderers
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:attribute:: operator_extra_links
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:class:: EmrModifyClusterOperator(*, cluster_id, step_concurrency_level, aws_conn_id = 'aws_default', **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
An operator that modifies an existing EMR cluster.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrModifyClusterOperator`
:param cluster_id: cluster identifier
:param step_concurrency_level: the number of steps that can be executed concurrently on the cluster
:param aws_conn_id: aws connection to use
:param do_xcom_push: if True, cluster_id is pushed to XCom with key cluster_id.
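A minimal usage sketch is shown below; the cluster id is an illustrative placeholder and could equally be pulled from the XCom of an EmrCreateJobFlowOperator task:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrModifyClusterOperator

    modify_cluster = EmrModifyClusterOperator(
        task_id="modify_cluster",
        cluster_id="j-XXXXXXXXXXXXX",  # placeholder cluster id
        step_concurrency_level=2,  # allow two steps to run at once
    )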
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['cluster_id', 'step_concurrency_level']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = []
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:class:: EmrTerminateJobFlowOperator(*, job_flow_id, aws_conn_id = 'aws_default', **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
Operator to terminate EMR JobFlows.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:EmrTerminateJobFlowOperator`
:param job_flow_id: id of the JobFlow to terminate. (templated)
:param aws_conn_id: aws connection to use
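A minimal usage sketch is shown below, terminating a cluster whose id is pulled from XCom; the ``create_job_flow`` task id is an assumed upstream EmrCreateJobFlowOperator task:

.. code-block:: python

    from airflow.providers.amazon.aws.operators.emr import EmrTerminateJobFlowOperator

    terminate_cluster = EmrTerminateJobFlowOperator(
        task_id="terminate_cluster",
        # templated pull of the job flow id pushed by the create task
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_job_flow', key='return_value') }}",
    )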
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['job_flow_id']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = []
.. py:attribute:: ui_color
:annotation: = #f9c915
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.