 .. Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

 ..   http://www.apache.org/licenses/LICENSE-2.0

 .. Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.


Amazon EMR Operators
====================

`Amazon EMR <https://aws.amazon.com/emr/>`__ (previously called Amazon Elastic MapReduce)
is a managed cluster platform that simplifies running big data frameworks, such as Apache
Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these
frameworks and related open-source projects, you can process data for analytics purposes
and business intelligence workloads.  Amazon EMR also lets you transform and move large
amounts of data into and out of other AWS data stores and databases, such as Amazon Simple
Storage Service (Amazon S3) and Amazon DynamoDB.

Prerequisite Tasks
------------------

.. include:: _partials/prerequisite_tasks.rst

.. note::
    In order to run the two examples successfully, you need to create the IAM Service
    Roles(``EMR_EC2_DefaultRole`` and ``EMR_DefaultRole``) for Amazon EMR.  You can
    create these roles using the AWS CLI: ``aws emr create-default-roles``.

.. _howto/operator:EmrCreateJobFlowOperator:

Create EMR Job Flow
-------------------

You can use :class:`~airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator` to
create a new EMR job flow.  The cluster will be terminated automatically after finishing the steps.

JobFlow configuration
"""""""""""""""""""""

To create a job flow on EMR, you need to specify the configuration for the EMR cluster:

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_job_flow_automatic_steps.py
    :language: python
    :start-after: [START howto_operator_emr_automatic_steps_config]
    :end-before: [END howto_operator_emr_automatic_steps_config]

Here we create an EMR single-node Cluster *PiCalc*. It only has a single step *calculate_pi* which
calculates the value of ``Pi`` using Spark.  The config ``'KeepJobFlowAliveWhenNoSteps': False``
tells the cluster to shut down after the step is finished.  Alternatively, a config without a ``Steps``
value can be used and Steps can be added at a later date using
:class:`~airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator`.  See details below.

.. note::
    EMR clusters launched with the EMR API like this one are not visible to all users by default, so
    you may not see the cluster in the EMR Management Console - you can change this by adding
    ``'VisibleToAllUsers': True`` at the end of the ``JOB_FLOW_OVERRIDES`` dict.

For more config information, please refer to `Boto3 EMR client <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.run_job_flow>`__.

Create the Job Flow
"""""""""""""""""""

In the following code we are creating a new job flow using the configuration as explained above.

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_job_flow_automatic_steps.py
    :language: python
    :dedent: 4
    :start-after: [START howto_operator_emr_create_job_flow]
    :end-before: [END howto_operator_emr_create_job_flow]

.. _howto/operator:EmrAddStepsOperator:

Add Steps to an EMR Job Flow
----------------------------

To add Steps to an existing EMR Job Flow you can use
:class:`~airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator`.

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_job_flow_manual_steps.py
    :language: python
    :dedent: 4
    :start-after: [START howto_operator_emr_add_steps]
    :end-before: [END howto_operator_emr_add_steps]

.. _howto/operator:EmrTerminateJobFlowOperator:

Terminate an EMR Job Flow
-------------------------

To terminate an EMR Job Flow you can use
:class:`~airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator`.

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_job_flow_manual_steps.py
    :language: python
    :dedent: 4
    :start-after: [START howto_operator_emr_terminate_job_flow]
    :end-before: [END howto_operator_emr_terminate_job_flow]

.. _howto/operator:EmrModifyClusterOperator:

Modify Amazon EMR Container
---------------------------

To modify an existing EMR Container you can use
:class:`~airflow.providers.amazon.aws.sensors.emr.EmrContainerSensor`.

.. _howto/sensor:EmrContainerSensor:

Amazon EMR Container Sensor
---------------------------

To monitor the state of an EMR Container you can use
:class:`~airflow.providers.amazon.aws.sensors.emr.EmrContainerSensor`.


.. _howto/sensor:EmrJobFlowSensor:

Amazon EMR Job Flow Sensor
---------------------------

To monitor the state of an EMR Job Flow you can use
:class:`~airflow.providers.amazon.aws.sensors.emr.EmrJobFlowSensor`.

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_job_flow_automatic_steps.py
    :language: python
    :dedent: 4
    :start-after: [START howto_sensor_emr_job_flow_sensor]
    :end-before: [END howto_sensor_emr_job_flow_sensor]

.. _howto/sensor:EmrStepSensor:

Amazon EMR Step Sensor
----------------------

To monitor the state of a Step running an existing EMR Job Flow you can use
:class:`~airflow.providers.amazon.aws.sensors.emr.EmrStepSensor`.

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_job_flow_manual_steps.py
    :language: python
    :dedent: 4
    :start-after: [START howto_sensor_emr_step_sensor]
    :end-before: [END howto_sensor_emr_step_sensor]

Reference
---------

For further information, look at:

* `Boto3 Library Documentation for EMR <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html>`__
* `AWS CLI - create-default-roles <https://docs.aws.amazon.com/cli/latest/reference/emr/create-default-roles.html>`__
* `Configure IAM Service Roles for Amazon EMR Permissions <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html>`__
