
  .. Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

  ..   http://www.apache.org/licenses/LICENSE-2.0

  .. Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.


 .. _howto/operator:EMRContainersOperators:

 Amazon EMR on EKS Operators
 ===========================

 `Amazon EMR on EKS <https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html>`__
 provides a deployment option for Amazon EMR that allows you to run open-source big data frameworks on
 Amazon EKS.

 Airflow provides the :class:`~airflow.providers.amazon.aws.operators.emr.EmrContainerOperator`
 to submit Apache Spark jobs to your EMR on EKS virtual cluster.

 Prerequisite Tasks
 ------------------

 .. include:: _partials/prerequisite_tasks.rst

 This example assumes that you already have an EMR on EKS virtual cluster configured. See the
 `EMR on EKS Getting Started guide <https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/getting-started.html>`__
 for more information.


 Run a Spark job on EMR on EKS
 -----------------------------

 Purpose
 """""""

 The ``EmrContainerOperator`` submits a new job to an EMR on EKS virtual cluster and waits for
 the job to complete. The example job below calculates the value of Pi. In a production job, the
 entry point usually refers to a Spark script stored in Amazon Simple Storage Service (Amazon S3).

 Job configuration
 """""""""""""""""

 To create a job for EMR on EKS, you need to specify your virtual cluster ID, the Amazon EMR release
 label you want to use, your IAM execution role, and the Spark submit parameters for the job driver.
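
 As a minimal sketch of the required job driver, the snippet below assembles the dictionary that the
 operator's ``job_driver`` parameter expects, pointing at a script on S3 as a production job would.
 The helper ``build_spark_job_driver``, the bucket path, and the script arguments are hypothetical
 placeholders; the ``sparkSubmitJobDriver``, ``entryPoint``, ``entryPointArguments``, and
 ``sparkSubmitParameters`` keys follow the EMR on EKS ``StartJobRun`` API.

 .. code-block:: python

     from typing import Dict, List, Optional


     def build_spark_job_driver(
         entry_point: str,
         spark_submit_parameters: str,
         entry_point_arguments: Optional[List[str]] = None,
     ) -> Dict[str, dict]:
         """Return a jobDriver dict for EMR on EKS (hypothetical helper)."""
         driver = {
             "entryPoint": entry_point,
             "sparkSubmitParameters": spark_submit_parameters,
         }
         # entryPointArguments is optional, so only add it when arguments are given.
         if entry_point_arguments:
             driver["entryPointArguments"] = list(entry_point_arguments)
         return {"sparkSubmitJobDriver": driver}


     # Placeholder bucket and script; replace them with your own S3 location.
     JOB_DRIVER = build_spark_job_driver(
         entry_point="s3://my-bucket/scripts/etl_job.py",
         spark_submit_parameters=(
             "--conf spark.executor.instances=2 "
             "--conf spark.executor.memory=2G "
             "--conf spark.executor.cores=2 "
             "--conf spark.driver.cores=1"
         ),
         entry_point_arguments=["--run-date", "2022-01-01"],
     )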

 You can also provide optional configuration overrides, such as Spark, Hive, or Log4j properties, and
 a monitoring configuration that sends Spark logs to Amazon S3 or Amazon CloudWatch.

 The example adds an ``applicationConfiguration`` that uses the AWS Glue Data Catalog as the metastore
 and a ``monitoringConfiguration`` that sends logs to the ``/aws/emr-eks-spark`` log group in CloudWatch.
 Refer to the `EMR on EKS guide <https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-parameters>`__
 for more details on job configuration.

 .. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_eks_job.py
     :language: python
     :start-after: [START howto_operator_emr_eks_config]
     :end-before: [END howto_operator_emr_eks_config]
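
 Beyond the example configuration above, the same ``configuration_overrides`` dictionary can also
 carry an S3 log destination alongside CloudWatch. The sketch below assumes a hypothetical helper
 ``build_configuration_overrides`` and a placeholder bucket; the ``applicationConfiguration``,
 ``cloudWatchMonitoringConfiguration``, and ``s3MonitoringConfiguration`` keys follow the EMR on EKS
 ``StartJobRun`` API, and the factory class is the one AWS documents for using the Glue Data Catalog
 as a Hive metastore.

 .. code-block:: python

     from typing import Optional


     def build_configuration_overrides(
         log_group_name: str = "/aws/emr-eks-spark",
         log_stream_prefix: str = "airflow",
         s3_log_uri: Optional[str] = None,
     ) -> dict:
         """Return a configurationOverrides dict (hypothetical helper)."""
         monitoring = {
             "cloudWatchMonitoringConfiguration": {
                 "logGroupName": log_group_name,
                 "logStreamNamePrefix": log_stream_prefix,
             }
         }
         # Optionally also send logs to S3 (the URI is supplied by the caller).
         if s3_log_uri:
             monitoring["s3MonitoringConfiguration"] = {"logUri": s3_log_uri}
         return {
             "applicationConfiguration": [
                 {
                     # spark-defaults sets default Spark properties for the job run.
                     "classification": "spark-defaults",
                     "properties": {
                         # Use the AWS Glue Data Catalog as the Hive metastore.
                         "spark.hadoop.hive.metastore.client.factory.class": (
                             "com.amazonaws.glue.catalog.metastore."
                             "AWSGlueDataCatalogHiveClientFactory"
                         ),
                     },
                 }
             ],
             "monitoringConfiguration": monitoring,
         }


     # Placeholder bucket; replace it with your own log location.
     CONFIGURATION_OVERRIDES = build_configuration_overrides(
         s3_log_uri="s3://my-bucket/emr-eks-logs/"
     )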


 The example passes the ``virtual_cluster_id`` and ``execution_role_arn`` values as operator parameters,
 but you can also store them in a connection or define them elsewhere in the DAG. Define your AWS region
 either in the extra field of the ``aws_default`` connection, for example ``{"region_name": "us-east-1"}``,
 or in a custom connection whose ID you pass to the operator with the ``aws_conn_id`` parameter.
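
 One way to keep account-specific values out of the DAG file is to read them from the environment and
 pass them to the operator as keyword arguments. In this sketch, the helper ``emr_eks_operator_kwargs``
 and the environment variable names ``VIRTUAL_CLUSTER_ID`` and ``JOB_ROLE_ARN`` are assumptions;
 ``REGION_EXTRA`` is the JSON you would place in the connection's extra field to set the region.

 .. code-block:: python

     import json
     import os


     def emr_eks_operator_kwargs(aws_conn_id: str = "aws_default") -> dict:
         """Collect connection-related operator arguments (hypothetical helper).

         VIRTUAL_CLUSTER_ID and JOB_ROLE_ARN are assumed environment variable
         names; substitute whatever your deployment uses.
         """
         return {
             "virtual_cluster_id": os.environ["VIRTUAL_CLUSTER_ID"],
             "execution_role_arn": os.environ["JOB_ROLE_ARN"],
             # The connection named here should carry the region in its extra field.
             "aws_conn_id": aws_conn_id,
         }


     # JSON for the extra field of the AWS connection that defines the region.
     REGION_EXTRA = json.dumps({"region_name": "us-east-1"})

 You could then unpack the result into the operator, for example
 ``EmrContainerOperator(task_id="start_job", **emr_eks_operator_kwargs(), ...)``, alongside the job
 driver and release label.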

 .. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_emr_eks_job.py
     :language: python
     :dedent: 4
     :start-after: [START howto_operator_emr_eks_job]
     :end-before: [END howto_operator_emr_eks_job]

 The ``EmrContainerOperator`` waits until the job completes successfully and raises an
 ``AirflowException`` if the job run fails. The operator returns the ID of the job run.
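
 The wait-and-raise behavior described above can be pictured as a polling loop over the job run state.
 This is a simplified, standalone sketch, not the operator's actual implementation: ``get_state`` stands
 in for a lookup such as the boto3 ``emr-containers`` client's ``describe_job_run``, ``JobRunFailed``
 stands in for ``AirflowException``, and the poll interval and attempt limit are arbitrary defaults.

 .. code-block:: python

     import time
     from typing import Callable

     # Terminal job run states from the EMR on EKS DescribeJobRun API.
     SUCCESS_STATES = {"COMPLETED"}
     FAILURE_STATES = {"FAILED", "CANCELLED", "CANCEL_PENDING"}


     class JobRunFailed(Exception):
         """Stand-in for AirflowException in this sketch."""


     def wait_for_job_run(
         job_id: str,
         get_state: Callable[[str], str],
         poll_interval: float = 30.0,
         max_attempts: int = 60,
     ) -> str:
         """Poll until the job run reaches a terminal state; return the job ID on success."""
         for _ in range(max_attempts):
             state = get_state(job_id)
             if state in SUCCESS_STATES:
                 return job_id
             if state in FAILURE_STATES:
                 raise JobRunFailed(f"Job run {job_id} ended in state {state}")
             # PENDING, SUBMITTED and RUNNING are intermediate; wait and poll again.
             time.sleep(poll_interval)
         raise JobRunFailed(f"Job run {job_id} did not finish after {max_attempts} polls")

 A stubbed state source makes the loop easy to exercise without AWS access, for example
 ``wait_for_job_run("job-123", lambda _id: "COMPLETED", poll_interval=0)`` returns ``"job-123"``.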

 Reference
 ---------

 For further information, look at:

 * `Amazon EMR on EKS Job runs <https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs.html>`__
 * `EMR on EKS Best Practices <https://aws.github.io/aws-emr-containers-best-practices/>`__