:py:mod:`airflow.providers.apache.livy.operators.livy`
======================================================
.. py:module:: airflow.providers.apache.livy.operators.livy
.. autoapi-nested-parse::
This module contains the Apache Livy operator.
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.apache.livy.operators.livy.LivyOperator
.. py:class:: LivyOperator(*, file, class_name = None, args = None, conf = None, jars = None, py_files = None, files = None, driver_memory = None, driver_cores = None, executor_memory = None, executor_cores = None, num_executors = None, archives = None, queue = None, name = None, proxy_user = None, livy_conn_id = 'livy_default', polling_interval = 0, extra_options = None, extra_headers = None, retry_args = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
This operator wraps the Apache Livy batch REST API, allowing a Spark
application to be submitted to the underlying cluster.
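To make the relationship with the batch REST API concrete, the sketch below shows how operator-style keywords could map onto the camelCase fields of a Livy ``POST /batches`` request body. ``build_batch_body`` and ``_FIELD_NAMES`` are hypothetical names used only for illustration, not the provider's actual code; the file path and class name are placeholders.

```python
from typing import Any, Dict

# Operator keyword -> Livy batch field name (camelCase in the Livy REST API).
_FIELD_NAMES = {
    "file": "file",
    "class_name": "className",
    "args": "args",
    "jars": "jars",
    "py_files": "pyFiles",
    "files": "files",
    "driver_memory": "driverMemory",
    "driver_cores": "driverCores",
    "executor_memory": "executorMemory",
    "executor_cores": "executorCores",
    "num_executors": "numExecutors",
    "archives": "archives",
    "queue": "queue",
    "name": "name",
    "conf": "conf",
    "proxy_user": "proxyUser",
}


def build_batch_body(**params: Any) -> Dict[str, Any]:
    """Translate operator-style keywords into a batch request body.

    Hypothetical helper for illustration; unset (None) values are omitted.
    """
    unknown = set(params) - set(_FIELD_NAMES)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {_FIELD_NAMES[key]: value for key, value in params.items() if value is not None}


body = build_batch_body(
    file="/path/to/app.jar",          # placeholder path
    class_name="com.example.Main",    # placeholder class
    num_executors=2,
    queue=None,                       # dropped because it is unset
)
```

Dropping ``None`` values keeps the request minimal, so the server's defaults apply to anything the caller did not set.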
:param file: path of the file containing the application to execute (required).
:param class_name: name of the application Java/Spark main class.
:param args: application command line arguments.
:param jars: jars to be used in this session.
:param py_files: python files to be used in this session.
:param files: files to be used in this session.
:param driver_memory: amount of memory to use for the driver process.
:param driver_cores: number of cores to use for the driver process.
:param executor_memory: amount of memory to use per executor process.
:param executor_cores: number of cores to use for each executor.
:param num_executors: number of executors to launch for this session.
:param archives: archives to be used in this session.
:param queue: name of the YARN queue to which the application is submitted.
:param name: name of this session.
:param conf: Spark configuration properties.
:param proxy_user: user to impersonate when running the job.
:param livy_conn_id: reference to a pre-defined Livy Connection.
:param polling_interval: time in seconds between polls for job completion. Polling is disabled for values <= 0 (the default is 0).
:param extra_options: A dictionary of options, where each key is a string and the value
depends on the option being modified.
:param extra_headers: A dictionary of headers passed to the HTTP request to Livy.
:param retry_args: Arguments which define the retry behaviour.
See Tenacity documentation at https://github.com/jd/tenacity
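A minimal usage sketch, assuming an Airflow 2.x environment with the Apache Livy provider installed and a ``livy_default`` connection configured. The DAG id, task id, jar path, and Spark class are placeholders, and ``livy_task_kwargs`` and ``build_dag`` are illustrative names rather than part of the provider.

```python
from datetime import datetime

# Keyword arguments for the task; paths and names are placeholders.
livy_task_kwargs = {
    "task_id": "spark_pi",
    "file": "/spark/examples/jars/spark-examples.jar",
    "class_name": "org.apache.spark.examples.SparkPi",
    "args": ["10"],
    "num_executors": 1,
    "conf": {"spark.shuffle.compress": "false"},
    "livy_conn_id": "livy_default",
    "polling_interval": 30,  # seconds between status checks
}


def build_dag():
    # Imported lazily so this snippet can be read without Airflow installed.
    from airflow import DAG
    from airflow.providers.apache.livy.operators.livy import LivyOperator

    with DAG(
        dag_id="example_livy",
        start_date=datetime(2021, 1, 1),
        catchup=False,
    ) as dag:
        LivyOperator(**livy_task_kwargs)
    return dag
```

In a real DAG file, the result would be bound at module level (for example ``dag = build_dag()``) so that the scheduler can discover it.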
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['spark_params']
.. py:method:: get_hook(self)
Get valid hook.
:return: hook
:rtype: LivyHook
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
.. py:method:: poll_for_termination(self, batch_id)
Poll Livy for batch termination.
:param batch_id: id of the batch session to monitor.
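The general shape of such a polling loop can be sketched as follows. This is an illustration under stated assumptions, not the operator's implementation: ``poll_until_terminal`` and the ``get_state`` callable are hypothetical, and the set of terminal states is assumed for the sketch.

```python
import time
from typing import Callable

# States treated as terminal here; an assumption for this sketch.
TERMINAL_STATES = {"success", "dead", "killed", "error"}


def poll_until_terminal(
    get_state: Callable[[int], str],
    batch_id: int,
    polling_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Query the batch state until it is terminal, sleeping between checks."""
    state = get_state(batch_id)
    while state not in TERMINAL_STATES:
        sleep(polling_interval)
        state = get_state(batch_id)
    if state != "success":
        raise RuntimeError(f"Batch {batch_id} did not succeed (state: {state})")
    return state


# Stand-in for a Livy client: reports "starting", "running", then "success".
_states = iter(["starting", "running", "success"])
final_state = poll_until_terminal(
    lambda _batch_id: next(_states),
    batch_id=7,
    polling_interval=0,
    sleep=lambda _seconds: None,  # skip real sleeping in the demo
)
```

Passing ``sleep`` as a parameter keeps the loop testable without waiting on a real clock.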
.. py:method:: on_kill(self)
Override this method to cleanup subprocesses when a task instance
gets killed. Any use of the threading, subprocess or multiprocessing
module within an operator needs to be cleaned up or it will leave
ghost processes behind.
.. py:method:: kill(self)
Delete the current batch session.