:py:mod:`airflow.providers.apache.livy.operators.livy`
======================================================

.. py:module:: airflow.providers.apache.livy.operators.livy

.. autoapi-nested-parse::

   This module contains the Apache Livy operator.



Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   airflow.providers.apache.livy.operators.livy.LivyOperator




.. py:class:: LivyOperator(*, file, class_name = None, args = None, conf = None, jars = None, py_files = None, files = None, driver_memory = None, driver_cores = None, executor_memory = None, executor_cores = None, num_executors = None, archives = None, queue = None, name = None, proxy_user = None, livy_conn_id = 'livy_default', polling_interval = 0, extra_options = None, extra_headers = None, retry_args = None, **kwargs)

   Bases: :py:obj:`airflow.models.BaseOperator`
   This operator wraps the Apache Livy batch REST API, allowing you to submit
   a Spark application to the underlying cluster.

   :param file: path of the file containing the application to execute (required).
   :param class_name: name of the application's Java/Spark main class.
   :param args: application command-line arguments.
   :param jars: jars to be used in this session.
   :param py_files: python files to be used in this session.
   :param files: files to be used in this session.
   :param driver_memory: amount of memory to use for the driver process.
   :param driver_cores: number of cores to use for the driver process.
   :param executor_memory: amount of memory to use per executor process.
   :param executor_cores: number of cores to use for each executor.
   :param num_executors: number of executors to launch for this session.
   :param archives: archives to be used in this session.
   :param queue: name of the YARN queue to which the application is submitted.
   :param name: name of this session.
   :param conf: Spark configuration properties.
   :param proxy_user: user to impersonate when running the job.
   :param livy_conn_id: reference to a pre-defined Livy Connection.
   :param polling_interval: time in seconds between polls for job completion. Don't poll for values <= 0.
   :param extra_options: a dictionary of options, where the key is a string and the value
       depends on the option being modified.
   :param extra_headers: a dictionary of headers passed with the HTTP request to Livy.
   :param retry_args: arguments which define the retry behaviour.
       See the Tenacity documentation at https://github.com/jd/tenacity
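
   A minimal usage sketch of the arguments above. The file paths, queue name,
   and ``task_id`` are illustrative assumptions, not values defined by this
   module:

   ```python
   # Hedged sketch: typical keyword arguments for LivyOperator.
   # The file paths, queue name, and task_id below are hypothetical examples.
   livy_task_kwargs = dict(
       task_id="livy_spark_job",            # standard BaseOperator argument
       file="/data/apps/counting_app.py",   # required: application to execute
       py_files=["/data/apps/helpers.py"],
       executor_memory="1g",
       executor_cores=2,
       num_executors=2,
       queue="default",
       polling_interval=30,                 # poll every 30 s; values <= 0 disable polling
       livy_conn_id="livy_default",
   )

   # Inside a DAG definition these would be passed straight through:
   # LivyOperator(**livy_task_kwargs)
   ```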

   .. py:attribute:: template_fields
      :annotation: :Sequence[str] = ['spark_params']



   .. py:method:: get_hook(self)

      Get a valid hook.

      :return: hook
      :rtype: LivyHook

   .. py:method:: execute(self, context)

      This is the main method to derive when creating an operator.
      Context is the same dictionary used when rendering jinja templates.

      Refer to get_template_context for more context.

   .. py:method:: poll_for_termination(self, batch_id)

      Poll Livy for batch termination.

      :param batch_id: id of the batch session to monitor.
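
      The polling behaviour can be sketched in plain Python. This is a hedged
      approximation, not the operator's actual code: ``get_state`` stands in
      for the Livy hook's batch-state lookup, and the terminal state names
      follow the Livy batch API:

      ```python
      import time

      # Livy batch sessions end in one of these states (illustrative set).
      TERMINAL_STATES = {"success", "dead", "killed"}

      def poll_for_termination(get_state, polling_interval=1.0):
          # `get_state` is a stand-in for fetching the batch state from the
          # Livy REST API; sleep between polls until a terminal state appears.
          state = get_state()
          while state not in TERMINAL_STATES:
              time.sleep(polling_interval)
              state = get_state()
          return state

      # Simulate a batch that starts, runs, then succeeds.
      states = iter(["starting", "running", "success"])
      print(poll_for_termination(lambda: next(states), polling_interval=0))  # prints "success"
      ```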


   .. py:method:: on_kill(self)

      Override this method to clean up subprocesses when a task instance
      gets killed. Any use of the threading, subprocess or multiprocessing
      module within an operator needs to be cleaned up, or it will leave
      ghost processes behind.


   .. py:method:: kill(self)

      Delete the current batch session.
