:py:mod:`airflow.providers.apache.spark.operators.spark_sql`
============================================================
.. py:module:: airflow.providers.apache.spark.operators.spark_sql
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.apache.spark.operators.spark_sql.SparkSqlOperator
.. py:class:: SparkSqlOperator(*, sql, conf = None, conn_id = 'spark_sql_default', total_executor_cores = None, executor_cores = None, executor_memory = None, keytab = None, principal = None, master = None, name = 'default-name', num_executors = None, verbose = True, yarn_queue = None, **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
Execute a Spark SQL query.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:SparkSqlOperator`
:param sql: The SQL query to execute. (templated)
:param conf: arbitrary Spark configuration property
:param conn_id: The Airflow connection id to use (Default: ``spark_sql_default``)
:param total_executor_cores: (Standalone & Mesos only) Total cores for all
executors (Default: all the available cores on the worker)
:param executor_cores: (Standalone & YARN only) Number of cores per
executor (Default: 2)
:param executor_memory: Memory per executor (e.g. 1000M, 2G) (Default: 1G)
:param keytab: Full path to the file that contains the keytab
:param principal: The name of the Kerberos principal used with the keytab
:param master: spark://host:port, mesos://host:port, yarn, or local
(Default: The ``host`` and ``port`` set in the Connection, or ``"yarn"``)
:param name: Name of the job
:param num_executors: Number of executors to launch
:param verbose: Whether to pass the verbose flag to spark-sql
:param yarn_queue: The YARN queue to submit to
(Default: The ``queue`` value set in the Connection, or ``"default"``)
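A minimal usage sketch, assuming a DAG context; the DAG id, task id, and
table name below are illustrative placeholders, not part of this provider's
API:

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

    with DAG(
        dag_id="example_spark_sql",  # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Runs the statement through spark-sql, falling back to the
        # "spark_sql_default" connection for master/queue defaults.
        count_rows = SparkSqlOperator(
            task_id="count_rows",
            sql="SELECT COUNT(*) FROM my_table",  # "my_table" is a placeholder
            master="yarn",
        )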
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['_sql']
.. py:attribute:: template_ext
:annotation: :Sequence[str] = ['.sql', '.hql']
.. py:attribute:: template_fields_renderers
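Because ``_sql`` is a template field and ``.sql``/``.hql`` are registered
template extensions, the ``sql`` argument may also point to a SQL file that
is rendered with Jinja before execution. A sketch under that assumption (the
file path and its contents are illustrative):

.. code-block:: python

    from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

    # queries/daily_count.sql, resolved relative to the DAG folder, might contain:
    #   SELECT COUNT(*) FROM events WHERE ds = '{{ ds }}'
    daily_count = SparkSqlOperator(
        task_id="daily_count",
        sql="queries/daily_count.sql",  # rendered with Jinja, then executed
        master="yarn",
    )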
.. py:method:: execute(self, context)
Call the SparkSqlHook to run the provided SQL query.
.. py:method:: on_kill(self)
Override this method to clean up subprocesses when a task instance
gets killed. Any use of the ``threading``, ``subprocess``, or
``multiprocessing`` modules within an operator needs to be cleaned
up, or it will leave ghost processes behind.
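As a hedged illustration of that contract (not part of this provider; the
operator name and the command are hypothetical), a custom operator that
spawns a subprocess could clean it up like this:

.. code-block:: python

    import subprocess

    from airflow.models import BaseOperator


    class ShellSleepOperator(BaseOperator):
        """Hypothetical operator, shown only to illustrate the on_kill contract."""

        def execute(self, context):
            # Keep a handle to the child so on_kill can terminate it.
            self._process = subprocess.Popen(["sleep", "3600"])
            self._process.wait()

        def on_kill(self):
            # Without this, killing the task instance would leave the
            # child process behind as a ghost.
            if getattr(self, "_process", None) is not None:
                self._process.terminate()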