blob: d47af44d6cfc5247962be2fb5f0c5beff7bc25f2 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. _howto/operator:DatabricksSqlOperator:
DatabricksSqlOperator
=====================
Use the :class:`~airflow.providers.databricks.operators.databricks_sql.DatabricksSqlOperator` to execute SQL
on a `Databricks SQL endpoint <https://docs.databricks.com/sql/admin/sql-endpoints.html>`_ or a
`Databricks cluster <https://docs.databricks.com/clusters/index.html>`_.
Using the Operator
------------------
Operator executes given SQL queries against configured endpoint. There are 3 ways of specifying SQL queries:
1. Simple string with SQL statement.
2. List of strings representing SQL statements.
3. Name of the file with SQL queries. File must have ``.sql`` extension. Each query should finish with ``;<new_line>``
.. list-table::
:widths: 15 25
:header-rows: 1
* - Parameter
- Input
* - sql: str or list[str]
- Required parameter specifying a queries to execute.
* - sql_endpoint_name: str
- Optional name of Databricks SQL endpoint to use. If not specified, ``http_path`` should be provided.
* - http_path: str
- Optional HTTP path for Databricks SQL endpoint or Databricks cluster. If not specified, it should be provided in Databricks connection, or the ``sql_endpoint_name`` parameter must be set.
* - parameters: dict[str, any]
- Optional parameters that will be used to substitute variable(s) in SQL query.
* - session_configuration: dict[str,str]
- optional dict specifying Spark configuration parameters that will be set for the session.
* - output_path: str
- Optional path to the file to which results will be written.
* - output_format: str
- Name of the format which will be used to write results. Supported values are (case-insensitive): ``JSON`` (array of JSON objects), ``JSONL`` (each row as JSON object on a separate line), ``CSV`` (default).
* - csv_params: dict[str, any]
- Optional dictionary with parameters to customize Python CSV writer.
* - do_xcom_push: boolean
- whether we should push query results (last query if multiple queries are provided) to xcom. Default: false
Examples
--------
Selecting data
^^^^^^^^^^^^^^
An example usage of the DatabricksSqlOperator to select data from a table is as follows:
.. exampleinclude:: /../../airflow/providers/databricks/example_dags/example_databricks_sql.py
:language: python
:start-after: [START howto_operator_databricks_sql_select]
:end-before: [END howto_operator_databricks_sql_select]
Selecting data into a file
^^^^^^^^^^^^^^^^^^^^^^^^^^
An example usage of the DatabricksSqlOperator to select data from a table and store in a file is as follows:
.. exampleinclude:: /../../airflow/providers/databricks/example_dags/example_databricks_sql.py
:language: python
:start-after: [START howto_operator_databricks_sql_select_file]
:end-before: [END howto_operator_databricks_sql_select_file]
Executing multiple statements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An example usage of the DatabricksSqlOperator to perform multiple SQL statements is as follows:
.. exampleinclude:: /../../airflow/providers/databricks/example_dags/example_databricks_sql.py
:language: python
:start-after: [START howto_operator_databricks_sql_multiple]
:end-before: [END howto_operator_databricks_sql_multiple]
Executing multiple statements from a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An example usage of the DatabricksSqlOperator to perform statements from a file is as follows:
.. exampleinclude:: /../../airflow/providers/databricks/example_dags/example_databricks_sql.py
:language: python
:start-after: [START howto_operator_databricks_sql_multiple_file]
:end-before: [END howto_operator_databricks_sql_multiple_file]