:py:mod:`airflow.providers.databricks.operators.databricks_sql`
===============================================================

.. py:module:: airflow.providers.databricks.operators.databricks_sql

.. autoapi-nested-parse::

   This module contains Databricks operators.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   airflow.providers.databricks.operators.databricks_sql.DatabricksSqlOperator
   airflow.providers.databricks.operators.databricks_sql.DatabricksCopyIntoOperator


Attributes
~~~~~~~~~~

.. autoapisummary::

   airflow.providers.databricks.operators.databricks_sql.COPY_INTO_APPROVED_FORMATS

.. py:class:: DatabricksSqlOperator(*, sql, databricks_conn_id = DatabricksSqlHook.default_conn_name, http_path = None, sql_endpoint_name = None, parameters = None, session_configuration=None, http_headers = None, catalog = None, schema = None, do_xcom_push = False, output_path = None, output_format = 'csv', csv_params = None, client_parameters = None, **kwargs)

   Bases: :py:obj:`airflow.models.BaseOperator`

   Executes SQL code in a Databricks SQL endpoint or a Databricks cluster.

   .. seealso::
      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:DatabricksSqlOperator`
   :param databricks_conn_id: Reference to
       :ref:`Databricks connection id<howto/connection:databricks>`
   :param http_path: Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster.
       If not specified, it must either be set in the Databricks connection's extra parameters,
       or ``sql_endpoint_name`` must be specified.
   :param sql_endpoint_name: Optional name of the Databricks SQL endpoint. If not specified,
       ``http_path`` must be provided as described above.
   :param sql: the SQL code to be executed as a single string, a list of strings
       (SQL statements), or a reference to a template file. (templated)
       Template references are recognized by a string ending in '.sql'.
   :param parameters: (optional) the parameters to render the SQL query with.
   :param session_configuration: An optional dictionary of Spark session parameters. Defaults to None.
       If not provided here, it can be set in the Databricks connection's extra parameters.
   :param client_parameters: Additional parameters internal to the Databricks SQL connector.
   :param http_headers: An optional list of (k, v) pairs that will be set as HTTP headers on every request.
       (templated)
   :param catalog: An optional initial catalog to use. Requires DBR version 9.0+ (templated)
   :param schema: An optional initial schema to use. Requires DBR version 9.0+ (templated)
   :param output_path: optional string specifying the file to which the selected data is written. (templated)
   :param output_format: format of the output data if ``output_path`` is specified.
       Possible values are ``csv``, ``json``, ``jsonl``. Default is ``csv``.
   :param csv_params: parameters that will be passed to the ``csv.DictWriter`` class used to write CSV data.

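   A minimal usage sketch; the connection id, endpoint name, table name, and output
   path below are placeholder values, not defaults of this operator:

   .. code-block:: python

      from airflow.providers.databricks.operators.databricks_sql import (
          DatabricksSqlOperator,
      )

      # Run a query on a Databricks SQL endpoint and write the result to a CSV file.
      select_into_csv = DatabricksSqlOperator(
          task_id="select_data",
          databricks_conn_id="databricks_default",
          sql_endpoint_name="my-endpoint",  # alternatively, pass http_path
          sql="SELECT * FROM default.my_table LIMIT 10",
          output_path="/tmp/select_result.csv",
          output_format="csv",
      )
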
   .. py:attribute:: template_fields
      :annotation: :Sequence[str] = ['sql', '_output_path', 'schema', 'catalog', 'http_headers']


   .. py:attribute:: template_ext
      :annotation: :Sequence[str] = ['.sql']


   .. py:attribute:: template_fields_renderers


   .. py:method:: execute(self, context)

      This is the main method to derive when creating an operator.
      Context is the same dictionary used as when rendering jinja templates.

      Refer to get_template_context for more context.


.. py:data:: COPY_INTO_APPROVED_FORMATS
   :annotation: = ['CSV', 'JSON', 'AVRO', 'ORC', 'PARQUET', 'TEXT', 'BINARYFILE']

.. py:class:: DatabricksCopyIntoOperator(*, table_name, file_location, file_format, databricks_conn_id = DatabricksSqlHook.default_conn_name, http_path = None, sql_endpoint_name = None, session_configuration=None, http_headers = None, client_parameters = None, catalog = None, schema = None, files = None, pattern = None, expression_list = None, credential = None, storage_credential = None, encryption = None, format_options = None, force_copy = None, copy_options = None, validate = None, **kwargs)

   Bases: :py:obj:`airflow.models.BaseOperator`

   Executes a ``COPY INTO`` command in a Databricks SQL endpoint or a Databricks cluster.
   The ``COPY INTO`` command is constructed from individual pieces that are described in the
   `documentation <https://docs.databricks.com/sql/language-manual/delta-copy-into.html>`_.

   .. seealso::
      For more information on how to use this operator, take a look at the guide:
      :ref:`howto/operator:DatabricksSqlCopyIntoOperator`
   :param table_name: Required name of the table. (templated)
   :param file_location: Required location of the files to import. (templated)
   :param file_format: Required file format. Supported formats are
       ``CSV``, ``JSON``, ``AVRO``, ``ORC``, ``PARQUET``, ``TEXT``, ``BINARYFILE``.
   :param databricks_conn_id: Reference to
       :ref:`Databricks connection id<howto/connection:databricks>`
   :param http_path: Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster.
       If not specified, it must either be set in the Databricks connection's extra parameters,
       or ``sql_endpoint_name`` must be specified.
   :param sql_endpoint_name: Optional name of the Databricks SQL endpoint.
       If not specified, ``http_path`` must be provided as described above.
   :param session_configuration: An optional dictionary of Spark session parameters. Defaults to None.
       If not provided here, it can be set in the Databricks connection's extra parameters.
   :param http_headers: An optional list of (k, v) pairs that will be set as HTTP headers on every request.
   :param catalog: An optional initial catalog to use. Requires DBR version 9.0+
   :param schema: An optional initial schema to use. Requires DBR version 9.0+
   :param client_parameters: Additional parameters internal to the Databricks SQL connector.
   :param files: optional list of files to import. Can't be specified together with ``pattern``. (templated)
   :param pattern: optional regex string to match file names to import.
       Can't be specified together with ``files``.
   :param expression_list: optional string that will be used in the ``SELECT`` expression.
   :param credential: optional credential configuration for authentication against the source location.
   :param storage_credential: optional Unity Catalog storage credential for the destination.
   :param encryption: optional encryption configuration for the specified location.
   :param format_options: optional dictionary with options specific to the given file format.
   :param force_copy: optional bool to control forcing of data import
       (could also be specified in ``copy_options``).
   :param validate: optional configuration for schema and data validation. ``True`` forces
       validation of all rows; an integer N validates only the first N rows.
   :param copy_options: optional dictionary of copy options. Currently only the ``force`` option is supported.

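   A minimal usage sketch; the endpoint name, table name, and file location below are
   placeholder values, not defaults of this operator:

   .. code-block:: python

      from airflow.providers.databricks.operators.databricks_sql import (
          DatabricksCopyIntoOperator,
      )

      # Load CSV files from cloud storage into an existing Delta table.
      copy_csv_into_table = DatabricksCopyIntoOperator(
          task_id="copy_data",
          databricks_conn_id="databricks_default",
          sql_endpoint_name="my-endpoint",
          table_name="default.my_table",
          file_location="s3://my-bucket/incoming-data/",
          file_format="CSV",
          format_options={"header": "true"},
          force_copy=True,  # re-import already loaded files (also settable via copy_options)
      )
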
   .. py:attribute:: template_fields
      :annotation: :Sequence[str] = ['_file_location', '_files', '_table_name']


   .. py:method:: execute(self, context)

      This is the main method to derive when creating an operator.
      Context is the same dictionary used as when rendering jinja templates.

      Refer to get_template_context for more context.