| :py:mod:`airflow.providers.common.sql.operators.sql` |
| ==================================================== |
| |
| .. py:module:: airflow.providers.common.sql.operators.sql |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.common.sql.operators.sql.BaseSQLOperator |
| airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator |
| airflow.providers.common.sql.operators.sql.SQLTableCheckOperator |
| airflow.providers.common.sql.operators.sql.SQLCheckOperator |
| airflow.providers.common.sql.operators.sql.SQLValueCheckOperator |
| airflow.providers.common.sql.operators.sql.SQLIntervalCheckOperator |
| airflow.providers.common.sql.operators.sql.SQLThresholdCheckOperator |
| airflow.providers.common.sql.operators.sql.BranchSQLOperator |
| |
| |
| |
| Functions |
| ~~~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.common.sql.operators.sql.parse_boolean |
| |
| |
| |
| .. py:function:: parse_boolean(val) |
| |
| Try to parse a string into boolean. |
| |
| Raises ValueError if the input is not a valid true- or false-like string value. |
| |
| |
| .. py:class:: BaseSQLOperator(*, conn_id = None, database = None, hook_params = None, **kwargs) |
| |
| Bases: :py:obj:`airflow.models.BaseOperator` |
| |
| This is a base class for generic SQL Operator to get a DB Hook |
| |
| The provided method is .get_db_hook(). The default behavior will try to |
| retrieve the DB hook based on connection type. |
| You can custom the behavior by overriding the .get_db_hook() method. |
| |
| .. py:method:: get_db_hook() |
| |
| Get the database hook for the connection. |
| |
| :return: the database hook object. |
| :rtype: DbApiHook |
| |
| |
| |
| .. py:class:: SQLColumnCheckOperator(*, table, column_mapping, partition_clause = None, conn_id = None, database = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator` |
| |
| Performs one or more of the templated checks in the column_checks dictionary. |
| Checks are performed on a per-column basis specified by the column_mapping. |
| Each check can take one or more of the following options: |
| - equal_to: an exact value to equal, cannot be used with other comparison options |
| - greater_than: value that result should be strictly greater than |
| - less_than: value that results should be strictly less than |
| - geq_to: value that results should be greater than or equal to |
| - leq_to: value that results should be less than or equal to |
| - tolerance: the percentage that the result may be off from the expected value |
| |
| :param table: the table to run checks on |
| :param column_mapping: the dictionary of columns and their associated checks, e.g. |
| |
| .. code-block:: python |
| |
| { |
| "col_name": { |
| "null_check": { |
| "equal_to": 0, |
| }, |
| "min": { |
| "greater_than": 5, |
| "leq_to": 10, |
| "tolerance": 0.2, |
| }, |
| "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01}, |
| } |
| } |
| |
| :param partition_clause: a partial SQL statement that is added to a WHERE clause in the query built by |
| the operator that creates partition_clauses for the checks to run on, e.g. |
| |
| .. code-block:: python |
| |
| "date = '1970-01-01'" |
| |
| :param conn_id: the connection ID used to connect to the database |
| :param database: name of database which overwrite the defined one in connection |
| |
| .. seealso:: |
| For more information on how to use this operator, take a look at the guide: |
| :ref:`howto/operator:SQLColumnCheckOperator` |
| |
| .. py:attribute:: template_fields |
| :annotation: = ['partition_clause'] |
| |
| |
| |
| .. py:attribute:: column_checks |
| |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| |
| .. py:class:: SQLTableCheckOperator(*, table, checks, partition_clause = None, conn_id = None, database = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator` |
| |
| Performs one or more of the checks provided in the checks dictionary. |
| Checks should be written to return a boolean result. |
| |
| :param table: the table to run checks on |
| :param checks: the dictionary of checks, e.g.: |
| |
| .. code-block:: python |
| |
| { |
| "row_count_check": {"check_statement": "COUNT(*) = 1000"}, |
| "column_sum_check": {"check_statement": "col_a + col_b < col_c"}, |
| } |
| |
| |
| :param partition_clause: a partial SQL statement that is added to a WHERE clause in the query built by |
| the operator that creates partition_clauses for the checks to run on, e.g. |
| |
| .. code-block:: python |
| |
| "date = '1970-01-01'" |
| |
| :param conn_id: the connection ID used to connect to the database |
| :param database: name of database which overwrite the defined one in connection |
| |
| .. seealso:: |
| For more information on how to use this operator, take a look at the guide: |
| :ref:`howto/operator:SQLTableCheckOperator` |
| |
| .. py:attribute:: template_fields |
| :annotation: = ['partition_clause'] |
| |
| |
| |
| .. py:attribute:: sql_check_template |
| :annotation: = Multiline-String |
| |
| .. raw:: html |
| |
| <details><summary>Show Value</summary> |
| |
| .. code-block:: text |
| :linenos: |
| |
| |
| SELECT '_check_name' AS check_name, MIN(_check_name) AS check_result |
| FROM(SELECT CASE WHEN check_statement THEN 1 ELSE 0 END AS _check_name FROM table) |
| |
| |
| .. raw:: html |
| |
| </details> |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| |
| .. py:class:: SQLCheckOperator(*, sql, conn_id = None, database = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator` |
| |
| Performs checks against a db. The ``SQLCheckOperator`` expects |
| a sql query that will return a single row. Each value on that |
| first row is evaluated using python ``bool`` casting. If any of the |
| values return ``False`` the check is failed and errors out. |
| |
| Note that Python bool casting evals the following as ``False``: |
| |
| * ``False`` |
| * ``0`` |
| * Empty string (``""``) |
| * Empty list (``[]``) |
| * Empty dictionary or set (``{}``) |
| |
| Given a query like ``SELECT COUNT(*) FROM foo``, it will fail only if |
| the count ``== 0``. You can craft much more complex query that could, |
| for instance, check that the table has the same number of rows as |
| the source table upstream, or that the count of today's partition is |
| greater than yesterday's partition, or that a set of metrics are less |
| than 3 standard deviation for the 7 day average. |
| |
| This operator can be used as a data quality check in your pipeline, and |
| depending on where you put it in your DAG, you have the choice to |
| stop the critical path, preventing from |
| publishing dubious data, or on the side and receive email alerts |
| without stopping the progress of the DAG. |
| |
| :param sql: the sql to be executed. (templated) |
| :param conn_id: the connection ID used to connect to the database. |
| :param database: name of database which overwrite the defined one in connection |
| |
| .. py:attribute:: template_fields |
| :annotation: :Sequence[str] = ['sql'] |
| |
| |
| |
| .. py:attribute:: template_ext |
| :annotation: :Sequence[str] = ['.hql', '.sql'] |
| |
| |
| |
| .. py:attribute:: template_fields_renderers |
| |
| |
| |
| |
| .. py:attribute:: ui_color |
| :annotation: = #fff7e6 |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| |
| .. py:class:: SQLValueCheckOperator(*, sql, pass_value, tolerance = None, conn_id = None, database = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator` |
| |
| Performs a simple value check using sql code. |
| |
| :param sql: the sql to be executed. (templated) |
| :param conn_id: the connection ID used to connect to the database. |
| :param database: name of database which overwrite the defined one in connection |
| |
| .. py:attribute:: __mapper_args__ |
| |
| |
| |
| |
| .. py:attribute:: template_fields |
| :annotation: :Sequence[str] = ['sql', 'pass_value'] |
| |
| |
| |
| .. py:attribute:: template_ext |
| :annotation: :Sequence[str] = ['.hql', '.sql'] |
| |
| |
| |
| .. py:attribute:: template_fields_renderers |
| |
| |
| |
| |
| .. py:attribute:: ui_color |
| :annotation: = #fff7e6 |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| |
| .. py:class:: SQLIntervalCheckOperator(*, table, metrics_thresholds, date_filter_column = 'ds', days_back = -7, ratio_formula = 'max_over_min', ignore_zero = True, conn_id = None, database = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator` |
| |
| Checks that the values of metrics given as SQL expressions are within |
| a certain tolerance of the ones from days_back before. |
| |
| :param table: the table name |
| :param conn_id: the connection ID used to connect to the database. |
| :param database: name of database which will overwrite the defined one in connection |
| :param days_back: number of days between ds and the ds we want to check |
| against. Defaults to 7 days |
| :param date_filter_column: The column name for the dates to filter on. Defaults to 'ds' |
| :param ratio_formula: which formula to use to compute the ratio between |
| the two metrics. Assuming cur is the metric of today and ref is |
| the metric to today - days_back. |
| |
| max_over_min: computes max(cur, ref) / min(cur, ref) |
| relative_diff: computes abs(cur-ref) / ref |
| |
| Default: 'max_over_min' |
| :param ignore_zero: whether we should ignore zero metrics |
| :param metrics_thresholds: a dictionary of ratios indexed by metrics |
| |
| .. py:attribute:: __mapper_args__ |
| |
| |
| |
| |
| .. py:attribute:: template_fields |
| :annotation: :Sequence[str] = ['sql1', 'sql2'] |
| |
| |
| |
| .. py:attribute:: template_ext |
| :annotation: :Sequence[str] = ['.hql', '.sql'] |
| |
| |
| |
| .. py:attribute:: template_fields_renderers |
| |
| |
| |
| |
| .. py:attribute:: ui_color |
| :annotation: = #fff7e6 |
| |
| |
| |
| .. py:attribute:: ratio_formulas |
| |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| |
| .. py:class:: SQLThresholdCheckOperator(*, sql, min_threshold, max_threshold, conn_id = None, database = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator` |
| |
| Performs a value check using sql code against a minimum threshold |
| and a maximum threshold. Thresholds can be in the form of a numeric |
| value OR a sql statement that results a numeric. |
| |
| :param sql: the sql to be executed. (templated) |
| :param conn_id: the connection ID used to connect to the database. |
| :param database: name of database which overwrite the defined one in connection |
| :param min_threshold: numerical value or min threshold sql to be executed (templated) |
| :param max_threshold: numerical value or max threshold sql to be executed (templated) |
| |
| .. py:attribute:: template_fields |
| :annotation: :Sequence[str] = ['sql', 'min_threshold', 'max_threshold'] |
| |
| |
| |
| .. py:attribute:: template_ext |
| :annotation: :Sequence[str] = ['.hql', '.sql'] |
| |
| |
| |
| .. py:attribute:: template_fields_renderers |
| |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| .. py:method:: push(meta_data) |
| |
| Optional: Send data check info and metadata to an external database. |
| Default functionality will log metadata. |
| |
| |
| |
| .. py:class:: BranchSQLOperator(*, sql, follow_task_ids_if_true, follow_task_ids_if_false, conn_id = 'default_conn_id', database = None, parameters = None, **kwargs) |
| |
| Bases: :py:obj:`BaseSQLOperator`, :py:obj:`airflow.models.SkipMixin` |
| |
| Allows a DAG to "branch" or follow a specified path based on the results of a SQL query. |
| |
| :param sql: The SQL code to be executed, should return true or false (templated) |
| Template reference are recognized by str ending in '.sql'. |
| Expected SQL query to return Boolean (True/False), integer (0 = False, Otherwise = 1) |
| or string (true/y/yes/1/on/false/n/no/0/off). |
| :param follow_task_ids_if_true: task id or task ids to follow if query returns true |
| :param follow_task_ids_if_false: task id or task ids to follow if query returns false |
| :param conn_id: the connection ID used to connect to the database. |
| :param database: name of database which overwrite the defined one in connection |
| :param parameters: (optional) the parameters to render the SQL query with. |
| |
| .. py:attribute:: template_fields |
| :annotation: :Sequence[str] = ['sql'] |
| |
| |
| |
| .. py:attribute:: template_ext |
| :annotation: :Sequence[str] = ['.sql'] |
| |
| |
| |
| .. py:attribute:: template_fields_renderers |
| |
| |
| |
| |
| .. py:attribute:: ui_color |
| :annotation: = #a22034 |
| |
| |
| |
| .. py:attribute:: ui_fgcolor |
| :annotation: = #F7F7F7 |
| |
| |
| |
| .. py:method:: execute(context) |
| |
| This is the main method to derive when creating an operator. |
| Context is the same dictionary used as when rendering jinja templates. |
| |
| Refer to get_template_context for more context. |
| |
| |
| |