:mod:`airflow.contrib.operators.bigquery_check_operator`
=========================================================

.. py:module:: airflow.contrib.operators.bigquery_check_operator


Module Contents
---------------

.. py:class:: BigQueryCheckOperator(sql, bigquery_conn_id='bigquery_default', use_legacy_sql=True, *args, **kwargs)

   Bases: :class:`airflow.operators.check_operator.CheckOperator`

   Performs checks against BigQuery. The ``BigQueryCheckOperator`` expects
   a SQL query that returns a single row. Each value in that first row is
   evaluated using Python ``bool`` casting. If any of the values evaluates
   to ``False``, the check fails and the task errors out.

   Note that Python ``bool`` casting evaluates the following as ``False``:

   * ``False``
   * ``0``
   * Empty string (``""``)
   * Empty list (``[]``)
   * Empty dictionary or set (``{}``)

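   For illustration, a minimal sketch of the casting logic described above
   (the row values here are hypothetical):

   .. code-block:: python

      # Hypothetical first row returned by the check query.
      row = (42, "", [3])

      # The check fails if any value in the row is falsy under bool()
      # casting; here the empty string makes it fail.
      assert not all(bool(value) for value in row)
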
   Given a query like ``SELECT COUNT(*) FROM foo``, it will fail only if
   the count ``== 0``. You can craft much more complex queries that could,
   for instance, check that the table has the same number of rows as
   the source table upstream, or that the count of today's partition is
   greater than yesterday's partition, or that a set of metrics is within
   three standard deviations of the 7-day average. A sketch of one such
   query follows.
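
   For example, a partition-comparison check could be passed as the ``sql``
   argument like this (the table and column names here are illustrative,
   and the query assumes standard SQL, i.e. ``use_legacy_sql=False``):

   .. code-block:: python

      # Hypothetical check: passes only if today's partition has at least
      # as many rows as yesterday's. Table and column names are
      # illustrative; {{ ds }} and {{ yesterday_ds }} are Airflow macros.
      partition_growth_sql = """
      SELECT COUNTIF(ds = '{{ ds }}') >= COUNTIF(ds = '{{ yesterday_ds }}')
      FROM my_dataset.events
      WHERE ds IN ('{{ ds }}', '{{ yesterday_ds }}')
      """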

   This operator can be used as a data quality check in your pipeline.
   Depending on where you put it in your DAG, you can either stop the
   critical path, preventing the publication of dubious data, or place it
   on the side and receive email alerts without stopping the progress of
   the DAG.

   :param sql: the SQL to be executed
   :type sql: str
   :param bigquery_conn_id: reference to the BigQuery connection to use
   :type bigquery_conn_id: str
   :param use_legacy_sql: Whether to use legacy SQL (true)
       or standard SQL (false).
   :type use_legacy_sql: bool

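   A minimal usage sketch (the ``task_id``, table name, and DAG wiring are
   illustrative, not part of this module):

   .. code-block:: python

      from airflow.contrib.operators.bigquery_check_operator import (
          BigQueryCheckOperator,
      )

      # Fails the task if the table has no rows for the execution date.
      check_not_empty = BigQueryCheckOperator(
          task_id="check_events_not_empty",
          sql="SELECT COUNT(*) FROM my_dataset.events WHERE ds = '{{ ds }}'",
          bigquery_conn_id="bigquery_default",
          use_legacy_sql=False,
          dag=dag,  # assumes a DAG object is already defined
      )
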
   .. attribute:: template_fields
      :annotation: = ['sql']

   .. attribute:: template_ext
      :annotation: = ['.sql']

   .. method:: get_db_hook(self)


.. py:class:: BigQueryValueCheckOperator(sql, pass_value, tolerance=None, bigquery_conn_id='bigquery_default', use_legacy_sql=True, *args, **kwargs)

   Bases: :class:`airflow.operators.check_operator.ValueCheckOperator`

   Performs a simple value check using SQL code.

   :param sql: the SQL to be executed
   :type sql: str
   :param pass_value: the expected value to check the query result against
   :type pass_value: any
   :param tolerance: the allowed relative deviation; if set, the check
       passes when the result is within ``pass_value * (1 +/- tolerance)``
   :type tolerance: float
   :param use_legacy_sql: Whether to use legacy SQL (true)
       or standard SQL (false).
   :type use_legacy_sql: bool

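   A minimal usage sketch (the ``task_id``, query, and DAG wiring are
   illustrative):

   .. code-block:: python

      from airflow.contrib.operators.bigquery_check_operator import (
          BigQueryValueCheckOperator,
      )

      # Fails unless the row count is 1000 within a 10 percent tolerance,
      # i.e. between 900 and 1100.
      check_row_count = BigQueryValueCheckOperator(
          task_id="check_events_row_count",
          sql="SELECT COUNT(*) FROM my_dataset.events WHERE ds = '{{ ds }}'",
          pass_value=1000,
          tolerance=0.1,
          use_legacy_sql=False,
          dag=dag,  # assumes a DAG object is already defined
      )
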
   .. attribute:: template_fields
      :annotation: = ['sql', 'pass_value']

   .. attribute:: template_ext
      :annotation: = ['.sql']

   .. method:: get_db_hook(self)


.. py:class:: BigQueryIntervalCheckOperator(table, metrics_thresholds, date_filter_column='ds', days_back=-7, bigquery_conn_id='bigquery_default', use_legacy_sql=True, *args, **kwargs)

   Bases: :class:`airflow.operators.check_operator.IntervalCheckOperator`

   Checks that the values of metrics given as SQL expressions are within
   a certain tolerance of the ones from ``days_back`` days before.

   This method constructs a query like so::

       SELECT {metrics_threshold_dict_key} FROM {table}
       WHERE {date_filter_column}=<date>

   :param table: the table name
   :type table: str
   :param days_back: number of days between ds and the ds we want to check
       against. Defaults to 7 days
   :type days_back: int
   :param metrics_thresholds: a dictionary of ratios indexed by metrics; for
       example ``'COUNT(*)': 1.5`` would require a 50 percent or less
       difference between the current day and the prior ``days_back`` days
   :type metrics_thresholds: dict
   :param use_legacy_sql: Whether to use legacy SQL (true)
       or standard SQL (false).
   :type use_legacy_sql: bool

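   A minimal usage sketch (the ``task_id``, table name, and DAG wiring are
   illustrative):

   .. code-block:: python

      from airflow.contrib.operators.bigquery_check_operator import (
          BigQueryIntervalCheckOperator,
      )

      # Fails if today's row count or distinct-user count differs from the
      # value 7 days ago by more than the given ratios.
      check_week_over_week = BigQueryIntervalCheckOperator(
          task_id="check_events_week_over_week",
          table="my_dataset.events",
          metrics_thresholds={
              "COUNT(*)": 1.5,                  # allow up to 50% drift
              "COUNT(DISTINCT user_id)": 1.25,  # allow up to 25% drift
          },
          date_filter_column="ds",
          days_back=-7,
          use_legacy_sql=False,
          dag=dag,  # assumes a DAG object is already defined
      )
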
   .. attribute:: template_fields
      :annotation: = ['table']

   .. method:: get_db_hook(self)