:py:mod:`airflow.providers.apache.hive.operators.hive_stats`
============================================================

.. py:module:: airflow.providers.apache.hive.operators.hive_stats


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   airflow.providers.apache.hive.operators.hive_stats.HiveStatsCollectionOperator


.. py:class:: HiveStatsCollectionOperator(*, table, partition, extra_exprs = None, excluded_columns = None, assignment_func = None, metastore_conn_id = 'metastore_default', presto_conn_id = 'presto_default', mysql_conn_id = 'airflow_db', **kwargs)

   Bases: :py:obj:`airflow.models.BaseOperator`

   Gathers partition statistics using a dynamically generated Presto
   query and inserts them into a MySQL table with the following format.
   Rerunning the operator for the same date/partition overwrites the
   previously stored stats. ::

      CREATE TABLE hive_stats (
          ds VARCHAR(16),
          table_name VARCHAR(500),
          metric VARCHAR(200),
          value BIGINT
      );
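
   The overwrite-on-rerun behaviour can be illustrated with a small,
   self-contained sketch. It uses Python's built-in ``sqlite3`` module as a
   stand-in for MySQL and keys rows on ``(ds, table_name)``, since those are
   the only identifying columns in the schema above. ``write_stats`` is a
   hypothetical helper, and the operator's own matching logic may differ.

   .. code-block:: python

      import sqlite3

      # Same columns as the hive_stats schema above; SQLite stands in for MySQL.
      SCHEMA = """
      CREATE TABLE hive_stats (
          ds VARCHAR(16),
          table_name VARCHAR(500),
          metric VARCHAR(200),
          value BIGINT
      )
      """


      def write_stats(conn, ds, table_name, stats):
          """Hypothetical helper: replace all stats for (ds, table_name)."""
          with conn:  # commit both statements together, or roll back on error
              # Drop rows left by any previous run for the same date/table ...
              conn.execute(
                  "DELETE FROM hive_stats WHERE ds = ? AND table_name = ?",
                  (ds, table_name),
              )
              # ... then insert the freshly computed metrics.
              conn.executemany(
                  "INSERT INTO hive_stats (ds, table_name, metric, value) "
                  "VALUES (?, ?, ?, ?)",
                  [(ds, table_name, m, v) for m, v in stats.items()],
              )


      conn = sqlite3.connect(":memory:")
      conn.execute(SCHEMA)
      write_stats(conn, "2024-01-01", "db.events", {"count": 100})
      write_stats(conn, "2024-01-01", "db.events", {"count": 120})  # rerun
      print(conn.execute("SELECT metric, value FROM hive_stats").fetchall())
      # [('count', 120)]

   Deleting before inserting, inside one transaction, keeps a rerun from
   leaving duplicate rows for the same date and table.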

   :param table: The source table, in the format ``database.table_name``. (templated)
   :param partition: The source partition. (templated)
   :param extra_exprs: Dict of expressions to run against the table, where
       keys are metric names and values are Presto-compatible expressions.
   :param excluded_columns: List of columns to exclude from stats collection;
       consider excluding blobs, large JSON columns, and similar.
   :param assignment_func: A function that receives a column name and a
       type, and returns a dict mapping metric names to Presto expressions.
       If it returns ``None``, the global defaults are applied. If it returns
       an empty dict, no stats are computed for that column.
   :param metastore_conn_id: Reference to the
       :ref:`Hive Metastore connection id <howto/connection:hive_metastore>`.
   :param presto_conn_id: Connection id of the Presto instance used to run
       the stats query.
   :param mysql_conn_id: Connection id of the MySQL database that holds the
       ``hive_stats`` table.
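
   A minimal sketch of an ``assignment_func`` showing the three return
   conventions described above. The column names are hypothetical, and
   keying each expression by a ``(column, metric)`` pair is an assumption
   about how the operator labels metrics; check the operator source before
   relying on that key format.

   .. code-block:: python

      from typing import Dict, Optional, Tuple

      # Hypothetical wide columns that should never be profiled.
      SKIPPED_COLUMNS = {"raw_payload", "debug_blob"}

      NUMERIC_TYPES = {"int", "bigint", "float", "double"}


      def my_assignment_func(
          col: str, col_type: str
      ) -> Optional[Dict[Tuple[str, str], str]]:
          """Choose which Presto expressions to compute for one column."""
          if col in SKIPPED_COLUMNS:
              # Empty dict: compute no stats for this column.
              return {}
          if col_type in NUMERIC_TYPES:
              # Custom metrics; keys assumed to be (column, metric) pairs.
              return {
                  (col, "non_null"): f"COUNT({col})",
                  (col, "max"): f"MAX({col})",
                  (col, "p50"): f"APPROX_PERCENTILE({col}, 0.5)",
              }
          # None: fall back to the operator's global defaults.
          return None

   Such a function would be passed as ``assignment_func=my_assignment_func``
   when constructing the operator.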

   .. py:attribute:: template_fields
      :annotation: : Sequence[str] = ['table', 'partition', 'ds', 'dttm']


   .. py:attribute:: ui_color
      :annotation: = '#aff7a6'


   .. py:method:: get_default_exprs(self, col, col_type)

      Get the default stats expressions for column ``col`` of type ``col_type``.


   .. py:method:: execute(self, context)

      This is the main method to override when creating an operator.
      ``context`` is the same dictionary used when rendering Jinja templates.

      Refer to ``get_template_context`` for more context.
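
      As a rough, hypothetical sketch of the kind of Presto query ``execute``
      generates, the helper below joins a mapping of ``(column, metric)`` keys
      to Presto expressions into a single ``SELECT`` filtered on the
      partition. The function name ``build_stats_query``, the key format, and
      the ``column__metric`` aliasing are assumptions for illustration, not
      the operator's actual code.

      .. code-block:: python

         from typing import Dict, Tuple


         def build_stats_query(
             table: str,
             partition: Dict[str, str],
             exprs: Dict[Tuple[str, str], str],
         ) -> str:
             """Hypothetical helper: one SELECT computing every metric."""
             # Alias each expression as <column>__<metric> so each result
             # value can be mapped back to its metric.
             select_list = ",\n    ".join(
                 f"{expr} AS {col}__{metric}"
                 for (col, metric), expr in exprs.items()
             )
             # Restrict the scan to the requested partition. Values are
             # interpolated directly, so only pass trusted input.
             where = " AND ".join(f"{k} = '{v}'" for k, v in partition.items())
             return f"SELECT\n    {select_list}\nFROM {table}\nWHERE {where}"


         print(build_stats_query(
             "db.events",
             {"ds": "2024-01-01"},
             {("amount", "max"): "MAX(amount)",
              ("amount", "non_null"): "COUNT(amount)"},
         ))
         # SELECT
         #     MAX(amount) AS amount__max,
         #     COUNT(amount) AS amount__non_null
         # FROM db.events
         # WHERE ds = '2024-01-01'

      Computing every metric in one query means the partition is scanned
      only once, regardless of how many columns and metrics are requested.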