| :mod:`airflow.operators.hive_stats_operator` |
| ============================================ |
| |
| .. py:module:: airflow.operators.hive_stats_operator |
| |
| |
| Module Contents |
| --------------- |
| |
| .. py:class:: HiveStatsCollectionOperator(table, partition, extra_exprs=None, excluded_columns=None, assignment_func=None, metastore_conn_id='metastore_default', presto_conn_id='presto_default', mysql_conn_id='airflow_db', *args, **kwargs) |
| |
| Bases: :class:`airflow.models.BaseOperator` |
| |
| Gathers partition statistics using a dynamically generated Presto |
| query and inserts them into a MySQL table with the format shown below. |
| Rerunning for the same date/partition overwrites the existing stats. :: |
| |
| CREATE TABLE hive_stats ( |
| ds VARCHAR(16), |
| table_name VARCHAR(500), |
| metric VARCHAR(200), |
| value BIGINT |
| ); |
| |
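The overwrite-on-rerun behaviour described above can be pictured as a delete-then-insert on the stats table. The following is only a minimal sketch under stated assumptions: it uses the standard-library ``sqlite3`` module as a stand-in for MySQL, the helper name ``store_stats`` is hypothetical, and it treats ``(ds, table_name)`` as the key that identifies a rerun. The operator's actual SQL may differ.

```python
import sqlite3


def store_stats(conn, ds, table_name, stats):
    """Hypothetical helper: replace all stats rows for (ds, table_name)."""
    with conn:  # commits on success, rolls back on error
        # Remove whatever a previous run stored for this date and table.
        conn.execute(
            "DELETE FROM hive_stats WHERE ds = ? AND table_name = ?",
            (ds, table_name),
        )
        # Insert one row per metric.
        conn.executemany(
            "INSERT INTO hive_stats (ds, table_name, metric, value) "
            "VALUES (?, ?, ?, ?)",
            [(ds, table_name, metric, value) for metric, value in stats.items()],
        )


# In-memory SQLite database standing in for the MySQL stats store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hive_stats ("
    "ds VARCHAR(16), table_name VARCHAR(500), metric VARCHAR(200), value BIGINT)"
)

# Running twice for the same date and table leaves only the latest values.
store_stats(conn, "2020-01-01", "db.tbl", {"count": 10})
store_stats(conn, "2020-01-01", "db.tbl", {"count": 12})
rows = conn.execute("SELECT metric, value FROM hive_stats").fetchall()
```

After the second call, ``rows`` holds a single row for the ``count`` metric rather than two.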
| :param table: the source table, in the format ``database.table_name``. (templated) |
| :type table: str |
| :param partition: the source partition. (templated) |
| :type partition: dict of {col:value} |
| :param extra_exprs: dict of expressions to run against the table, where |
| keys are metric names and values are Presto-compatible expressions |
| :type extra_exprs: dict |
| :param excluded_columns: list of columns to exclude; consider |
| excluding blobs, large JSON columns, ... |
| :type excluded_columns: list |
| :param assignment_func: a function that receives a column name and |
| a type, and returns a dict mapping metric names to Presto expressions. |
| If None is returned, the global defaults are applied. If an |
| empty dictionary is returned, no stats are computed for that |
| column. |
| :type assignment_func: function |
| |
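As an illustration of the three return values ``assignment_func`` may produce, here is a minimal sketch. The function body, the ``_json`` column suffix, and the chosen metrics are all hypothetical, and keying the dict by ``(column, metric_name)`` tuples is an assumption about the expected shape; check the operator source before relying on it.

```python
def assignment_func(col, col_type):
    """Hypothetical per-column rule for choosing Presto stat expressions."""
    # Empty dict: compute no stats for this column (e.g. large JSON blobs).
    if col.endswith("_json"):
        return {}
    # Custom metrics for numeric columns.
    # Assumption: keys are (column, metric_name) tuples.
    if col_type in ("int", "bigint", "double", "float"):
        return {
            (col, "min"): "MIN({})".format(col),
            (col, "max"): "MAX({})".format(col),
        }
    # None: fall back to the operator's global defaults.
    return None
```

A function like this would be passed as ``assignment_func=assignment_func`` when constructing the operator.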
| .. attribute:: template_fields |
| :annotation: = ['table', 'partition', 'ds', 'dttm'] |
| |
| |
| |
| .. attribute:: ui_color |
| :annotation: = #aff7a6 |
| |
| |
| |
| |
| .. method:: get_default_exprs(self, col, col_type) |
| |
| |
| |
| |
| .. method:: execute(self, context=None) |
| |
| |
| |
| |