blob: 26872c17a4e9daf6177a52925b948f514b477fa5 [file] [log] [blame]
:py:mod:`airflow.providers.apache.hive.operators.hive_stats`
============================================================
.. py:module:: airflow.providers.apache.hive.operators.hive_stats
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.apache.hive.operators.hive_stats.HiveStatsCollectionOperator
.. py:class:: HiveStatsCollectionOperator(*, table, partition, extra_exprs = None, excluded_columns = None, assignment_func = None, metastore_conn_id = 'metastore_default', presto_conn_id = 'presto_default', mysql_conn_id = 'airflow_db', **kwargs)
Bases: :py:obj:`airflow.models.BaseOperator`
Gathers partition statistics using a dynamically generated Presto
query, inserts the stats into a MySql table with this format. Stats
overwrite themselves if you rerun the same date/partition. ::
CREATE TABLE hive_stats (
ds VARCHAR(16),
table_name VARCHAR(500),
metric VARCHAR(200),
value BIGINT
);
:param metastore_conn_id: Reference to the
:ref:`Hive Metastore connection id <howto/connection:hive_metastore>`.
:param table: the source table, in the format ``database.table_name``. (templated)
:param partition: the source partition. (templated)
:param extra_exprs: dict of expression to run against the table where
keys are metric names and values are Presto compatible expressions
:param excluded_columns: list of columns to exclude, consider
excluding blobs, large json columns, ...
:param assignment_func: a function that receives a column name and
a type, and returns a dict of metric names and an Presto expressions.
If None is returned, the global defaults are applied. If an
empty dictionary is returned, no stats are computed for that
column.
.. py:attribute:: template_fields
:annotation: :Sequence[str] = ['table', 'partition', 'ds', 'dttm']
.. py:attribute:: ui_color
:annotation: = #aff7a6
.. py:method:: get_default_exprs(self, col, col_type)
Get default expressions
.. py:method:: execute(self, context)
This is the main method to derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.