:mod:`airflow.operators.hive_to_druid`
======================================
.. py:module:: airflow.operators.hive_to_druid
Module Contents
---------------
.. data:: LOAD_CHECK_INTERVAL
:annotation: = 5
.. data:: DEFAULT_TARGET_PARTITION_SIZE
:annotation: = 5000000
.. py:class:: HiveToDruidTransfer(sql, druid_datasource, ts_dim, metric_spec=None, hive_cli_conn_id='hive_cli_default', druid_ingest_conn_id='druid_ingest_default', metastore_conn_id='metastore_default', hadoop_dependency_coordinates=None, intervals=None, num_shards=-1, target_partition_size=-1, query_granularity='NONE', segment_granularity='DAY', hive_tblproperties=None, job_properties=None, *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Moves data from Hive to Druid.
:param sql: SQL query to execute against the Hive database. (templated)
:type sql: str
:param druid_datasource: the datasource you want to ingest into in Druid
:type druid_datasource: str
:param ts_dim: the timestamp dimension
:type ts_dim: str
:param metric_spec: the metrics you want to define for your data
:type metric_spec: list
:param hive_cli_conn_id: the Hive CLI connection id
:type hive_cli_conn_id: str
:param druid_ingest_conn_id: the Druid ingest connection id
:type druid_ingest_conn_id: str
:param metastore_conn_id: the Hive metastore connection id
:type metastore_conn_id: str
:param hadoop_dependency_coordinates: list of coordinates to squeeze
into the ingest JSON
:type hadoop_dependency_coordinates: list[str]
:param intervals: list of time intervals that define the segments;
this is passed as-is to the JSON object. (templated)
:type intervals: list
:param hive_tblproperties: additional properties set via TBLPROPERTIES
on the Hive staging table
:type hive_tblproperties: dict
:param job_properties: additional properties for the ingestion job
:type job_properties: dict
.. attribute:: template_fields
:annotation: = ['sql', 'intervals']
.. attribute:: template_ext
:annotation: = ['.sql']
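A minimal usage sketch follows; the DAG, task id, SQL, datasource, and dimension names below are illustrative assumptions, not values taken from this module.

.. code-block:: python

    # Illustrative sketch only: DAG name, SQL, datasource and column names are assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.hive_to_druid import HiveToDruidTransfer

    dag = DAG(
        dag_id="hive_to_druid_example",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
    )

    load_segments = HiveToDruidTransfer(
        task_id="load_events_to_druid",
        sql="SELECT * FROM staging.events WHERE ds = '{{ ds }}'",
        druid_datasource="events",
        ts_dim="event_time",
        metric_spec=[{"type": "count", "name": "count"}],
        intervals=["{{ ds }}/{{ tomorrow_ds }}"],
        dag=dag,
    )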
.. method:: execute(self, context)
.. method:: construct_ingest_query(self, static_path, columns)
Builds an ingest query for an HDFS TSV load.
:param static_path: The path on HDFS where the data is located
:type static_path: str
:param columns: List of all the columns that are available
:type columns: list
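For orientation, the query built by this method follows the general shape of a Druid ``index_hadoop`` ingestion spec pointed at a static HDFS path. The sketch below illustrates that spec format; the concrete field values are assumptions, not the method's exact output.

.. code-block:: python

    # Rough shape of a Druid index_hadoop ingestion spec for a TSV file on HDFS.
    # All concrete values are assumptions for illustration only.
    ingest_spec = {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": "events",  # druid_datasource
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "tsv",
                        "columns": ["event_time", "user_id", "count"],  # columns
                        "timestampSpec": {"column": "event_time", "format": "auto"},  # ts_dim
                        "dimensionsSpec": {"dimensions": ["user_id"]},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],  # metric_spec
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "intervals": ["2019-01-01/2019-01-02"],  # intervals
                },
            },
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {"type": "static", "paths": "/tmp/druid_staging"},  # static_path
            },
        },
    }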