:mod:`airflow.operators.hive_to_druid`
======================================

.. py:module:: airflow.operators.hive_to_druid


Module Contents
---------------

.. data:: LOAD_CHECK_INTERVAL
   :annotation: = 5



.. data:: DEFAULT_TARGET_PARTITION_SIZE
   :annotation: = 5000000


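The class signature below uses ``-1`` as a sentinel for ``target_partition_size``; a plausible reading is that the sentinel falls back to ``DEFAULT_TARGET_PARTITION_SIZE`` above. A minimal sketch of that defaulting, assuming the helper name (which is not part of the module):

```python
# Module-level defaults, matching the documented values above.
LOAD_CHECK_INTERVAL = 5
DEFAULT_TARGET_PARTITION_SIZE = 5000000


def resolve_target_partition_size(target_partition_size=-1):
    """Hypothetical helper: -1 is treated as 'unset' and falls back
    to the module-level default partition size."""
    if target_partition_size == -1:
        return DEFAULT_TARGET_PARTITION_SIZE
    return target_partition_size
```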
.. py:class:: HiveToDruidTransfer(sql, druid_datasource, ts_dim, metric_spec=None, hive_cli_conn_id='hive_cli_default', druid_ingest_conn_id='druid_ingest_default', metastore_conn_id='metastore_default', hadoop_dependency_coordinates=None, intervals=None, num_shards=-1, target_partition_size=-1, query_granularity='NONE', segment_granularity='DAY', hive_tblproperties=None, job_properties=None, *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Moves data from Hive to Druid.
   :param sql: SQL query to execute against the Hive database. (templated)
   :type sql: str
   :param druid_datasource: the datasource you want to ingest into in Druid
   :type druid_datasource: str
   :param ts_dim: the timestamp dimension
   :type ts_dim: str
   :param metric_spec: the metrics you want to define for your data
   :type metric_spec: list
   :param hive_cli_conn_id: the hive connection id
   :type hive_cli_conn_id: str
   :param druid_ingest_conn_id: the druid ingest connection id
   :type druid_ingest_conn_id: str
   :param metastore_conn_id: the metastore connection id
   :type metastore_conn_id: str
   :param hadoop_dependency_coordinates: list of coordinates to squeeze
       into the ingest json
   :type hadoop_dependency_coordinates: list[str]
   :param intervals: list of time intervals that defines segments,
       this is passed as is to the json object. (templated)
   :type intervals: list
   :param num_shards: directly specify the number of shards to create
   :type num_shards: int
   :param target_partition_size: target number of rows to include in a
       partition
   :type target_partition_size: int
   :param query_granularity: the minimum granularity at which results can
       be queried, and the granularity of the data inside the segment
   :type query_granularity: str
   :param segment_granularity: the granularity at which to create time chunks
   :type segment_granularity: str
   :param hive_tblproperties: additional properties for tblproperties in
       hive for the staging table
   :type hive_tblproperties: dict
   :param job_properties: additional properties for the job
   :type job_properties: dict
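Before handing anything to Druid, the operator materializes the templated ``sql`` into a TSV-formatted Hive staging table. A sketch of what that staging DDL could look like; the function name and exact statement layout are illustrative, not the operator's verbatim HQL:

```python
def build_staging_hql(sql, hive_table, tblproperties=None):
    """Illustrative only: wrap a query in DDL that stores its result as a
    tab-separated text table, the layout a Druid TSV parser expects."""
    props = ""
    if tblproperties:
        # Render hive_tblproperties as a TBLPROPERTIES clause.
        pairs = ", ".join("'%s' = '%s'" % (k, v) for k, v in tblproperties.items())
        props = "TBLPROPERTIES (%s)\n" % pairs
    return (
        "DROP TABLE IF EXISTS %s;\n"
        "CREATE TABLE %s\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE\n"
        "%s"
        "AS\n%s" % (hive_table, hive_table, props, sql)
    )


hql = build_staging_hql("SELECT * FROM raw_events", "tmp.druid_staging")
```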
   .. attribute:: template_fields
      :annotation: = ['sql', 'intervals']



   .. attribute:: template_ext
      :annotation: = ['.sql']


   .. method:: execute(self, context)


   .. method:: construct_ingest_query(self, static_path, columns)

      Builds an ingest query for an HDFS TSV load.

      :param static_path: The path on hdfs where the data is
      :type static_path: str
      :param columns: List of all the columns that are available
      :type columns: list
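A simplified sketch of the shape of the Druid ``index_hadoop`` ingestion spec such a method could return for a static HDFS TSV load. The field layout follows Druid's batch-ingestion spec; the standalone function, its name, and the sample values are illustrative, not the operator's actual implementation:

```python
def sketch_ingest_query(static_path, columns, datasource, ts_dim, metric_spec):
    """Illustrative shape of a Druid 'index_hadoop' ingestion spec for a
    static HDFS TSV load; not the operator's verbatim output."""
    metric_fields = {m.get("fieldName") for m in metric_spec}
    # Every column that is neither the timestamp nor a metric input
    # becomes a dimension.
    dimensions = [c for c in columns if c != ts_dim and c not in metric_fields]
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "tsv",
                        "columns": columns,
                        "timestampSpec": {"column": ts_dim, "format": "auto"},
                        "dimensionsSpec": {"dimensions": dimensions},
                    },
                },
                "metricsSpec": metric_spec,
            },
            "ioConfig": {
                "type": "hadoop",
                # 'static' inputSpec: read files already sitting at a path.
                "inputSpec": {"type": "static", "paths": static_path},
            },
        },
    }


spec = sketch_ingest_query(
    static_path="/tmp/druid/staging",
    columns=["ts", "country", "clicks"],
    datasource="events",
    ts_dim="ts",
    metric_spec=[{"type": "longSum", "name": "clicks", "fieldName": "clicks"}],
)
```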