:mod:`airflow.operators.hive_to_druid`
======================================
.. py:module:: airflow.operators.hive_to_druid
Module Contents
---------------
.. data:: LOAD_CHECK_INTERVAL
:annotation: = 5
.. data:: DEFAULT_TARGET_PARTITION_SIZE
:annotation: = 5000000
.. py:class:: HiveToDruidTransfer(sql, druid_datasource, ts_dim, metric_spec=None, hive_cli_conn_id='hive_cli_default', druid_ingest_conn_id='druid_ingest_default', metastore_conn_id='metastore_default', hadoop_dependency_coordinates=None, intervals=None, num_shards=-1, target_partition_size=-1, query_granularity='NONE', segment_granularity='DAY', hive_tblproperties=None, job_properties=None, *args, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Moves data from Hive to Druid.
:param sql: SQL query to execute against the Hive database. (templated)
:type sql: str
:param druid_datasource: the datasource you want to ingest into in Druid
:type druid_datasource: str
:param ts_dim: the timestamp dimension
:type ts_dim: str
:param metric_spec: the metrics you want to define for your data
:type metric_spec: list
:param hive_cli_conn_id: the Hive CLI connection id
:type hive_cli_conn_id: str
:param druid_ingest_conn_id: the Druid ingest connection id
:type druid_ingest_conn_id: str
:param metastore_conn_id: the Hive metastore connection id
:type metastore_conn_id: str
:param hadoop_dependency_coordinates: list of coordinates to squeeze
into the ingest JSON
:type hadoop_dependency_coordinates: list[str]
:param intervals: list of time intervals that define the segments;
this is passed as-is to the JSON object. (templated)
:type intervals: list
:param hive_tblproperties: additional properties set via TBLPROPERTIES
on the Hive staging table
:type hive_tblproperties: dict
:param job_properties: additional properties for the ingestion job
:type job_properties: dict
.. attribute:: template_fields
:annotation: = ['sql', 'intervals']
.. attribute:: template_ext
:annotation: = ['.sql']
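A minimal usage sketch follows; the DAG, task id, SQL, datasource, and dimension names below are illustrative assumptions, not values taken from this module.

.. code-block:: python

    # Illustrative sketch only: DAG name, SQL, datasource and column names are assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.hive_to_druid import HiveToDruidTransfer

    dag = DAG(
        dag_id="hive_to_druid_example",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
    )

    load_segments = HiveToDruidTransfer(
        task_id="load_events_to_druid",
        sql="SELECT * FROM staging.events WHERE ds = '{{ ds }}'",
        druid_datasource="events",
        ts_dim="event_time",
        metric_spec=[{"type": "count", "name": "count"}],
        intervals=["{{ ds }}/{{ tomorrow_ds }}"],
        dag=dag,
    )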
.. method:: execute(self, context)
.. method:: construct_ingest_query(self, static_path, columns)
Builds an ingest query for an HDFS TSV load.
:param static_path: The path on HDFS where the data is located
:type static_path: str
:param columns: List of all the columns that are available
:type columns: list
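For orientation, the query built by this method follows the general shape of a Druid ``index_hadoop`` ingestion spec pointed at a static HDFS path. The sketch below illustrates that spec format; the concrete field values are assumptions, not the method's exact output.

.. code-block:: python

    # Rough shape of a Druid index_hadoop ingestion spec for a TSV file on HDFS.
    # All concrete values are assumptions for illustration only.
    ingest_spec = {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": "events",  # druid_datasource
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "tsv",
                        "columns": ["event_time", "user_id", "count"],  # columns
                        "timestampSpec": {"column": "event_time", "format": "auto"},  # ts_dim
                        "dimensionsSpec": {"dimensions": ["user_id"]},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],  # metric_spec
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "intervals": ["2019-01-01/2019-01-02"],  # intervals
                },
            },
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {"type": "static", "paths": "/tmp/druid_staging"},  # static_path
            },
        },
    }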