:mod:`airflow.hooks.hive_hooks`
===============================

.. py:module:: airflow.hooks.hive_hooks


Module Contents
---------------

.. data:: HIVE_QUEUE_PRIORITIES
   :annotation: = ['VERY_HIGH', 'HIGH', 'NORMAL', 'LOW', 'VERY_LOW']



.. function:: get_context_from_env_var()

   Extract context from environment variables (e.g. dag_id, task_id and
   execution_date) so that they can be used inside BashOperator and
   PythonOperator.

   :return: The context of interest.

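A minimal sketch of what this extraction might look like, assuming the context is exported under ``AIRFLOW_CTX_``-prefixed environment variables (the prefix and key names are illustrative assumptions, not guaranteed by this module):

```python
import os

# Hypothetical sketch: collect AIRFLOW_CTX_* variables back into a
# context dict (the prefix and key names are assumptions for illustration).
def sketch_context_from_env():
    prefix = "AIRFLOW_CTX_"
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }

os.environ["AIRFLOW_CTX_DAG_ID"] = "example_dag"
os.environ["AIRFLOW_CTX_TASK_ID"] = "example_task"
context = sketch_context_from_env()
print(context["dag_id"])
```

Inside a BashOperator or PythonOperator callable, such a helper would let a task recover its own dag_id and execution_date without templating.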

.. py:class:: HiveCliHook(hive_cli_conn_id='hive_cli_default', run_as=None, mapred_queue=None, mapred_queue_priority=None, mapred_job_name=None)

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   Simple wrapper around the Hive CLI.

   It also supports ``beeline``, a lighter CLI that runs JDBC and is
   replacing the heavier traditional CLI. To enable ``beeline``, set the
   use_beeline param in the extra field of your connection as in
   ``{ "use_beeline": true }``

   Note that you can also set default Hive CLI parameters by adding
   ``hive_cli_params`` to the extra field of your connection, as in
   ``{"hive_cli_params": "-hiveconf mapred.job.tracker=some.jobtracker:444"}``.
   Parameters passed here can be overridden by run_cli's hive_conf param.

   The extra connection parameter ``auth`` gets passed as-is into the
   ``jdbc`` connection string.

   :param mapred_queue: queue used by the Hadoop scheduler (Capacity or Fair)
   :type mapred_queue: str
   :param mapred_queue_priority: priority within the job queue.
       Possible settings include: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
   :type mapred_queue_priority: str
   :param mapred_job_name: This name will appear in the jobtracker.
       This can make monitoring easier.
   :type mapred_job_name: str


.. method:: _get_proxy_user(self)

   Sets the proper proxy_user value in case the user overrides the default.




.. method:: _prepare_cli_cmd(self)

   Creates the command list from the available connection information.



.. staticmethod:: _prepare_hiveconf(d)

   Prepares a list of hiveconf params from a dictionary of key-value pairs.

   :param d: dictionary of key-value pairs to turn into hiveconf params
   :type d: dict

   >>> hh = HiveCliHook()
   >>> hive_conf = {"hive.exec.dynamic.partition": "true",
   ... "hive.exec.dynamic.partition.mode": "nonstrict"}
   >>> hh._prepare_hiveconf(hive_conf)
   ['-hiveconf', 'hive.exec.dynamic.partition=true', '-hiveconf', 'hive.exec.dynamic.partition.mode=nonstrict']



.. method:: run_cli(self, hql, schema=None, verbose=True, hive_conf=None)

   Run an hql statement using the Hive CLI. If hive_conf is specified,
   it should be a dict and its entries will be set as key/value pairs
   in HiveConf.

   :param hive_conf: if specified, these key-value pairs are passed
       to hive as ``-hiveconf "key"="value"``. Note that they are
       passed after the ``hive_cli_params`` and thus override
       whatever values are specified in the database.
   :type hive_conf: dict

   >>> hh = HiveCliHook()
   >>> result = hh.run_cli("USE airflow;")
   >>> ("OK" in result)
   True



.. method:: test_hql(self, hql)

   Test an hql statement using the Hive CLI and EXPLAIN.




.. method:: load_df(self, df, table, field_dict=None, delimiter=',', encoding='utf8', pandas_kwargs=None, **kwargs)

   Loads a pandas DataFrame into Hive.

   Hive data types will be inferred if not passed, but column names will
   not be sanitized.

   :param df: DataFrame to load into a Hive table
   :type df: pandas.DataFrame
   :param table: target Hive table; use dot notation to target a
       specific database
   :type table: str
   :param field_dict: mapping from column name to Hive data type.
       Note that it must be an OrderedDict so as to keep the columns' order.
   :type field_dict: collections.OrderedDict
   :param delimiter: field delimiter in the file
   :type delimiter: str
   :param encoding: str encoding to use when writing the DataFrame to file
   :type encoding: str
   :param pandas_kwargs: passed to DataFrame.to_csv
   :type pandas_kwargs: dict
   :param kwargs: passed to self.load_file
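Because ``field_dict`` must be an ``OrderedDict`` to preserve column order, one common pattern is to assemble it from the DataFrame's dtype names. A self-contained sketch, where the dtype-to-Hive mapping is an illustrative assumption rather than the hook's actual inference table:

```python
from collections import OrderedDict

# Illustrative dtype-name -> Hive type table (an assumption for this
# sketch, not the hook's actual mapping).
DTYPE_TO_HIVE = {
    "int64": "BIGINT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
    "object": "STRING",
}

def build_field_dict(column_dtypes):
    """Build an OrderedDict of column name -> Hive type, keeping order."""
    return OrderedDict(
        (name, DTYPE_TO_HIVE.get(dtype, "STRING"))
        for name, dtype in column_dtypes
    )

fields = build_field_dict([("state", "object"), ("year", "int64")])
# fields could then be passed as: hh.load_df(df, table, field_dict=fields)
```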



.. method:: load_file(self, filepath, table, delimiter=',', field_dict=None, create=True, overwrite=True, partition=None, recreate=False, tblproperties=None)

   Loads a local file into Hive.

   Note that the table generated in Hive uses ``STORED AS textfile``,
   which isn't the most efficient serialization format. If a
   large amount of data is loaded and/or if the table gets
   queried considerably, you may want to use this operator only to
   stage the data into a temporary table before loading it into its
   final destination using a ``HiveOperator``.

   :param filepath: local filepath of the file to load
   :type filepath: str
   :param table: target Hive table; use dot notation to target a
       specific database
   :type table: str
   :param delimiter: field delimiter in the file
   :type delimiter: str
   :param field_dict: A dictionary of the field names in the file
       as keys and their Hive types as values.
       Note that it must be an OrderedDict so as to keep the columns' order.
   :type field_dict: collections.OrderedDict
   :param create: whether to create the table if it doesn't exist
   :type create: bool
   :param overwrite: whether to overwrite the data in the table or partition
   :type overwrite: bool
   :param partition: target partition as a dict of partition columns
       and values
   :type partition: dict
   :param recreate: whether to drop and recreate the table at every
       execution
   :type recreate: bool
   :param tblproperties: TBLPROPERTIES of the Hive table being created
   :type tblproperties: dict
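Under the hood this amounts to issuing a ``LOAD DATA`` statement. A rough sketch of its shape, assuming the standard HiveQL form (the exact statement the hook generates may differ):

```python
def build_load_hql(filepath, table, overwrite=True, partition=None):
    """Sketch of the LOAD DATA statement load_file issues (form assumed)."""
    hql = "LOAD DATA LOCAL INPATH '{}'".format(filepath)
    if overwrite:
        hql += " OVERWRITE"
    hql += " INTO TABLE {}".format(table)
    if partition:
        # Partition spec, e.g. PARTITION (ds='2015-01-01')
        spec = ", ".join("{}='{}'".format(k, v) for k, v in partition.items())
        hql += " PARTITION ({})".format(spec)
    return hql + ";"

stmt = build_load_hql("/tmp/babynames.csv", "airflow.static_babynames",
                      partition={"ds": "2015-01-01"})
print(stmt)
```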




.. method:: kill(self)




.. py:class:: HiveMetastoreHook(metastore_conn_id='metastore_default')

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   Wrapper to interact with the Hive Metastore

   .. attribute:: MAX_PART_COUNT
      :annotation: = 32767




.. method:: __getstate__(self)




.. method:: __setstate__(self, d)




.. method:: get_metastore_client(self)

   Returns a Hive thrift client.




.. method:: _find_valid_server(self)




.. method:: get_conn(self)


.. method:: check_for_partition(self, schema, table, partition)

   Checks whether a partition exists.

   :param schema: Name of the hive schema (database) @table belongs to
   :type schema: str
   :param table: Name of the hive table @partition belongs to
   :type table: str
   :param partition: Expression that matches the partitions to check for
       (e.g. `a = 'b' AND c = 'd'`)
   :type partition: str
   :rtype: bool

   >>> hh = HiveMetastoreHook()
   >>> t = 'static_babynames_partitioned'
   >>> hh.check_for_partition('airflow', t, "ds='2015-01-01'")
   True




.. method:: check_for_named_partition(self, schema, table, partition_name)

   Checks whether a partition with a given name exists.

   :param schema: Name of the hive schema (database) @table belongs to
   :type schema: str
   :param table: Name of the hive table @partition belongs to
   :type table: str
   :param partition_name: Name of the partition to check for (e.g. `a=b/c=d`)
   :type partition_name: str
   :rtype: bool

   >>> hh = HiveMetastoreHook()
   >>> t = 'static_babynames_partitioned'
   >>> hh.check_for_named_partition('airflow', t, "ds=2015-01-01")
   True
   >>> hh.check_for_named_partition('airflow', t, "ds=xxx")
   False




.. method:: get_table(self, table_name, db='default')

   Get a metastore table object.

   >>> hh = HiveMetastoreHook()
   >>> t = hh.get_table(db='airflow', table_name='static_babynames')
   >>> t.tableName
   'static_babynames'
   >>> [col.name for col in t.sd.cols]
   ['state', 'year', 'name', 'gender', 'num']




.. method:: get_tables(self, db, pattern='*')

   Get a list of metastore table objects matching the pattern.




.. method:: get_databases(self, pattern='*')

   Get a list of metastore databases matching the pattern.



.. method:: get_partitions(self, schema, table_name, filter=None)

   Returns a list of all partitions in a table. Works only
   for tables with fewer than 32767 partitions (the Java short max value).
   For subpartitioned tables, the number may easily exceed this.

   >>> hh = HiveMetastoreHook()
   >>> t = 'static_babynames_partitioned'
   >>> parts = hh.get_partitions(schema='airflow', table_name=t)
   >>> len(parts)
   1
   >>> parts
   [{'ds': '2015-01-01'}]



.. staticmethod:: _get_max_partition_from_part_specs(part_specs, partition_key, filter_map)

   Helper method to get the max partition with partition_key
   from part specs. key:value pairs in filter_map are used to
   filter out partitions.

   :param part_specs: list of partition specs.
   :type part_specs: list
   :param partition_key: partition key name.
   :type partition_key: str
   :param filter_map: partition_key:partition_value map used for partition filtering,
       e.g. {'key1': 'value1', 'key2': 'value2'}.
       Only partitions matching all partition_key:partition_value
       pairs will be considered as candidates of max partition.
   :type filter_map: map
   :return: Max partition or None if part_specs is empty.
   :rtype: basestring
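The selection logic can be pictured as a plain filter-then-max over the part specs. A self-contained sketch of that idea (not the hook's actual code; the function name is made up):

```python
def max_partition_from_specs(part_specs, partition_key, filter_map=None):
    """Pick the max value of partition_key among specs matching filter_map."""
    filter_map = filter_map or {}
    candidates = [
        spec[partition_key]
        for spec in part_specs
        # Keep only specs that carry the key and match every filter pair.
        if partition_key in spec
        and all(spec.get(k) == v for k, v in filter_map.items())
    ]
    return max(candidates) if candidates else None

specs = [{'ds': '2015-01-01', 'hour': '00'}, {'ds': '2015-01-02', 'hour': '00'}]
latest = max_partition_from_specs(specs, 'ds', {'hour': '00'})
print(latest)
```

Note that the max is a lexicographic string comparison, which orders ISO dates like `ds` correctly.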



.. method:: max_partition(self, schema, table_name, field=None, filter_map=None)

   Returns the maximum value for all partitions with the given field in a table.
   If only one partition key exists in the table, that key will be used as field.
   filter_map should be a partition_key:partition_value map and will be used to
   filter out partitions.

   :param schema: schema name.
   :type schema: str
   :param table_name: table name.
   :type table_name: str
   :param field: partition key to get max partition from.
   :type field: str
   :param filter_map: partition_key:partition_value map used for partition filtering.
   :type filter_map: map

   >>> hh = HiveMetastoreHook()
   >>> filter_map = {'ds': '2015-01-01'}
   >>> t = 'static_babynames_partitioned'
   >>> hh.max_partition(schema='airflow',
   ... table_name=t, field='ds', filter_map=filter_map)
   '2015-01-01'



.. method:: table_exists(self, table_name, db='default')

   Check if a table exists.

   >>> hh = HiveMetastoreHook()
   >>> hh.table_exists(db='airflow', table_name='static_babynames')
   True
   >>> hh.table_exists(db='airflow', table_name='does_not_exist')
   False




.. py:class:: HiveServer2Hook(hiveserver2_conn_id='hiveserver2_default')

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   Wrapper around the pyhive library.

   Notes:

   * the default authMechanism is PLAIN; to override it you
     can specify it in the ``extra`` field of your connection in the UI
   * the default for run_set_variable_statements is true; if you
     are using Impala you may need to set it to false in the
     ``extra`` field of your connection in the UI
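For example, overriding both settings in the connection's ``extra`` field might look like the following (the GSSAPI value is only one plausible choice; use whatever your cluster's authentication requires):

```json
{"authMechanism": "GSSAPI", "run_set_variable_statements": false}
```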

.. method:: get_conn(self, schema=None)

   Returns a Hive connection object.




.. method:: _get_results(self, hql, schema='default', fetch_size=None, hive_conf=None)




.. method:: get_results(self, hql, schema='default', fetch_size=None, hive_conf=None)

   Get the results of the provided hql in the target schema.

   :param hql: hql to be executed.
   :type hql: str or list
   :param schema: target schema; defaults to 'default'.
   :type schema: str
   :param fetch_size: max size of result to fetch.
   :type fetch_size: int
   :param hive_conf: hive_conf to execute along with the hql.
   :type hive_conf: dict
   :return: results of hql execution, dict with data (list of results) and header
   :rtype: dict
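The returned dict pairs a ``header`` with row tuples under ``data``; a small illustration of consuming it (the literal values below are made up stand-ins for what ``get_results`` would return):

```python
# Illustrative result shape; real values would come from get_results().
results = {
    "header": ["state", "num"],
    "data": [("CA", 100), ("NY", 80)],
}

# Re-join each row with the header to get dict-style records.
records = [dict(zip(results["header"], row)) for row in results["data"]]
print(records[0]["state"])
```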



.. method:: to_csv(self, hql, csv_filepath, schema='default', delimiter=',', lineterminator='\r\n', output_header=True, fetch_size=1000, hive_conf=None)

   Execute hql in the target schema and write the results to a csv file.

   :param hql: hql to be executed.
   :type hql: str or list
   :param csv_filepath: filepath of the csv to write results into.
   :type csv_filepath: str
   :param schema: target schema; defaults to 'default'.
   :type schema: str
   :param delimiter: delimiter of the csv file; defaults to ','.
   :type delimiter: str
   :param lineterminator: lineterminator of the csv file.
   :type lineterminator: str
   :param output_header: whether to write the header row; defaults to True.
   :type output_header: bool
   :param fetch_size: number of result rows to write into the csv file at a time; defaults to 1000.
   :type fetch_size: int
   :param hive_conf: hive_conf to execute along with the hql.
   :type hive_conf: dict
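Internally this amounts to fetching rows and streaming them through Python's ``csv`` writer. A simplified, self-contained sketch (the helper name and the stand-in rows are assumptions; the real method fetches rows from the HiveServer2 cursor in ``fetch_size`` batches):

```python
import csv

def write_rows_to_csv(csv_filepath, header, rows, delimiter=",",
                      lineterminator="\r\n", output_header=True):
    """Sketch of the write loop to_csv performs over fetched rows."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_filepath, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter,
                            lineterminator=lineterminator)
        if output_header:
            writer.writerow(header)
        writer.writerows(rows)

write_rows_to_csv("/tmp/babynames.csv", ["state", "num"],
                  [("CA", 100), ("NY", 80)])
```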



.. method:: get_records(self, hql, schema='default')

   Get a set of records from a Hive query.

   :param hql: hql to be executed.
   :type hql: str or list
   :param schema: target schema; defaults to 'default'.
   :type schema: str
   :param hive_conf: hive_conf to execute along with the hql.
   :type hive_conf: dict
   :return: result of hive execution
   :rtype: list

   >>> hh = HiveServer2Hook()
   >>> sql = "SELECT * FROM airflow.static_babynames LIMIT 100"
   >>> len(hh.get_records(sql))
   100


.. method:: get_pandas_df(self, hql, schema='default', **kwargs)

   Get a pandas DataFrame from a Hive query.

   :param hql: hql to be executed.
   :type hql: str or list
   :param schema: target schema; defaults to 'default'.
   :type schema: str
   :param kwargs: (optional) passed into the pandas.DataFrame constructor
   :type kwargs: dict
   :return: result of hql execution
   :rtype: pandas.DataFrame

   >>> hh = HiveServer2Hook()
   >>> sql = "SELECT * FROM airflow.static_babynames LIMIT 100"
   >>> df = hh.get_pandas_df(sql)
   >>> len(df.index)
   100