tajo-docs/src/main/sphinx/hive_integration.rst - tajo - Git at Google

 ****************
 Hive Integration
 ****************

 Apache Tajo™ catalog supports HiveCatalogStore to integrate with Apache Hive™.
 This integration allows Tajo to access all tables used in Apache Hive.
 Depending on your purpose, you can execute either SQL queries or HiveQL queries on the
 same tables managed in Apache Hive.

 In order to use this feature, you need to build Tajo with a specified maven profile
 and then add some configs into ``conf/tajo-env.sh`` and ``conf/catalog-site.xml``.
 This section describes how to setup HiveMetaStore integration.
 This instruction would take no more than five minutes.

 You need to set your Hive home directory to the environment variable **HIVE_HOME** in ``conf/tajo-env.sh`` as follows:

 .. code-block:: sh

   export HIVE_HOME=/path/to/your/hive/directory

 If you need to use jdbc to connect HiveMetaStore, you have to prepare MySQL jdbc driver.
 Next, you should set the path of MySQL JDBC driver jar file to the environment variable **HIVE_JDBC_DRIVER_DIR** in ``conf/tajo-env.sh`` as follows:

 .. code-block:: sh

   export HIVE_JDBC_DRIVER_DIR=/path/to/your/mysql_jdbc_driver/mysql-connector-java-x.x.x-bin.jar

 Finally, you should specify HiveCatalogStore as Tajo catalog driver class in ``conf/catalog-site.xml`` as follows:

 .. code-block:: xml

   <property>
     <name>tajo.catalog.store.class</name>
     <value>org.apache.tajo.catalog.store.HiveCatalogStore</value>
   </property>

 .. note::

   Hive stores a list of partitions for each table in its metastore. When new partitions are
   added directly to HDFS, HiveMetastore can't recognize these partitions until the user executes
   ``ALTER TABLE table_name ADD PARTITION`` commands on each of the newly added partitions or
   ``MSCK REPAIR TABLE table_name`` command.

   But current Tajo doesn't provide ``ADD PARTITION`` command and Hive doesn't provide an api for
   responding to ``MSK REPAIR TABLE`` command. Thus, if you insert data to Hive partitioned
   table and you want to scan the updated partitions through Tajo, you must run following command on Hive
   (see `Hive doc <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)>`_
   for more details of the command):

   .. code-block:: sql

     MSCK REPAIR TABLE [table_name];
	****************
	Hive Integration
	****************

	Apache Tajo™ catalog supports HiveCatalogStore to integrate with Apache Hive™.
	This integration allows Tajo to access all tables used in Apache Hive.
	Depending on your purpose, you can execute either SQL queries or HiveQL queries on the
	same tables managed in Apache Hive.

	In order to use this feature, you need to build Tajo with a specified maven profile
	and then add some configs into ``conf/tajo-env.sh`` and ``conf/catalog-site.xml``.
	This section describes how to setup HiveMetaStore integration.
	This instruction would take no more than five minutes.

	You need to set your Hive home directory to the environment variable HIVE_HOME in ``conf/tajo-env.sh`` as follows:

	.. code-block:: sh

	export HIVE_HOME=/path/to/your/hive/directory

	If you need to use jdbc to connect HiveMetaStore, you have to prepare MySQL jdbc driver.
	Next, you should set the path of MySQL JDBC driver jar file to the environment variable HIVE_JDBC_DRIVER_DIR in ``conf/tajo-env.sh`` as follows:

	.. code-block:: sh

	export HIVE_JDBC_DRIVER_DIR=/path/to/your/mysql_jdbc_driver/mysql-connector-java-x.x.x-bin.jar

	Finally, you should specify HiveCatalogStore as Tajo catalog driver class in ``conf/catalog-site.xml`` as follows:

	.. code-block:: xml

	<property>
	<name>tajo.catalog.store.class</name>
	<value>org.apache.tajo.catalog.store.HiveCatalogStore</value>
	</property>

	.. note::

	Hive stores a list of partitions for each table in its metastore. When new partitions are
	added directly to HDFS, HiveMetastore can't recognize these partitions until the user executes
	``ALTER TABLE table_name ADD PARTITION`` commands on each of the newly added partitions or
	``MSCK REPAIR TABLE table_name`` command.

	But current Tajo doesn't provide ``ADD PARTITION`` command and Hive doesn't provide an api for
	responding to ``MSK REPAIR TABLE`` command. Thus, if you insert data to Hive partitioned
	table and you want to scan the updated partitions through Tajo, you must run following command on Hive
	(see `Hive doc <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)>`_
	for more details of the command):

	.. code-block:: sql

	MSCK REPAIR TABLE [table_name];