##############
Hive Functions
##############
Tajo provides a feature to use Hive functions directly, without recompilation or additional code.
*************
Configuration
*************
The only thing you need to do is register the path of a directory containing the jar files of your Hive functions.
You can do this by setting ``tajo.function.hive.jar-dir`` in ``tajo-site.xml`` as follows:
.. code-block:: xml

  <property>
    <name>tajo.function.hive.jar-dir</name>
    <value>/path/to/hive/function/jar</value>
  </property>
.. note::

  The path must be on the local filesystem. HDFS directories are not supported because of a Java URI compatibility problem.
.. warning::

  The path must point to a directory, not a file, and multiple directory entries are not allowed.
  However, multiple jar files within that directory can be loaded.
***************
Usage in Detail
***************
=============
Function Name
=============
Tajo loads Hive functions that extend the ``org.apache.hadoop.hive.ql.exec.UDF`` class. The function name is taken
from the ``@Description`` annotation. If the annotation is absent, Tajo uses the fully qualified class name, with dots
replaced by underscores, as the function name. For example, a call can end up looking like
``select com_example_hive_udf_myupper('abcd')``, so using the ``@Description`` annotation is recommended.
Additionally, if a function signature collides with an existing one, Tajo may throw an ``AmbiguousFunctionException``.
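As an illustration, here is a minimal sketch of a Hive UDF that Tajo can load. The package ``com.example.hive.udf``
and the class name are hypothetical, chosen to match the example call above.

.. code-block:: java

  package com.example.hive.udf;

  import org.apache.hadoop.hive.ql.exec.Description;
  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  // The @Description annotation makes this function callable as "myupper".
  // Without it, Tajo would fall back to the fully qualified class name,
  // e.g. select com_example_hive_udf_myupper('abcd').
  @Description(name = "myupper",
      value = "_FUNC_(str) - returns str in upper case")
  public class MyUpper extends UDF {
    public Text evaluate(Text input) {
      if (input == null) {
        return null;
      }
      return new Text(input.toString().toUpperCase());
    }
  }

With the annotation in place, the call becomes simply ``select myupper('abcd')``.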
============================
Parameter type / Return type
============================
Hive functions use Hadoop's *Writable* types, while Tajo uses its own internal *Datum* types.
Tajo converts between the two internally, so only the Writable types listed below are currently supported.
==================== =========
Writable Tajo Type
==================== =========
ByteWritable INT1
ShortWritable INT2
IntWritable INT4
LongWritable INT8
FloatWritable FLOAT4
DoubleWritable FLOAT8
Text TEXT
BytesWritable VARBINARY
DateWritable(*) DATE
TimestampWritable(*) TIMESTAMP
HiveCharWritable(*) CHAR
==================== =========
.. note::

  (*) These types are in the ``org.apache.hadoop.hive.serde2.io`` package; the others are in ``org.apache.hadoop.io``.
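For instance, any UDF whose signature uses only the Writable types above can be called from Tajo. The following
sketch (the ``mysquare`` function and its package are hypothetical) takes a ``DoubleWritable``, which Tajo fills
from a ``FLOAT8`` argument, and returns a ``DoubleWritable``, which Tajo converts back to ``FLOAT8``:

.. code-block:: java

  package com.example.hive.udf;

  import org.apache.hadoop.hive.ql.exec.Description;
  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.DoubleWritable;

  // FLOAT8 arguments arrive as DoubleWritable, and the
  // DoubleWritable result is converted back to FLOAT8.
  @Description(name = "mysquare",
      value = "_FUNC_(x) - returns x * x")
  public class MySquare extends UDF {
    public DoubleWritable evaluate(DoubleWritable x) {
      if (x == null) {
        return null;
      }
      return new DoubleWritable(x.get() * x.get());
    }
  }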
===========
Limitations
===========
1. Currently, Hive UDAFs are not supported. The old UDAF interface is deprecated in Hive, and the new GenericUDAF
   interface cannot be applied because of a difference in function design between Tajo and Hive. For the same reason,
   the new GenericUDF functions are not supported either.
2. Because HDFS directories are not supported, Hive UDF jar files must be copied to a local directory on each worker,
   and that directory's path must be specified in ``tajo-site.xml``.