tajo-docs/src/main/sphinx/table_management/orc.rst - tajo - Git at Google

 ***
 ORC
 ***

 **ORC(Optimized Row Columnar)** is a columnar storage format from Hive. ORC improves performance for reading,
 writing, and processing data.
 For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.

 ===========================
 How to Create an ORC Table?
 ===========================

 If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.

 In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
 statement. Below is an example statement for creating a table using orc files.

 .. code-block:: sql

   CREATE TABLE table1 (
     id int,
     name text,
     score float,
     type text
   ) USING orc;

 ===================
 Physical Properties
 ===================

 Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
 The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.

 Now, ORC file provides the following physical properties.

 * ``orc.max.merge.distance``: When ORC file is read, if stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
 * ``orc.stripe.size``: It decides size of each stripe. Default is 64MB.
 * ``orc.compression.kind``: It means the compression algorithm used to compress and write data. It should be one of ``none``, ``snappy``, ``zlib``. Default is ``none``.
 * ``orc.buffer.size``: It decides size of writing buffer. Default is 256KB.
 * ``orc.rowindex.stride``: Define the default ORC index stride in number of rows. (Stride is the number of rows an index entry represents.) Default is 10000.

 ======================================
 Compatibility Issues with Apache Hive™
 ======================================

 At the moment, Tajo only supports flat relational tables.
 We are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).
	***
	ORC
	***

	ORC(Optimized Row Columnar) is a columnar storage format from Hive. ORC improves performance for reading,
	writing, and processing data.
	For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.

	===========================
	How to Create an ORC Table?
	===========================

	If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.

	In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
	statement. Below is an example statement for creating a table using orc files.

	.. code-block:: sql

	CREATE TABLE table1 (
	id int,
	name text,
	score float,
	type text
	) USING orc;

	===================
	Physical Properties
	===================

	Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
	The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.

	Now, ORC file provides the following physical properties.

	* ``orc.max.merge.distance``: When ORC file is read, if stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
	* ``orc.stripe.size``: It decides size of each stripe. Default is 64MB.
	* ``orc.compression.kind``: It means the compression algorithm used to compress and write data. It should be one of ``none``, ``snappy``, ``zlib``. Default is ``none``.
	* ``orc.buffer.size``: It decides size of writing buffer. Default is 256KB.
	* ``orc.rowindex.stride``: Define the default ORC index stride in number of rows. (Stride is the number of rows an index entry represents.) Default is 10000.

	======================================
	Compatibility Issues with Apache Hive™
	======================================

	At the moment, Tajo only supports flat relational tables.
	We are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).