<div class="unit four-fifths">
<p>ORC files are completely self-describing and do not depend on the Hive
Metastore or any other external metadata. The file includes all of the
type and encoding information for the objects stored in the file. Because the
file is self-contained, its contents can be interpreted correctly regardless
of the user’s environment.</p>
<p>ORC provides a rich set of scalar and compound types:</p>
<ul>
<li>boolean (1 bit)</li>
<li>tinyint (8 bit)</li>
<li>smallint (16 bit)</li>
<li>int (32 bit)</li>
<li>bigint (64 bit)</li>
<li>Floating point</li>
<li>String types</li>
<li>Binary blobs</li>
<li>Decimal type</li>
<li>timestamp with local time zone</li>
<li>Compound types</li>
</ul>
<p>All ORC files are logically sequences of identically typed objects. Hive
always uses a struct with a field for each of the top-level columns as
the root object type, but that is not required. All types in ORC can take
null values including the compound types.</p>
<p>Compound types have children columns that hold the values for their
sub-elements. For example, a struct column has one child column for
each field of the struct. Lists always have a single child column for
the element values and maps always have two child columns. Union
columns have one child column for each of the variants.</p>
<p>Given the following definition of the table Foobar, the columns in the
file would form the given tree.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>create table Foobar (
  myInt int,
  myMap map&lt;string,
            struct&lt;myString: string,
                   myDouble: double&gt;&gt;,
  myTime timestamp
);
</code></pre></div></div>
<p><img src="/img/TreeWriters.png" alt="ORC column structure" /></p>
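<p>The column tree for Foobar can be sketched in a few lines of Python. This is
only an illustration, not the ORC API: <code>TypeNode</code> and
<code>assign_column_ids</code> are hypothetical names, the labels
<code>key</code> and <code>value</code> for the two map children are made up
for readability, and the sketch assumes columns are numbered in pre-order
(a parent first, then each child subtree in turn).</p>

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TypeNode:
    """Hypothetical stand-in for one node of an ORC type tree (not the ORC API)."""
    kind: str                      # e.g. "struct", "map", "int"
    name: str = ""                 # label of this node within its parent
    children: List["TypeNode"] = field(default_factory=list)


def assign_column_ids(root: TypeNode) -> List[Tuple[int, str, str]]:
    """Number the nodes in pre-order: a parent first, then each child subtree."""
    numbered: List[Tuple[int, str, str]] = []

    def visit(node: TypeNode) -> None:
        numbered.append((len(numbered), node.name, node.kind))
        for child in node.children:
            visit(child)

    visit(root)
    return numbered


# The Foobar table: the root struct has one child per top-level column.
foobar = TypeNode("struct", "Foobar", [
    TypeNode("int", "myInt"),
    TypeNode("map", "myMap", [
        TypeNode("string", "key"),            # a map always has two children
        TypeNode("struct", "value", [         # a struct has one child per field
            TypeNode("string", "myString"),
            TypeNode("double", "myDouble"),
        ]),
    ]),
    TypeNode("timestamp", "myTime"),
])

for column_id, name, kind in assign_column_ids(foobar):
    print(column_id, name, kind)
```

<p>Under these assumptions the root struct is column 0 and the tree has eight
columns in total, three of which (the root struct, the map, and the inner
struct) are compound types that only hold children.</p>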
<h1 id="timestamps">Timestamps</h1>
<p>ORC includes two different forms of timestamps from the SQL world:</p>
<ul>
<li><strong>Timestamp</strong> is a date and time without a time zone, which does not change based on the time zone of the reader.</li>
<li><strong>Timestamp with local time zone</strong> is a fixed instant in time, which does change based on the time zone of the reader.</li>
</ul>
<p>Unless your application uses UTC consistently, <strong>timestamp with
local time zone</strong> is strongly preferred over <strong>timestamp</strong> for most
use cases. When users say an event is at 10:00, it is always in
reference to a certain time zone and means a point in time, rather than
10:00 in an arbitrary time zone.</p>
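<table>
<thead>
<tr>
<th>Type</th>
<th>Value in America/Los_Angeles</th>
<th>Value in America/New_York</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>timestamp</strong></td>
<td>2014-12-12 6:00:00</td>
<td>2014-12-12 6:00:00</td>
</tr>
<tr>
<td><strong>timestamp with local time zone</strong></td>
<td>2014-12-12 6:00:00</td>
<td>2014-12-12 9:00:00</td>
</tr>
</tbody>
</table>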
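<p>The difference between the two forms can be reproduced with Python's
standard <code>datetime</code> module. This is a minimal sketch of the
semantics, not of how ORC stores the values: the helper names
<code>read_timestamp</code> and <code>read_timestamp_local</code> are
hypothetical, and fixed UTC offsets (-08:00 and -05:00) stand in for
America/Los_Angeles and America/New_York, which is only valid while both
zones are on standard time, as they are in December.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets approximate the two zones in December (standard time).
LOS_ANGELES = timezone(timedelta(hours=-8), "America/Los_Angeles")
NEW_YORK = timezone(timedelta(hours=-5), "America/New_York")

FORMAT = "%Y-%m-%d %H:%M:%S"

# "timestamp": a wall-clock date and time with no zone attached.
wall_clock = datetime(2014, 12, 12, 6, 0, 0)

# "timestamp with local time zone": a fixed instant, here written by a
# user in Los Angeles at 6:00 local time.
instant = datetime(2014, 12, 12, 6, 0, 0, tzinfo=LOS_ANGELES)


def read_timestamp(value: datetime, reader_zone: timezone) -> str:
    """A zoneless value is shown unchanged; the reader's zone is ignored."""
    return value.strftime(FORMAT)


def read_timestamp_local(value: datetime, reader_zone: timezone) -> str:
    """An instant is converted into the reader's zone before display."""
    return value.astimezone(reader_zone).strftime(FORMAT)


for zone in (LOS_ANGELES, NEW_YORK):
    print(zone.tzname(None),
          read_timestamp(wall_clock, zone),
          read_timestamp_local(instant, zone))
```

<p>The zoneless value reads as 6:00 for both readers, while the instant reads
as 6:00 in Los Angeles and 9:00 in New York, because New York is three hours
ahead.</p>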
