<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Apache Kudu Schema Design</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li class="active">
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Releases</a>
</li>
<li >
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="container">
<div class="row">
<div class="col-md-9">
<h1>Apache Kudu Schema Design</h1>
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Kudu tables have a structured data model similar to tables in a traditional
RDBMS. Schema design is critical for achieving the best performance and operational
stability from Kudu. Every workload is unique, and there is no single schema design
that is best for every table. This document outlines effective schema design
philosophies for Kudu, paying particular attention to where they differ from
approaches used for traditional RDBMS schemas.</p>
</div>
<div class="paragraph">
<p>At a high level, there are three concerns in Kudu schema design:
<a href="#column-design">column design</a>, <a href="#primary-keys">primary keys</a>, and
<a href="#data-distribution">data distribution</a>. Of these, only data distribution will
be a new concept for those familiar with traditional relational databases. The
next sections discuss <a href="#alter-schema">altering the schema</a> of an existing table,
and <a href="#known-limitations">known limitations</a> with regard to schema design.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_the_perfect_schema"><a class="link" href="#_the_perfect_schema">The Perfect Schema</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>The perfect schema would accomplish the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Data would be distributed in such a way that reads and writes are spread evenly
across tablet servers. This is impacted by the partition schema.</p>
</li>
<li>
<p>Tablets would grow at an even, predictable rate and load across tablets would remain
steady over time. This is most impacted by the partition schema.</p>
</li>
<li>
<p>Scans would read the minimum amount of data necessary to fulfill a query. This
is impacted mostly by primary key design, but partition design also plays a
role via partition pruning.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The perfect schema depends on the characteristics of your data, what you need to do
with it, and the topology of your cluster. Schema design is the single most important
thing within your control to maximize the performance of your Kudu cluster.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="primary-keys"><a class="link" href="#primary-keys">Primary Keys</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Each Kudu table must declare a primary key consisting of one or more columns.
Primary key columns must be non-nullable, and may not be a boolean or
floating-point type. Every row in a table must have a unique set of values for
its primary key columns. As with a traditional RDBMS, primary key
selection is critical to ensuring performant database operations.</p>
</div>
<div class="paragraph">
<p>Unlike an RDBMS, Kudu does not provide an auto-incrementing column feature, so
the application must always provide the full primary key during insert or
ingestion. In addition, Kudu does not allow the primary key values of a row to
be updated.</p>
</div>
<div class="paragraph">
<p>Within a tablet, rows are stored sorted lexicographically by primary key. Advanced
schema designs can take advantage of this ordering to achieve good distribution of
data among tablets, while retaining consistent ordering in intra-tablet scans. See
<a href="#data-distribution">Data Distribution</a> for more information.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="data-distribution"><a class="link" href="#data-distribution">Data Distribution</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Kudu tables, unlike traditional relational tables, are partitioned into tablets
and distributed across many tablet servers. A row always belongs to a single
tablet (and its replicas). The method of assigning rows to tablets is specified
in a configurable <em>partition schema</em> for each table, during table creation.</p>
</div>
<div class="paragraph">
<p>Choosing a data distribution strategy requires you to understand the data model and
expected workload of a table. For write-heavy workloads, it is important to
design the distribution such that writes are spread across tablets in order to
avoid overloading a single tablet. For workloads involving many short scans, performance
can be improved if all of the data for the scan is located in the same
tablet. Understanding these fundamental trade-offs is central to designing an effective
partition schema.</p>
</div>
<div id="no_default_partitioning" class="admonitionblock important">
<table>
<tr>
<td class="icon">
<i class="fa icon-important" title="Important"></i>
</td>
<td class="content">
<div class="title">No Default Partitioning</div>
<div class="paragraph">
<p>Kudu does not provide a default partitioning strategy when creating tables. It
is strongly recommended to ensure that new tables have at least as many tablets
as tablet servers (but Kudu can support many tablets per tablet server).</p>
</div>
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>Kudu provides two types of partition schema: <a href="#range-partitioning">range partitioning</a> and
<a href="#hash-bucketing">hash bucketing</a>. These schema types can be <a href="#hash-and-range">used
together</a> or independently. Kudu does not yet allow tablets to be split after
creation, so you must design your partition schema ahead of time to ensure that
a sufficient number of tablets are created.</p>
</div>
<div class="sect2">
<h3 id="range-partitioning"><a class="link" href="#range-partitioning">Range Partitioning</a></h3>
<div class="paragraph">
<p>With range partitioning, rows are distributed into tablets using a totally-ordered
distribution key. Each tablet is assigned a contiguous segment of the table&#8217;s
distribution keyspace. Tables may be range partitioned on any subset of the
primary key columns.</p>
</div>
<div class="paragraph">
<p>During table creation, tablet boundaries are specified as a sequence of <em>split
rows</em>. Consider the following table schema (using SQL syntax for clarity):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE customers (last_name STRING NOT NULL,
first_name STRING NOT NULL,
order_count INT32)
PRIMARY KEY (last_name, first_name)
DISTRIBUTE BY RANGE (last_name, first_name);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Specifying the split rows as <code>(("b", ""), ("c", ""), ("d", ""), .., ("z", ""))</code>
(25 split rows total) will result in the creation of 26 tablets, with each
tablet containing a range of customer surnames all beginning with a given letter.
This is an effective partition schema for a workload where customers are inserted
and updated uniformly by last name, and scans are typically performed over a range
of surnames.</p>
</div>
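<div class="paragraph">
<p>As a rough illustration of how split rows carve the keyspace into tablets, the
following Python sketch models the 25 split rows above and looks up the tablet index
for a few keys. The helper name <code>tablet_for</code> is hypothetical, and Python tuple
comparison is assumed here as a stand-in for Kudu's own key ordering; neither is a Kudu API.</p>
</div>

```python
import bisect
import string

# 25 split rows: ("b", ""), ("c", ""), ..., ("z", "")
SPLIT_ROWS = [(letter, "") for letter in string.ascii_lowercase[1:]]


def tablet_for(last_name, first_name):
    """Return the index (0..25) of the tablet whose range holds the key.

    Assumption: Python tuple ordering stands in for Kudu's key ordering,
    and a key equal to a split row belongs to the tablet starting at it.
    """
    return bisect.bisect_right(SPLIT_ROWS, (last_name, first_name))


num_tablets = len(SPLIT_ROWS) + 1
print(num_tablets)                    # 26
print(tablet_for("adams", "jane"))    # 0
print(tablet_for("brown", "sam"))     # 1
print(tablet_for("zhang", "wei"))     # 25
```

<div class="paragraph">
<p>The sketch uses lowercase surnames to match the lowercase split rows; mixed-case
data would sort differently and would need split rows chosen accordingly.</p>
</div>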
<div class="paragraph">
<p>It may make sense to partition a table by range using only a subset of the
primary key columns, or with a different ordering than the primary key. For
instance, you can change the above example to specify that the range partition
should only include the <code>last_name</code> column. In that case, Kudu would guarantee that all
customers with the same last name would fall into the same tablet, regardless of
the provided split rows.</p>
</div>
<div class="sect3">
<h4 id="range-partition-management"><a class="link" href="#range-partition-management">Range Partition Management</a></h4>
<div class="paragraph">
<p>Kudu 0.10 introduces the ability to specify bounded range partitions during
table creation, and the ability to add and drop range partitions on the fly. This is
a good strategy for data which is always increasing, such as timestamps, or for
categorical data, such as geographic regions.</p>
</div>
<div class="paragraph">
<p>For example, during table creation, bounded range partitions can be
added for the regions 'US-EAST', 'US-WEST', and 'EUROPE'. If you attempt to insert a
row with a region that does not match an existing range partition, the insertion will
fail. Later, when a new region is needed, it can be efficiently added as part of an
<code>ALTER TABLE</code> operation. This feature is particularly useful for time series data,
since it allows new range partitions for the current period to be added as
needed, and old partitions covering historical periods to be dropped if
necessary.</p>
</div>
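<div class="paragraph">
<p>The behavior described above can be modeled with a small toy class. This is only a
sketch: <code>RangePartitionedTable</code> and its methods are hypothetical names invented
for illustration, not part of any Kudu client, and integer years stand in for a
timestamp column.</p>
</div>

```python
class RangePartitionedTable:
    """Toy model of a table with bounded range partitions (not a Kudu API)."""

    def __init__(self, bounds):
        # Each bound is a (lower_inclusive, upper_exclusive) pair of integers.
        self.partitions = {tuple(bound): [] for bound in bounds}

    def add_range_partition(self, lower, upper):
        self.partitions[(lower, upper)] = []

    def drop_range_partition(self, lower, upper):
        # Dropping a partition also discards the rows stored in it.
        del self.partitions[(lower, upper)]

    def insert(self, key, row):
        for (lower, upper), rows in self.partitions.items():
            if key in range(lower, upper):
                rows.append(row)
                return
        raise KeyError(f"no range partition covers key {key!r}")


table = RangePartitionedTable([(2015, 2016), (2016, 2017)])
table.insert(2016, "row-a")

try:
    table.insert(2017, "row-b")       # no partition covers 2017 yet
    outcome = "inserted"
except KeyError:
    outcome = "rejected"

table.add_range_partition(2017, 2018)   # add the partition for the new period
table.insert(2017, "row-b")             # the same insert now succeeds
table.drop_range_partition(2015, 2016)  # retire the oldest period
print(outcome, sorted(table.partitions))
```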
</div>
</div>
<div class="sect2">
<h3 id="hash-bucketing"><a class="link" href="#hash-bucketing">Hash Bucketing</a></h3>
<div class="paragraph">
<p>Hash bucketing distributes rows by hash value into one of many buckets. Each
tablet is responsible for the rows falling into a single bucket. The number of
buckets (and therefore tablets) is specified during table creation. Typically,
all of the primary key columns are used as the columns to hash, but as with range
partitioning, any subset of the primary key columns can be used.</p>
</div>
<div class="paragraph">
<p>Hash partitioning is an effective strategy to increase the amount of parallelism
for workloads that would otherwise skew writes into a small number of tablets.
Consider the following table schema.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE metrics (
host STRING NOT NULL,
metric STRING NOT NULL,
time UNIXTIME_MICROS NOT NULL,
measurement DOUBLE,
PRIMARY KEY (time, metric, host)
);</code></pre>
</div>
</div>
<div class="paragraph">
<p>If you use range partitioning over the primary key columns, inserts will
tend to only go to the tablet covering the current time, which limits the
maximum write throughput to the throughput of a single tablet. If you use hash
partitioning, you can guarantee a number of parallel writes equal to the number
of buckets specified when defining the partition schema. The trade-off is that a
scan over a single time range now must touch each of these tablets, instead of
(possibly) a single tablet. Hash bucketing can be an effective tool for mitigating
other types of write skew as well, such as monotonically increasing values.</p>
</div>
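<div class="paragraph">
<p>To see why hashing spreads time-ordered writes, consider the following sketch, which
assigns 1,000 <code>metrics</code> rows with consecutive timestamps to 4 buckets. It
assumes CRC32 over a joined string as a stand-in hash; Kudu's internal hash function is
different and is not reproduced here, and <code>bucket_for</code> is a hypothetical helper.</p>
</div>

```python
import zlib
from collections import Counter

NUM_BUCKETS = 4


def bucket_for(time_micros, metric, host):
    """Map a primary key to a bucket index.

    Assumption: CRC32 over a joined string stands in for Kudu's internal
    hash function, which this sketch does not reproduce.
    """
    key = f"{time_micros}|{metric}|{host}".encode("utf-8")
    return zlib.crc32(key) % NUM_BUCKETS


# 1,000 rows with consecutive timestamps. Under range partitioning on time,
# all of these would land in the one tablet covering the current time range.
writes = Counter(
    bucket_for(1_000_000 + i, "cpu_load", f"host-{i % 10}")
    for i in range(1000)
)
print(sorted(writes.items()))
```

<div class="paragraph">
<p>The printed counts show how many of the writes each bucket received. The flip side is
also visible in this model: a scan over a time range cannot tell which bucket holds a
given timestamp, so it has to consult every bucket.</p>
</div>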
<div class="paragraph">
<p>As an advanced optimization, you can create a table with more than one
hash bucket component, as long as the column sets included in each are disjoint,
and all hashed columns are part of the primary key. The total number of tablets
created will be the product of the hash bucket counts. For example, the above
<code>metrics</code> table could be created with two hash bucket components, one over the
<code>time</code> column with 4 buckets, and one over the <code>metric</code> and <code>host</code> columns with
8 buckets. The total number of tablets will be 32. The advantage of using two
separate hash bucket components is that scans which specify equality constraints
on the <code>metric</code> and <code>host</code> columns will be able to skip 7/8 of the total
tablets, leaving a total of just 4 tablets to scan.</p>
</div>
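<div class="paragraph">
<p>The arithmetic in this example can be checked with a short sketch. Tablets are
modeled as pairs of bucket indexes, one index per hash component; the function name
<code>tablets_to_scan</code> and the chosen bucket number are assumptions made for
illustration.</p>
</div>

```python
from itertools import product

TIME_BUCKETS = 4          # hash component over (time)
METRIC_HOST_BUCKETS = 8   # hash component over (metric, host)

# One tablet per combination of bucket indexes from the two components.
tablets = list(product(range(TIME_BUCKETS), range(METRIC_HOST_BUCKETS)))


def tablets_to_scan(metric_host_bucket):
    """Tablets left once an equality predicate fixes the (metric, host) bucket."""
    return [tablet for tablet in tablets if tablet[1] == metric_host_bucket]


print(len(tablets))              # 32
print(len(tablets_to_scan(5)))   # 4
```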
</div>
<div class="sect2">
<h3 id="hash-and-range"><a class="link" href="#hash-and-range">Hash Bucketing and Range Partitioning</a></h3>
<div class="paragraph">
<p>Hash bucketing can be combined with range partitioning. Adding hash bucketing to
a range partitioned table has the effect of parallelizing operations that would
otherwise operate sequentially over the range. The total number of tablets is
the number of hash buckets multiplied by the number of range partitions (that is,
the number of split rows plus one).</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="alter-schema"><a class="link" href="#alter-schema">Schema Alterations</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>You can alter a table&#8217;s schema in the following ways:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Rename the table</p>
</li>
<li>
<p>Rename, add, or drop columns</p>
</li>
<li>
<p>Rename (but not drop) primary key columns</p>
</li>
</ul>
</div>
<div class="paragraph">
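<p>Other than adding and dropping range partitions (see
<a href="#range-partition-management">Range Partition Management</a>), you cannot
modify the partition schema after table creation.</p>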
</div>
</div>
</div>
<div class="sect1">
<h2 id="column-design"><a class="link" href="#column-design">Column Design</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>A Kudu table consists of one or more columns, each with a predefined type.
Columns that are not part of the primary key may optionally be nullable.
Supported column types include:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>boolean</p>
</li>
<li>
<p>8-bit signed integer</p>
</li>
<li>
<p>16-bit signed integer</p>
</li>
<li>
<p>32-bit signed integer</p>
</li>
<li>
<p>64-bit signed integer</p>
</li>
<li>
<p>unixtime_micros (64-bit microseconds since the Unix epoch)</p>
</li>
<li>
<p>single-precision (32-bit) IEEE-754 floating-point number</p>
</li>
<li>
<p>double-precision (64-bit) IEEE-754 floating-point number</p>
</li>
<li>
<p>UTF-8 encoded string</p>
</li>
<li>
<p>binary</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
format to provide efficient encoding and serialization. To make the most of these
features, columns should be specified as the appropriate type, rather than
simulating a 'schemaless' table using string or binary columns for data which
may otherwise be structured. In addition to encoding, Kudu optionally allows
compression to be specified on a per-column basis.</p>
</div>
<div class="sect2">
<h3 id="encoding"><a class="link" href="#encoding">Column Encoding</a></h3>
<div class="paragraph">
<p>Each column in a Kudu table can be created with an encoding, based on the type
of the column. Columns use plain encoding by default.</p>
</div>
<table class="tableblock frame-all grid-all spread">
<caption class="title">Table 1. Encoding Types</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Column Type</th>
<th class="tableblock halign-left valign-top">Encoding</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">int8, int16, int32</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">plain, bitshuffle, run length</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">int64, unixtime_micros</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">plain, bitshuffle</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">float, double</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">plain, bitshuffle</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">plain, run length</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">string, binary</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">plain, prefix, dictionary</p></td>
</tr>
</tbody>
</table>
<div id="plain" class="dlist">
<dl>
<dt class="hdlist1">Plain Encoding</dt>
<dd>
<p>Data is stored in its natural format. For example, <code>int32</code> values
are stored as fixed-size 32-bit little-endian integers.</p>
</dd>
</dl>
</div>
<div id="bitshuffle" class="dlist">
<dl>
<dt class="hdlist1">Bitshuffle Encoding</dt>
<dd>
<p>Data is rearranged to store the most significant bit of
every value, followed by the second most significant bit of every value, and so
on. Finally, the result is LZ4 compressed. Bitshuffle encoding is a good choice for
columns that have many repeated values, or values that change by small amounts
when sorted by primary key. The
<a href="https://github.com/kiyo-masui/bitshuffle">bitshuffle</a> project has a good
overview of performance and use cases.</p>
</dd>
</dl>
</div>
<div id="run-length" class="dlist">
<dl>
<dt class="hdlist1">Run Length Encoding</dt>
<dd>
<p><em>Runs</em> (consecutive repeated values) are compressed in a
column by storing only the value and the count. Run length encoding is effective
for columns with many consecutive repeated values when sorted by primary key.</p>
</dd>
</dl>
</div>
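<div class="paragraph">
<p>The idea behind run length encoding can be sketched in a few lines of Python. This
shows only the value-and-count principle; it does not reproduce Kudu's actual on-disk
format, and the function names are hypothetical.</p>
</div>

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal consecutive values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


column = [7, 7, 7, 7, 3, 3, 9, 9, 9]
encoded = rle_encode(column)
print(encoded)  # [(7, 4), (3, 2), (9, 3)]
```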
<div id="dictionary" class="dlist">
<dl>
<dt class="hdlist1">Dictionary Encoding</dt>
<dd>
<p>A dictionary of unique values is built, and each column value
is encoded as its corresponding index in the dictionary. Dictionary encoding
is effective for columns with low cardinality. If the column values of a given row set
are unable to be compressed because the number of unique values is too high, Kudu will
transparently fall back to plain encoding for that row set. This is evaluated during
flush.</p>
</dd>
</dl>
</div>
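<div class="paragraph">
<p>A minimal sketch of dictionary encoding follows. It assigns indexes in first-seen
order, which is an assumption of this example rather than a statement about how Kudu
orders its dictionaries; <code>dictionary_encode</code> is a hypothetical helper.</p>
</div>

```python
def dictionary_encode(values):
    """Return (dictionary, codes), assigning indexes in first-seen order."""
    index_of = {}
    codes = []
    for value in values:
        # A new value gets the next free index; a repeat reuses its index.
        codes.append(index_of.setdefault(value, len(index_of)))
    return list(index_of), codes


column = ["GET", "POST", "GET", "GET", "PUT", "POST"]
dictionary, codes = dictionary_encode(column)
print(dictionary)  # ['GET', 'POST', 'PUT']
print(codes)       # [0, 1, 0, 0, 2, 1]
```

<div class="paragraph">
<p>With three distinct values, each cell shrinks to a small integer. As the number of
distinct values approaches the number of rows, the dictionary grows as large as the
data itself, which is the situation in which falling back to plain encoding makes sense.</p>
</div>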
<div id="prefix" class="dlist">
<dl>
<dt class="hdlist1">Prefix Encoding</dt>
<dd>
<p>Common prefixes are compressed in consecutive column values. Prefix
encoding can be effective for values that share common prefixes, or the first
column of the primary key, since rows are sorted by primary key within tablets.</p>
</dd>
</dl>
</div>
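<div class="paragraph">
<p>Prefix encoding can be illustrated by storing, for each value, the length of the
prefix it shares with the previous value plus the remaining suffix. The sketch below
assumes the values are already sorted, as they would be for a leading primary key
column; the function names are hypothetical and the layout is not Kudu's on-disk format.</p>
</div>

```python
import os


def prefix_encode(sorted_values):
    """Encode each value as (shared_prefix_length, remaining_suffix)."""
    encoded = []
    previous = ""
    for value in sorted_values:
        shared = len(os.path.commonprefix([previous, value]))
        encoded.append((shared, value[shared:]))
        previous = value
    return encoded


def prefix_decode(encoded):
    """Rebuild the original values from (shared_prefix_length, suffix) pairs."""
    values = []
    previous = ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        values.append(previous)
    return values


column = ["host-001", "host-002", "host-010", "hub-001"]
encoded = prefix_encode(column)
print(encoded)  # [(0, 'host-001'), (7, '2'), (6, '10'), (1, 'ub-001')]
```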
</div>
<div class="sect2">
<h3 id="compression"><a class="link" href="#compression">Column Compression</a></h3>
<div class="paragraph">
<p>Kudu allows per-column compression using LZ4, <code>snappy</code>, or <code>zlib</code> compression
codecs. By default, columns are stored uncompressed. Consider using compression
if reducing storage space is more important than raw scan performance.</p>
</div>
<div class="paragraph">
<p>Every data set will compress differently, but in general LZ4 has the least effect on
performance, while <code>zlib</code> will compress to the smallest data sizes.
Bitshuffle-encoded columns are inherently compressed using LZ4, so it is not
typically beneficial to apply additional compression on top of this encoding.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="known-limitations"><a class="link" href="#known-limitations">Known Limitations</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Kudu currently has some known limitations that may factor into schema design. When
designing your schema, consider these limitations together, not in isolation. If you
test these limitations and your findings are different from these, please share your
test cases and results.</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">Number of Columns</dt>
<dd>
<p>Kudu has not been thoroughly tested with more than 200 columns,
and we recommend schemas with fewer than 50 columns per table.</p>
</dd>
<dt class="hdlist1">Size of Rows</dt>
<dd>
<p>Kudu has not been thoroughly tested with rows larger than 10 KB. Most
testing has been on rows of about 1 KB.</p>
</dd>
<dt class="hdlist1">Size of Cells</dt>
<dd>
<p>There is no hard limit imposed by Kudu, but large values (tens of
kilobytes and above) are likely to perform poorly and may cause stability issues
in current Kudu releases.</p>
</dd>
<dt class="hdlist1">Immutable Primary Keys</dt>
<dd>
<p>Kudu does not allow you to update the primary key of a
row after insertion.</p>
</dd>
<dt class="hdlist1">Non-alterable Primary Key</dt>
<dd>
<p>Kudu does not allow you to alter the primary key
columns after table creation.</p>
</dd>
<dt class="hdlist1">Non-alterable Partition Schema</dt>
<dd>
<p>Kudu does not allow you to alter the
partition schema after table creation, other than adding or dropping range
partitions.</p>
</dd>
<dt class="hdlist1">Partition Pruning</dt>
<dd>
<p>When tables use hash buckets, the Java client does not yet
use scan predicates to prune tablets for scans over these tables. In the future,
specifying an equality predicate on all columns in the hash bucket component
will limit the scan to only the tablets corresponding to the hash bucket.</p>
</dd>
<dt class="hdlist1">Tablet Splitting</dt>
<dd>
<p>You currently cannot split or merge tablets after table
creation. You must create the appropriate number of tablets in the
partition schema at table creation. As a workaround, you can copy the contents
of one table to another by using a <code>CREATE TABLE AS SELECT</code> statement, or by creating
an empty table and populating it with an <code>INSERT</code> statement that reads its rows
from a <code>SELECT</code> on the original table.</p>
</dd>
</dl>
</div>
</div>
</div>
</div>
<div class="col-md-3">
<div id="toc" data-spy="affix" data-offset-top="70">
<ul>
<li>
<a href="index.html">Introducing Kudu</a>
</li>
<li>
<a href="release_notes.html">Kudu Release Notes</a>
</li>
<li>
<a href="quickstart.html">Getting Started with Kudu</a>
</li>
<li>
<a href="installation.html">Installation Guide</a>
</li>
<li>
<a href="configuration.html">Configuring Kudu</a>
</li>
<li>
<a href="kudu_impala_integration.html">Using Impala with Kudu</a>
</li>
<li>
<a href="administration.html">Administering Kudu</a>
</li>
<li>
<a href="troubleshooting.html">Troubleshooting Kudu</a>
</li>
<li>
<a href="developing.html">Developing Applications with Kudu</a>
</li>
<li>
<span class="active-toc">Kudu Schema Design</span>
<ul class="sectlevel1">
<li><a href="#_the_perfect_schema">The Perfect Schema</a></li>
<li><a href="#primary-keys">Primary Keys</a></li>
<li><a href="#data-distribution">Data Distribution</a>
<ul class="sectlevel2">
<li><a href="#range-partitioning">Range Partitioning</a>
<ul class="sectlevel3">
<li><a href="#range-partition-management">Range Partition Management</a></li>
</ul>
</li>
<li><a href="#hash-bucketing">Hash Bucketing</a></li>
<li><a href="#hash-and-range">Hash Bucketing and Range Partitioning</a></li>
</ul>
</li>
<li><a href="#alter-schema">Schema Alterations</a></li>
<li><a href="#column-design">Column Design</a>
<ul class="sectlevel2">
<li><a href="#encoding">Column Encoding</a></li>
<li><a href="#compression">Column Compression</a></li>
</ul>
</li>
<li><a href="#known-limitations">Known Limitations</a></li>
</ul>
</li>
<li>
<a href="transaction_semantics.html">Kudu Transaction Semantics</a>
</li>
<li>
<a href="contributing.html">Contributing to Kudu</a>
</li>
<li>
<a href="style_guide.html">Kudu Documentation Style Guide</a>
</li>
<li>
<a href="configuration_reference.html">Kudu Configuration Reference</a>
</li>
</ul>
</div>
</div>
</div>
</div>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2019 The Apache Software Foundation. Last updated 2016-09-10 16:35:27 PDT
</p>
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
</p>
</div>
<div class="col-md-3">
<a class="pull-right" href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png"/>
</a>
</div>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>