<meta name="author" content="Cloudera" />
<h1 class="entry-title">Apache Kudu (incubating) Weekly Update April 4, 2016</h1>
<p class="meta">Posted 04 Apr 2016 by Todd Lipcon</p>
<div class="entry-content">
<p>Welcome to the third edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</p>
<p>If you find this post useful, please let us know by emailing the
<a href="">kudu-user mailing list</a> or
tweeting at <a href="">@ApacheKudu</a>. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.</p>
<h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2>
<p>The 0.8.0 release train is progressing nicely. Jean-Daniel Cryans posted a first
release candidate <a href="">VOTE thread</a>
on the dev mailing list. Please download the release candidate and give it a try.</p>
<p>Ara Ebrahimi’s implementation of a <a href="">Kudu sink for Apache Flume</a>
was completed this week. Mike Percy worked on some follow-up around documentation and a shaded
<em>jar-with-dependencies</em> for easier consumption.</p>
<p>Mike also worked on and committed an <a href="">improvement to the Java API</a>
so that error results for failed write operations are easier to handle. Previously,
clients had to perform string comparisons to know whether an insert failed due
to something like a duplicate key constraint rather than some other, less expected
error.</p>
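<p>To illustrate why string comparisons are fragile for this purpose, here is a minimal,
language-neutral sketch in Python. It does not use the Kudu Java API: <code>ErrorCode</code>,
<code>WriteError</code>, and both helper functions are hypothetical names invented for this
example, and the message text is made up.</p>

```python
from dataclasses import dataclass
from enum import Enum, auto


class ErrorCode(Enum):
    """Hypothetical error categories; not the real Kudu status codes."""
    ALREADY_PRESENT = auto()
    NOT_FOUND = auto()
    IO_ERROR = auto()


@dataclass(frozen=True)
class WriteError:
    """Hypothetical per-row error result carrying a typed code and a message."""
    code: ErrorCode
    message: str


def is_duplicate_key_by_string(error: WriteError) -> bool:
    # Fragile: depends on the exact wording of a human-readable message.
    return "already present" in error.message.lower()


def is_duplicate_key(error: WriteError) -> bool:
    # Robust: inspects the typed code, independent of message wording.
    return error.code is ErrorCode.ALREADY_PRESENT


duplicate = WriteError(ErrorCode.ALREADY_PRESENT, "key already present")
disk_failure = WriteError(
    ErrorCode.IO_ERROR, "disk failure while checking if key already present"
)
```

<p>In this sketch the string check misclassifies <code>disk_failure</code> as a duplicate
key, while the typed check does not.</p>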
<p>After some early cluster testing of the new implementation of scanner predicates,
a few tricky bugs were discovered. The first was around proper handling of inequality
predicates on floating point columns. The second involved handling predicates like
‘my_int8_col &lt;= 127’: for non-nullable columns, this predicate is a tautology
and can be eliminated. However, for nullable columns, such a predicate is equivalent
to ‘my_int8_col IS NOT NULL’. In order to fix this, Dan Burkert added an internal
<a href="">implementation of ‘IS NOT NULL’</a>.</p>
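<p>The simplification rule described above can be sketched as follows. This is an
illustrative Python model, not Kudu's implementation: the <code>Column</code> class, the
<code>simplify_le</code> function, and the tuple encoding of predicates are assumptions made
for this example.</p>

```python
from dataclasses import dataclass

INT8_MIN, INT8_MAX = -128, 127


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor for an int8 column."""
    name: str
    nullable: bool


def simplify_le(column: Column, upper_bound: int) -> tuple:
    """Simplify the predicate `column <= upper_bound` for an int8 column.

    Returns one of:
      ("all",)                    every row matches, predicate can be dropped
      ("none",)                   no row can match
      ("is_not_null", name)       only NULL cells are filtered out
      ("le", name, upper_bound)   predicate kept as written
    """
    if upper_bound < INT8_MIN:
        # No int8 value is this small, and NULL never satisfies a comparison.
        return ("none",)
    if upper_bound >= INT8_MAX:
        # Every non-NULL int8 value satisfies the bound.
        if column.nullable:
            return ("is_not_null", column.name)
        return ("all",)
    return ("le", column.name, upper_bound)
```

<p>For a non-nullable column, <code>simplify_le(Column("my_int8_col", False), 127)</code>
yields the tautology case; with <code>nullable=True</code> the same bound reduces to the
IS NOT NULL case instead.</p>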
<p>These bugs also exposed a few gaps in test coverage around predicate handling. Dan
added a couple thousand lines of new test coverage in both <a href="">C++</a>
and <a href="">Java</a>.</p>
<p>These exhaustive tests also identified a <a href="">gap in Kudu’s handling of NaN
float values</a>. The team
elected to leave this as a known issue for now, since usage of NaN is relatively
rare.</p>
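<p>As general background on why NaN is awkward for predicate evaluation (this shows
standard IEEE 754 comparison behavior in Python, not the specific gap found in Kudu):
every ordered comparison involving NaN is false, so a predicate and its apparent
complement can both reject the same row.</p>

```python
# Sample column values, including one NaN.
values = [1.0, float("nan"), 7.5]

# A predicate and what looks like its exact complement.
le_five = [v for v in values if v <= 5.0]
gt_five = [v for v in values if v > 5.0]

# The NaN row satisfies neither predicate, so the two result sets
# together cover fewer rows than the input contains.
rows_matched = len(le_five) + len(gt_five)
rows_total = len(values)
```

<p>Here <code>rows_matched</code> is smaller than <code>rows_total</code>, which is the
kind of edge case exhaustive predicate tests tend to surface.</p>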
<p>Todd Lipcon fixed a <a href="">bug in Kudu’s implementation of Raft configuration
change</a>.
This bug could cause tablet replicas to become “stuck” after certain types of network
partitions. The fix will be included in the upcoming 0.8 release.</p>
<p>Mike Percy has been working on a <a href="">bug</a>
where tablet servers fail to start up after their write-ahead logs (WALs) have been
truncated. This can happen on certain types of machine crashes, or if the disks
fill up under a write workload.</p>
<p>Todd Lipcon spent time this week testing Kudu on a small cluster with a 3TB
TPC-H dataset. In particular, he focused on concurrent query workloads,
including scenarios with multiple read-only users as well as scenarios that
combine a query workload with a write workload. As a result, he identified a few
issues around Kudu’s handling of RPCs in overload conditions.</p>
<p>In order to improve the behavior on the server side, Todd changed the RPC
scheduling algorithm to use an <a href="">earliest-deadline-first</a>
policy. This helps prevent query timeouts: the query that
is closest to timing out is scheduled ahead of
those that have plenty of time left.</p>
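<p>The earliest-deadline-first idea can be sketched with a priority queue keyed on each
request's deadline. This is a minimal Python illustration, not Kudu's C++ RPC service
queue: the <code>EdfQueue</code> class, its methods, and the request labels are hypothetical.</p>

```python
import heapq
import itertools


class EdfQueue:
    """Minimal earliest-deadline-first queue (illustrative only)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal deadlines stay in FIFO order.
        self._counter = itertools.count()

    def push(self, deadline: float, request: str) -> None:
        """Enqueue a request together with its absolute deadline."""
        heapq.heappush(self._heap, (deadline, next(self._counter), request))

    def pop(self) -> str:
        """Dequeue the request whose deadline is soonest."""
        _deadline, _seq, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)


queue = EdfQueue()
queue.push(5.0, "scan-A")   # plenty of time left
queue.push(1.5, "scan-B")   # closest to timing out
queue.push(3.0, "write-C")

service_order = [queue.pop() for _ in range(len(queue))]
```

<p>In this sketch, requests are served in deadline order regardless of arrival order, so
<code>scan-B</code> is handled first even though it arrived second.</p>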
<p>In addition, this work identified a few bugs in the Kudu C++ client.
In particular, in the case where the server was overloaded, the client
would sometimes <a href="">incorrectly rewind to the start of the current
tablet</a>, producing incorrect
results. In other cases, the client would end up in a <a href="">tight loop sending
RPCs to the master</a>. Fixes
for both of these issues will be in the upcoming 0.8 release.</p>
<h2 id="upcoming-talks-and-meetups">Upcoming talks and meetups</h2>
<ul>
<li>Todd Lipcon will be presenting an introductory Kudu talk at <a href="">DataEngConf</a>
on Friday, April 8th.</li>
</ul>
