<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-04-19T10:10:47-07:00</updated><id>/</id><entry><title>Apache Kudu 1.3.1 released</title><link href="/2017/04/19/apache-kudu-1-3-1-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.1 released" /><published>2017-04-19T00:00:00-07:00</published><updated>2017-04-19T00:00:00-07:00</updated><id>/2017/04/19/apache-kudu-1-3-1-released</id><content type="html" xml:base="/2017/04/19/apache-kudu-1-3-1-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.3.1!&lt;/p&gt;
&lt;p&gt;Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
incorrectly deleted after certain sequences of node failures. Several other
bugs are also fixed. See the release notes for details.&lt;/p&gt;
&lt;p&gt;Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.1/&quot;&gt;Kudu 1.3.1 source release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Convenience binary artifacts for the Java client and various Java
integrations (e.g. Spark, Flume) are also now available via the ASF Maven
repository.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.1!
Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
incorrectly deleted after certain sequences of node failures. Several other
bugs are also fixed. See the release notes for details.
Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.
Download the Kudu 1.3.1 source release
Convenience binary artifacts for the Java client and various Java
integrations (e.g. Spark, Flume) are also now available via the ASF Maven
repository.</summary></entry><entry><title>Apache Kudu 1.3.0 released</title><link href="/2017/03/20/apache-kudu-1-3-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.0 released" /><published>2017-03-20T00:00:00-07:00</published><updated>2017-03-20T00:00:00-07:00</updated><id>/2017/03/20/apache-kudu-1-3-0-released</id><content type="html" xml:base="/2017/03/20/apache-kudu-1-3-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.3.0!&lt;/p&gt;
&lt;p&gt;Apache Kudu 1.3 is a minor release which adds various new features,
improvements, bug fixes, and optimizations on top of Kudu
1.2. Highlights include:&lt;/p&gt;
&lt;!--more--&gt;
&lt;ul&gt;
&lt;li&gt;significantly improved support for security, including Kerberos
authentication, TLS encryption, and coarse-grained (cluster-level)
authorization&lt;/li&gt;
&lt;li&gt;automatic garbage collection of historical versions of data&lt;/li&gt;
&lt;li&gt;lower space consumption and better performance in default
configurations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
&lt;a href=&quot;/releases/1.3.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
for an expanded list of important improvements, bug fixes, and
incompatible changes before upgrading.&lt;/p&gt;
&lt;p&gt;Thanks to the 25 developers who contributed code or documentation to
this release!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.0/&quot;&gt;Kudu 1.3.0 source release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Convenience binary artifacts for the Java client and various Java
integrations (e.g. Spark, Flume) are also now available via the ASF Maven
repository.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.0!
Apache Kudu 1.3 is a minor release which adds various new features,
improvements, bug fixes, and optimizations on top of Kudu
1.2. Highlights include:</summary></entry><entry><title>Apache Kudu 1.2.0 released</title><link href="/2017/01/20/apache-kudu-1-2-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.2.0 released" /><published>2017-01-20T00:00:00-08:00</published><updated>2017-01-20T00:00:00-08:00</updated><id>/2017/01/20/apache-kudu-1-2-0-released</id><content type="html" xml:base="/2017/01/20/apache-kudu-1-2-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.2.0!&lt;/p&gt;
&lt;p&gt;The new release adds several new features and improvements, including:&lt;/p&gt;
&lt;!--more--&gt;
&lt;ul&gt;
&lt;li&gt;User data such as row contents is now redacted from logging statements.&lt;/li&gt;
&lt;li&gt;Kudu’s ability to provide strong consistency guarantees has been substantially improved.&lt;/li&gt;
&lt;li&gt;Various performance improvements in metadata management as well as optimizations for BITSHUFFLE encoding on AVX2-capable hosts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, 1.2.0 fixes a number of important bugs, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kudu now automatically limits its usage of file descriptors, preventing crashes due to ulimit exhaustion.&lt;/li&gt;
&lt;li&gt;Fixed a long-standing issue which could cause ext4 file system corruption on RHEL 6.&lt;/li&gt;
&lt;li&gt;Fixed a disk space leak.&lt;/li&gt;
&lt;li&gt;Several fixes for correctness in various edge cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
&lt;a href=&quot;/releases/1.2.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
for an expanded list of important improvements, bug fixes, and
incompatible changes before upgrading.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Download the &lt;a href=&quot;/releases/1.2.0/&quot;&gt;Kudu 1.2.0 source release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Convenience binary artifacts for the Java client and various Java
integrations (e.g. Spark, Flume) are also now available via the ASF Maven
repository.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.2.0!
The new release adds several new features and improvements, including:</summary></entry><entry><title>Apache Kudu Weekly Update November 15th, 2016</title><link href="/2016/11/15/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 15th, 2016" /><published>2016-11-15T00:00:00-08:00</published><updated>2016-11-15T00:00:00-08:00</updated><id>/2016/11/15/weekly-update</id><content type="html" xml:base="/2016/11/15/weekly-update.html">&lt;p&gt;Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The first release candidate for Kudu 1.1.0 is &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCADY20s7ZKZkPmUEcTexW%3D%2B_%2BLnDY2hABZg0-UZD3jvWAs9-pog%40mail.gmail.com%3E&quot;&gt;now available&lt;/a&gt;.&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Noteworthy new features/improvements:&lt;/dt&gt;
&lt;dd&gt;
&lt;ul&gt;
&lt;li&gt;The Python client has been brought to feature parity with the C++ and Java clients.&lt;/li&gt;
&lt;li&gt;IN LIST predicates.&lt;/li&gt;
&lt;li&gt;Java client now features client-side tracing.&lt;/li&gt;
&lt;li&gt;Kudu now publishes jar files for Spark 2.0 compiled with Scala 2.11.&lt;/li&gt;
&lt;li&gt;Kudu’s Raft implementation now features pre-elections. In our tests this has greatly improved stability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Community developers and users are encouraged to download the source
tarball and vote on the release.&lt;/p&gt;
&lt;p&gt;For more information on what’s new, check out the
&lt;a href=&quot;https://github.com/apache/kudu/blob/branch-1.1.x/docs/release_notes.adoc&quot;&gt;release notes&lt;/a&gt;.
&lt;em&gt;Note:&lt;/em&gt; some links from these in-progress release notes will not be live until the
release itself is published.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On November 7th, the Kudu PMC announced that Jordan Birdsell, from State Farm, had been voted
in as a new committer and PMC member.&lt;/p&gt;
&lt;p&gt;Jordan’s contributions include extensive work on the Python client, throwing it some much-needed
love, and bringing it to feature parity with the other clients.&lt;/p&gt;
&lt;p&gt;Besides his extensive code contributions, Jordan has also been active in reviewing other
developers’ patches and helping the community in general, on Slack and other channels.&lt;/p&gt;
&lt;p&gt;Jordan has been doing great work and the Kudu PMC was pleased to recognize his contributions
with committership.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mike Percy will be presenting Kudu on Wednesday, November 16th, at &lt;a href=&quot;https://apachebigdataeu2016.sched.org/&quot;&gt;Apache Big Data Europe, in Seville&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Congratulations to Haijie Hong for his &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4822/&quot;&gt;first contribution to Kudu&lt;/a&gt;!
Haijie fixed some edge cases in BitWriter that were blocking RLE usage for 64-bit types.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Congratulations to Maxim Smyatkin for his &lt;a href=&quot;https://gerrit.cloudera.org/#/q/Maxim&quot;&gt;first contributions to Kudu&lt;/a&gt;!
Maxim has contributed several patches helping with debugging and cleanup.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A lot of progress has been made toward the goals that were set in the scope docs introduced in
the last couple of posts. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dan Burkert, Todd Lipcon and Alexey Serbin have doubled down on the security effort. They have
been working on enabling Kerberos authentication and RPC encryption. The &lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;security scope doc&lt;/a&gt;
has been updated with the latest plans for security and many patches have been merged already.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;David Alves has continued the work on &lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;consistency&lt;/a&gt;. Up for review
and partially pushed is a patch series to address row history loss if a row is deleted and then
re-inserted. Also in progress is work to make sure that scans at a snapshot from followers
always return the same data as if they were executed on the leader. This helps with Read-Your-Writes
when reading from lagging replicas.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Adar Dembo has been making good progress &lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;addressing issues seen with the LogBlockManager&lt;/a&gt;.
A series of patches has been merged with various fixes to block managers in general and to the
log block manager in particular.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dinesh Bhat has been working on improving the manual recovery tools for Kudu. Namely, he has
added a tool to force a remote replica copy to a destination server, and a tool to delete a
local replica of a tablet. The latter is useful when a tablet cannot come up due to bad state.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Jean-Daniel Cryans has implemented RPC tracing for the Java client, greatly improving
debuggability. JD has also added ReplicaSelection to the Java client, which allows scans to run
on replicas other than the leader and should be of great help for load-balancing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Besides the feature parity contributions, Jordan Birdsell has laid out a
&lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCAGaaj_VKfB4mhu6eExHCWo0%3D6Qd0HFWy7bg9e39JgOaFPGJ1nQ%40mail.gmail.com%3E&quot;&gt;roadmap for Python client work&lt;/a&gt;
for the 1.2 release. Feedback from other Python client users is certainly appreciated.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;</content><author><name>David Alves</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update November 1st, 2016</title><link href="/2016/11/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 1st, 2016" /><published>2016-11-01T00:00:00-07:00</published><updated>2016-11-01T00:00:00-07:00</updated><id>/2016/11/01/weekly-update</id><content type="html" xml:base="/2016/11/01/weekly-update.html">&lt;p&gt;Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dan Burkert committed a piece of test infrastructure
called “MiniKDC” for both Java and C++. The MiniKDC sets up a short-lived
Kerberos environment in the context of a single test case, making it
easy to build tests of security features without requiring any special
infrastructure on the part of the developer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Todd Lipcon added Kerberos (GSSAPI) support to Kudu’s
RPC system, allowing servers to authenticate the user principal of
any inbound RPC connection. He also integrated Kudu’s C++ “MiniCluster”
test infrastructure to allow starting a Kerberized cluster in the
context of a test.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dan, Todd, and Alexey Serbin have been iterating on a more detailed
&lt;a href=&quot;https://docs.google.com/document/d/1Yu4iuIhaERwug1vS95yWDd_WzrNRIKvvVGUb31y-_mY/edit#&quot;&gt;design doc&lt;/a&gt;
for authentication in Kudu. This doc outlines the various non-Kerberos
methods that Kudu will use for authentication as well as how TLS will
be used to encrypt and authenticate some types of connections.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Part of the above design document involves Kudu servers generating and
signing X509 certificates on the fly to use for authenticated TLS.
Alexey has been working on a large &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4799/&quot;&gt;patch&lt;/a&gt;
which uses OpenSSL to provide key generation and signing functionality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sailesh Mukil has been working on adding support for
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/4789/&quot;&gt;TLS in Kudu’s RPC system&lt;/a&gt;. The TLS
support is a critical part of the overall design for security. This patch
has gone through several rounds of review and is nearing completion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JD Cryans has been continuing to improve the Java client, including adding
the ability to specify that the client would like to read the “closest”
replica (e.g. reading from a local copy if possible). Additionally,
JD has been working on some basic &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4781/&quot;&gt;tracing support&lt;/a&gt;
within the Java client. This tracing aims to make timeouts easier to understand
and diagnose.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Jordan Birdsell committed 9 more patches to the Python client, bringing it
very close to feature parity with C++. Jordan has a few more patches in flight
which should complete this long-running effort.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Congrats to new contributor Haijie Hong who committed his first patch this week.
Haijie added support for &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4822/&quot;&gt;run-length encoding 64-bit integers&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Will Berkeley picked back up work on &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4310/&quot;&gt;improving the capability of ALTER
TABLE&lt;/a&gt;. His in-flight patch adds support
for changing the default value of a column as well as changing storage attributes
such as desired block size, encoding, and compression.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Adar Dembo has been working on a series of patches for the Block Manager, the
component of Kudu which is responsible for laying out blocks on the local
file system. His patch series consists of a number of refactors to clean up
and improve the code structure, followed by an &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4848/&quot;&gt;improvement to reduce file system
fragmentation&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;David Alves has been working on a &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4819/&quot;&gt;patch series&lt;/a&gt;
which adds support for storing ‘REINSERT’ deltas on disk. These records are
generated if a user inserts a row, deletes it, and inserts a new row with the
same primary key. Current versions of Kudu lose track of the history of the
prior version of the row in this scenario, which prevents correct snapshot reads.
David’s patch series fixes this.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update October 20th, 2016</title><link href="/2016/10/20/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update October 20th, 2016" /><published>2016-10-20T00:00:00-07:00</published><updated>2016-10-20T00:00:00-07:00</updated><id>/2016/10/20/weekly-update</id><content type="html" xml:base="/2016/10/20/weekly-update.html">&lt;p&gt;Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Kudu 1.0.1 was &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-user/201610.mbox/%3CCALo2W-UgTa%2BX15_q_9FQpRUPWN53eyqFS10C5MXK1KpsFgqcyQ%40mail.gmail.com%3E&quot;&gt;released&lt;/a&gt;
on October 11th. This is a bug fix release which fixes several bugs found
in 1.0.0. See the &lt;a href=&quot;http://kudu.apache.org/releases/1.0.1/docs/release_notes.html&quot;&gt;Kudu 1.0.1 release notes&lt;/a&gt;
for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Todd Lipcon has proposed a &lt;a href=&quot;https://lists.apache.org/thread.html/4c94d313e28381bb107682ffaf43adfd38bd7fb3b03c98e3c86c52e2@%3Cdev.kudu.apache.org%3E&quot;&gt;release plan&lt;/a&gt;
for the next few months. The proposal is to have a 1.1 release in mid-November and
a 1.2 release in mid-January. These would be time-based releases rather than
gated on any particular feature scope; however, it’s anticipated that several
new features and improvements will be ready in time for these releases.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Happy fourth birthday to the Kudu project! The initial commit was made
on October 11th, 2012! Since then we’ve had 4888 more commits by 60
authors!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;As mentioned last week, a lot of contributors have been collaborating on
design documents for upcoming work. Here’s the complete list of in-flight
documents, along with the primary authors of these docs:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;Security features&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://goo.gl/wP5BJb&quot;&gt;Improved disk-failure handling&lt;/a&gt; (Dinesh Bhat)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/7K48&quot;&gt;Tools for manual recovery from corruption&lt;/a&gt; (Mike Percy and Dinesh Bhat)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;Addressing issues seen with the LogBlockManager&lt;/a&gt; (Adar Dembo)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;Providing proper snapshot/serializable consistency&lt;/a&gt; (David Alves)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/ARUP&quot;&gt;Improving re-replication of under-replicated tablets&lt;/a&gt; (Mike Percy)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit&quot;&gt;Avoiding Raft election storms&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/kudu-backup-scope&quot;&gt;Backup and bulk load&lt;/a&gt; (Dan Burkert)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/SM6V&quot;&gt;Improving diagnosability of client errors&lt;/a&gt; (Alexey Serbin)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In many cases, work is now progressing on implementation of these ideas,
but these are considered living documents. It’s not too late to add your
comments or volunteer to help out.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JD Cryans has been working on cleaning up the Java client. Several complex pieces
of code were completely removed, and other parts were refactored into new
standalone classes for better modularity. Along the way, JD also
&lt;a href=&quot;http://gerrit.cloudera.org:8080/4706&quot;&gt;reduced lock contention&lt;/a&gt; on a frequently-accessed
data structure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Todd Lipcon implemented and committed Raft “pre-elections” as described in the
&lt;a href=&quot;https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit&quot;&gt;election storm mitigation design document&lt;/a&gt;.
Initial experiments, detailed in the document, indicate that this will substantially
improve leader stability on clusters with overloaded disks and lots of tablets.&lt;/p&gt;
&lt;p&gt;Following this patch, Todd worked on some cleanup and refactoring of the Consensus
implementation, removing a bunch of dead code and splitting some classes up
into smaller pieces. This is preparing for some improvements in locking
granularity also described in the same document.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dan Burkert and Todd Lipcon have started submitting patches to integrate Kerberos
authentication with Kudu’s RPC system. Dan posted a
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/4752/&quot;&gt;patch&lt;/a&gt; which adds “MiniKDC”, some test
infrastructure for starting and stopping a standalone Kerberos service in
the context of a test. Todd worked on adding
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/4763/&quot;&gt;support for Kerberos authentication&lt;/a&gt;
during RPC negotiation.&lt;/p&gt;
&lt;p&gt;These patches are just the beginning of the security work, but form an important
base to build on top of. The design uses Kerberos both as a mechanism to authenticate
clients and as a way to mutually authenticate tablet servers with the master.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update October 11th, 2016</title><link href="/2016/10/11/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update October 11th, 2016" /><published>2016-10-11T00:00:00-07:00</published><updated>2016-10-11T00:00:00-07:00</updated><id>/2016/10/11/weekly-update</id><content type="html" xml:base="/2016/10/11/weekly-update.html">&lt;p&gt;Welcome to the twenty-first edition of the Kudu Weekly Update. Astute
readers will notice that the weekly blog posts have been not-so-weekly
of late – in fact, it has been nearly two months since the previous post
as I and others have focused on releases, conferences, etc.&lt;/p&gt;
&lt;p&gt;So, rather than covering just this past week, this post will cover highlights
of the progress since the 1.0 release in mid-September. If you’re interested
in learning about progress prior to that release, check the
&lt;a href=&quot;http://kudu.apache.org/releases/1.0.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;On September 12th, the Kudu PMC announced that Alexey Serbin and Will
Berkeley had been voted in as new committers and PMC members.&lt;/p&gt;
&lt;p&gt;Alexey’s contributions prior to committership included
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/3952/&quot;&gt;AUTO_FLUSH_BACKGROUND&lt;/a&gt; support
in C++ as well as &lt;a href=&quot;http://kudu.apache.org/apidocs/&quot;&gt;API documentation&lt;/a&gt;
for the C++ client API.&lt;/p&gt;
&lt;p&gt;Will’s contributions include several fixes to the web UIs, large
improvements to the Flume integration, and a lot of good work
burning down long-standing bugs.&lt;/p&gt;
&lt;p&gt;Both contributors were “acting the part” and the PMC was pleased to
recognize their contributions with committership.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Kudu 1.0.0 was &lt;a href=&quot;https://kudu.apache.org/2016/09/20/apache-kudu-1-0-0-released.html&quot;&gt;released&lt;/a&gt;
on September 19th. Most community members have upgraded by this point
and have been reporting improved stability and performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dan Burkert has been managing a Kudu 1.0.1 release to address a few
important bugs discovered since 1.0.0. The vote passed on Monday
afternoon, so the release should be made officially available
later this week.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;After the 1.0 release, many contributors have gone into a design phase
for upcoming work. Over the last couple of weeks, developers have posted
scoping and design documents for topics including:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;Security features&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://goo.gl/wP5BJb&quot;&gt;Improved disk-failure handling&lt;/a&gt; (Dinesh Bhat)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/7K48&quot;&gt;Tools for manual recovery from corruption&lt;/a&gt; (Mike Percy and Dinesh Bhat)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;Addressing issues seen with the LogBlockManager&lt;/a&gt; (Adar Dembo)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;Providing proper snapshot/serializable consistency&lt;/a&gt; (David Alves)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://s.apache.org/ARUP&quot;&gt;Improving re-replication of under-replicated tablets&lt;/a&gt; (Mike Percy)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit&quot;&gt;Avoiding Raft election storms&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The development community has no particular rule that all work must be
accompanied by such a document, but in the past they have proven useful
for fleshing out ideas around a design before beginning implementation.
As Kudu matures, we can probably expect to see more of this kind of planning
and design discussion.&lt;/p&gt;
&lt;p&gt;If any of the above work areas sounds interesting to you, please take a
look and leave your comments! Similarly, if you are interested in contributing
in any of these areas, please feel free to volunteer on the mailing list.
Help of all kinds (coding, documentation, testing, etc.) is welcomed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Adar Dembo spent a chunk of time re-working the &lt;code&gt;thirdparty&lt;/code&gt; directory
that contains most of Kudu’s native dependencies. The major resulting
changes are:
&lt;ul&gt;
&lt;li&gt;Build directories are now cleanly isolated from source directories,
improving cleanliness of re-builds.&lt;/li&gt;
&lt;li&gt;ThreadSanitizer (TSAN) builds now use &lt;code&gt;libc++&lt;/code&gt; instead of &lt;code&gt;libstdcxx&lt;/code&gt;
for C++ library support. The &lt;code&gt;libc++&lt;/code&gt; library has better support for
sanitizers, is easier to build in isolation, and solves some compatibility
issues that Adar was facing with GCC 5 on Ubuntu Xenial.&lt;/li&gt;
&lt;li&gt;All of the thirdparty dependencies now build with TSAN instrumentation,
which improves our coverage of this very effective tooling.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The impact to most developers is that, if you have an old source checkout,
it’s highly likely you will need to clean and re-build the thirdparty
directory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Many contributors spent time in recent weeks trying to address the
flakiness of various test cases. The Kudu project uses a
&lt;a href=&quot;http://dist-test.cloudera.org:8080/&quot;&gt;dashboard&lt;/a&gt; to track the flakiness
of each test case, and &lt;a href=&quot;http://dist-test.cloudera.org/&quot;&gt;distributed test infrastructure&lt;/a&gt;
to facilitate reproducing test flakes. &lt;!-- spaces cause line break --&gt;
As might be expected, some of the flaky tests were due to bugs or
timing assumptions in the tests themselves. However, this effort
also identified several real bugs:
&lt;ul&gt;
&lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4570&quot;&gt;tight retry loop&lt;/a&gt; in the
Java client.&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4395&quot;&gt;memory leak&lt;/a&gt; due to circular
references in the C++ client.&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4551&quot;&gt;crash&lt;/a&gt; which could affect
tools used for problem diagnosis.&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4409&quot;&gt;divergence bug&lt;/a&gt; in Raft consensus
under particularly torturous scenarios.&lt;/li&gt;
&lt;li&gt;A potential &lt;a href=&quot;http://gerrit.cloudera.org:8080/4394&quot;&gt;crash during tablet server startup&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A case in which &lt;a href=&quot;http://gerrit.cloudera.org:8080/4626&quot;&gt;thread startup could be delayed&lt;/a&gt;
by built-in monitoring code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a result of these efforts, the failure rate of these flaky tests has
decreased significantly and the stability of Kudu releases continues
to increase.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dan Burkert picked up work originally started by Sameer Abhyankar on
&lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1363&quot;&gt;KUDU-1363&lt;/a&gt;, which adds
support for &lt;code&gt;IN (...)&lt;/code&gt; predicates on scanners. Dan committed the
&lt;a href=&quot;http://gerrit.cloudera.org:8080/2986&quot;&gt;main patch&lt;/a&gt; as well as corresponding
&lt;a href=&quot;http://gerrit.cloudera.org:8080/4530&quot;&gt;support in the Java client&lt;/a&gt;.
Jordan Birdsell quickly added corresponding support in &lt;a href=&quot;http://gerrit.cloudera.org:8080/4548&quot;&gt;Python&lt;/a&gt;.
This new feature will be available in an upcoming release.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Work continues on the &lt;code&gt;kudu&lt;/code&gt; command line tool. Dinesh Bhat added
the ability to ask a tablet’s leader to &lt;a href=&quot;http://gerrit.cloudera.org:8080/4533&quot;&gt;step down&lt;/a&gt;
and Alexey Serbin added a &lt;a href=&quot;http://gerrit.cloudera.org:8080/4412&quot;&gt;tool to insert random data into a
table&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Jordan Birdsell continues to be on a tear improving the Python client.
The patches are too numerous to mention, but highlights include Python 3
support as well as near feature parity with the C++ client.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Todd Lipcon has been doing some refactoring and cleanup in the Raft
consensus implementation. In addition to simplifying and removing code,
he committed &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1567&quot;&gt;KUDU-1567&lt;/a&gt;,
which improves write performance in many cases by a factor of three
or more while also improving stability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Brock Noland is working on support for &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4491/&quot;&gt;INSERT IGNORE&lt;/a&gt;
as a first-class part of the Kudu API. Clients can already get this
behavior by performing normal inserts and ignoring any resulting errors,
but pushing it to the server prevents the server from counting such
operations as errors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Congratulations to Ninad Shringarpure for contributing his first patches
to Kudu. Ninad contributed two documentation fixes and improved
formatting on the Kudu web UI.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-first edition of the Kudu Weekly Update. Astute
readers will notice that the weekly blog posts have been not-so-weekly
of late &amp;#8211; in fact, it has been nearly two months since the previous post
as I and others have focused on releases, conferences, etc.
So, rather than covering just this past week, this post will cover highlights
of the progress since the 1.0 release in mid-September. If you&amp;#8217;re interested
in learning about progress prior to that release, check the
release notes.</summary></entry><entry><title>Apache Kudu at Strata+Hadoop World NYC 2016</title><link href="/2016/09/26/strata-nyc-kudu-talks.html" rel="alternate" type="text/html" title="Apache Kudu at Strata+Hadoop World NYC 2016" /><published>2016-09-26T00:00:00-07:00</published><updated>2016-09-26T00:00:00-07:00</updated><id>/2016/09/26/strata-nyc-kudu-talks</id><content type="html" xml:base="/2016/09/26/strata-nyc-kudu-talks.html">&lt;p&gt;This week in New York, O’Reilly and Cloudera will be hosting Strata+Hadoop World
2016. If you’re interested in Kudu, there will be several opportunities to
learn more, both from the open source development team and from some companies
who are already adopting Kudu for their use cases.
&lt;!--more--&gt;
Here are some of the sessions to check out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52146&quot;&gt;Powering real-time analytics on Xfinity using Kudu&lt;/a&gt; (Wednesday, 11:20am)&lt;/p&gt;
&lt;p&gt;Sridhar Alla and Kiran Muglurmath from Comcast will talk about how they’re using
Kudu to store hundreds of billions of Set-Top Box (STB) events, performing
analytics concurrently with real-time streaming ingest of thousands of events
per second.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52248&quot;&gt;Creating real-time, data-centric applications with Impala and Kudu&lt;/a&gt; (Wednesday, 2:05pm)&lt;/p&gt;
&lt;p&gt;Marcel Kornacker and Todd Lipcon will introduce how Impala and Kudu together
allow users to build real-time applications that support streaming ingest,
random access updates and deletes, and high performance analytic SQL in
a single system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52168&quot;&gt;Streaming cybersecurity into Graph: Accelerating data into Datastax Graph and Blazegraph&lt;/a&gt; (Thursday, 1:15pm)&lt;/p&gt;
&lt;p&gt;Joshua Patterson, Michael Wendt, and Keith Kraus from Accenture Labs will discuss
how they have built cybersecurity solutions using graph analytics on top of open
source technology like Apache Kafka, Spark, and Flink. They will also touch on
why Kudu is becoming an integral part of Accenture’s technology stack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52050&quot;&gt;How GE analyzes billions of mission-critical events in real time using Apache Apex, Spark, and Kudu&lt;/a&gt; (Thursday, 2:05pm)&lt;/p&gt;
&lt;p&gt;Venkatesh Sivasubramanian and Luis Ramos from GE Digital will discuss how they
collect and process real-time IoT data using Apache Apex and Apache Spark, and
how they’ve been experimenting with Apache Kudu for time series data storage.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/51887&quot;&gt;Apache Kudu: 1.0 and Beyond&lt;/a&gt; (Thursday, 4:35pm)&lt;/p&gt;
&lt;p&gt;Todd Lipcon from Cloudera will review the new features that were developed between Kudu 0.5
(the first public release one year ago) and Kudu 1.0, released just last week. Additionally,
this talk will provide some insight into the project roadmap for the coming year.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Aside from these organized sessions, word has it that there will be various demos
featuring Apache Kudu at the Cloudera and ZoomData vendor booths.&lt;/p&gt;
&lt;p&gt;If you’re not attending the conference, but still based in NYC, all hope is
not lost. Michael Crutcher from Cloudera will be presenting an introduction
to Apache Kudu at the &lt;a href=&quot;http://www.meetup.com/mysqlnyc/events/233599664/&quot;&gt;SQL NYC Meetup&lt;/a&gt;.
Be sure to RSVP as spots are filling up fast.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>This week in New York, O&amp;#8217;Reilly and Cloudera will be hosting Strata+Hadoop World
2016. If you&amp;#8217;re interested in Kudu, there will be several opportunities to
learn more, both from the open source development team and from some companies
who are already adopting Kudu for their use cases.</summary></entry><entry><title>Apache Kudu 1.0.0 released</title><link href="/2016/09/20/apache-kudu-1-0-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.0.0 released" /><published>2016-09-20T00:00:00-07:00</published><updated>2016-09-20T00:00:00-07:00</updated><id>/2016/09/20/apache-kudu-1-0-0-released</id><content type="html" xml:base="/2016/09/20/apache-kudu-1-0-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.0.0!&lt;/p&gt;
&lt;p&gt;This latest version adds several new features, including:&lt;/p&gt;
&lt;!--more--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Removal of multiversion concurrency control (MVCC) history is now supported.
This allows Kudu to reclaim disk space, where previously Kudu would keep a full
history of all changes made to a given table since the beginning of time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Most of Kudu’s command line tools have been consolidated under a new
top-level &lt;code&gt;kudu&lt;/code&gt; tool. This reduces the number of large binaries distributed
with Kudu and also includes much-improved help output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Administrative tools including &lt;code&gt;kudu cluster ksck&lt;/code&gt; now support running
against multi-master Kudu clusters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The C++ client API now supports writing data in &lt;code&gt;AUTO_FLUSH_BACKGROUND&lt;/code&gt; mode.
This can provide higher throughput for ingest workloads.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This release also includes many bug fixes, optimizations, and other
improvements, detailed in the &lt;a href=&quot;/releases/1.0.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Download the &lt;a href=&quot;/releases/1.0.0/&quot;&gt;Kudu 1.0.0 source release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Convenience binary artifacts for the Java client and various Java
integrations (e.g. Spark, Flume) are also now available via the ASF Maven
repository.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.0.0!
This latest version adds several new features, including:</summary></entry><entry><title>Pushing Down Predicate Evaluation in Apache Kudu</title><link href="/2016/09/16/predicate-pushdown.html" rel="alternate" type="text/html" title="Pushing Down Predicate Evaluation in Apache Kudu" /><published>2016-09-16T00:00:00-07:00</published><updated>2016-09-16T00:00:00-07:00</updated><id>/2016/09/16/predicate-pushdown</id><content type="html" xml:base="/2016/09/16/predicate-pushdown.html">&lt;p&gt;I had the pleasure of interning with the Apache Kudu team at Cloudera this
summer. This project was my summer contribution to Kudu: a restructuring of the
scan path to speed up queries.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In Kudu, &lt;em&gt;predicate pushdown&lt;/em&gt; refers to the way in which predicates are
handled. When a scan is requested, its predicates are passed through the
different layers of Kudu’s storage hierarchy, allowing for pruning and other
optimizations to happen at each level before reaching the underlying data.&lt;/p&gt;
&lt;p&gt;While predicates are pushed down, predicate evaluation itself occurs at a fairly
high level, precluding the evaluation process from certain data-specific
optimizations. These optimizations can make tablet scans an order of magnitude
faster, if not more.&lt;/p&gt;
&lt;h2 id=&quot;a-day-in-the-life-of-a-query&quot;&gt;A Day in the Life of a Query&lt;/h2&gt;
&lt;p&gt;Because Kudu is a columnar storage engine, its scan path has a number of
optimizations to avoid extraneous reads, copies, and computation. When a query
is sent to a tablet server, the server prunes tablets based on the
primary key, directing the request to only the tablets that contain the key
range of interest. Once at a tablet, only the columns relevant to the query are
scanned. Further pruning is done over the primary key, and if the query is
predicated on non-key columns, the entire column is scanned. The columns in a
tablet are stored as &lt;em&gt;cfiles&lt;/em&gt;, which are split into encoded &lt;em&gt;blocks&lt;/em&gt;. Once the
relevant cfiles are determined, the data are materialized by the block
decoders, i.e. their underlying data are decoded and copied into a buffer,
which is passed back to the tablet layer. The tablet can then evaluate the
predicate on the batch of data and mark which rows should be returned to the
client.&lt;/p&gt;
&lt;p&gt;One of the encoding types I worked very closely with is &lt;em&gt;dictionary encoding&lt;/em&gt;,
an encoding type for strings that performs particularly well for cfiles that
have repeating values. Rather than storing every row’s string, each unique
string is assigned a numeric codeword, and the rows are stored numerically on
disk. When materializing a dictionary block, all of the numeric data are scanned
and all of the corresponding strings are copied and buffered for evaluation.
When the vocabulary of a dictionary-encoded cfile gets too large, the blocks
begin switching to &lt;em&gt;plain encoding mode&lt;/em&gt; to act like &lt;em&gt;plain-encoded&lt;/em&gt; blocks.&lt;/p&gt;
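&lt;p&gt;As a rough illustration of the idea, and not Kudu’s actual C++ implementation,
dictionary encoding can be sketched in a few lines of Python. The function names
below are hypothetical, and the sketch ignores block boundaries and the fallback
to plain encoding.&lt;/p&gt;

```python
# Illustrative sketch only: hypothetical helper names, not Kudu's real code.

def dictionary_encode(values):
    """Assign each distinct string a numeric codeword, in first-seen order."""
    dictionary = []    # position in this list is the codeword
    codeword_of = {}   # maps a string to its codeword
    codewords = []     # one codeword per row; this is what goes to disk
    for value in values:
        if value not in codeword_of:
            codeword_of[value] = len(dictionary)
            dictionary.append(value)
        codewords.append(codeword_of[value])
    return dictionary, codewords


def materialize(dictionary, codewords):
    """Copy out the string for every row, as the original scan path does."""
    return [dictionary[code] for code in codewords]


rows = ["red", "blue", "red", "red", "blue", "green"]
dictionary, codewords = dictionary_encode(rows)
print(dictionary)  # ['red', 'blue', 'green']
print(codewords)   # [0, 1, 0, 0, 1, 2]
```

&lt;p&gt;Six rows collapse to three distinct strings plus six small integers, which is
why the encoding works well for columns with repeating values.&lt;/p&gt;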
&lt;p&gt;In a plain-encoded block, strings are stored contiguously and the character
offsets to the start of each string are stored as a list of integers. When
materializing, all of the strings are copied to a buffer for evaluation.&lt;/p&gt;
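&lt;p&gt;The layout described above can be sketched as follows. This is a simplified
model with hypothetical function names, and it assumes byte offsets into
UTF-8 data; the real block format carries additional header information.&lt;/p&gt;

```python
# Illustrative sketch only: hypothetical helper names, not Kudu's real code.

def plain_encode(values):
    """Store the strings back to back and record where each one starts."""
    data = bytearray()
    offsets = []
    for value in values:
        offsets.append(len(data))
        data.extend(value.encode("utf-8"))
    return bytes(data), offsets


def plain_decode(data, offsets):
    """Copy every string out of the block; each string ends where the next starts."""
    ends = offsets[1:] + [len(data)]
    return [data[start:end].decode("utf-8") for start, end in zip(offsets, ends)]


data, offsets = plain_encode(["kudu", "impala", "spark"])
print(offsets)                      # [0, 4, 10]
print(plain_decode(data, offsets))  # ['kudu', 'impala', 'spark']
```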
&lt;p&gt;Therein lies the problem: this predicate evaluation path is the same
for all data types and encoding types. Within the tablet, the correct cfiles
are determined, the cfiles’ decoders are opened, all of the data are copied to
a buffer, and the predicates are evaluated on this buffered data via
type-specific comparators. This path is extremely flexible, but because it was
designed to be encoding-independent, it cannot take advantage of
encoding-specific optimizations.&lt;/p&gt;
&lt;h2 id=&quot;trimming-the-fat&quot;&gt;Trimming the Fat&lt;/h2&gt;
&lt;p&gt;The first step is to allow the decoders access to the predicate. In doing so,
each encoding type can specialize its evaluation. Additionally, this puts the
decoder in a position where it can determine whether a given row satisfies the
query, which in turn allows the decoder to determine which data gets copied,
instead of eagerly copying all of its data for evaluation.&lt;/p&gt;
&lt;p&gt;Take the case of dictionary-encoded strings as an example. With the existing
scan path, not only are all of the strings in a column copied into a buffer, but
string comparisons are done on every row. By taking advantage of the fact that
the data can be represented as integers, the cost of determining the query
results can be greatly reduced. The string comparisons can be swapped out with
evaluation based on the codewords, in which case the room for improvement boils
down to how to most quickly determine whether or not a given codeword
corresponds to a string that satisfies the predicate. Dictionary columns
now use a bitset to store the codewords that match the predicate. The decoder
then scans through the integer-valued data and checks the bitset to determine
whether it should copy the corresponding string over.&lt;/p&gt;
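&lt;p&gt;A minimal sketch of this evaluation strategy is shown below. It is not the
code that was committed to Kudu: the function names are hypothetical, and a
plain list of booleans stands in for the bitset.&lt;/p&gt;

```python
# Illustrative sketch only: hypothetical helper names, not Kudu's real code.

def build_match_flags(dictionary, predicate):
    """Evaluate the predicate once per distinct string, not once per row.

    The returned list is indexed by codeword and stands in for the bitset.
    """
    return [bool(predicate(value)) for value in dictionary]


def scan_with_pushdown(dictionary, codewords, predicate):
    """Return (row index, string) pairs, copying only the rows that match."""
    flags = build_match_flags(dictionary, predicate)
    selected = []
    for row_index, code in enumerate(codewords):
        if flags[code]:  # a cheap lookup replaces a string comparison
            selected.append((row_index, dictionary[code]))
    return selected


dictionary = ["red", "blue", "green"]
codewords = [0, 1, 0, 0, 1, 2]
print(scan_with_pushdown(dictionary, codewords, lambda s: s == "blue"))
# [(1, 'blue'), (4, 'blue')]
```

&lt;p&gt;In this sketch the predicate runs three times, once per dictionary entry,
regardless of how many rows the block holds.&lt;/p&gt;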
&lt;p&gt;This is great in the best case scenario where a cfile’s vocabulary is small,
but when the vocabulary gets too large and the dictionary blocks switch to plain
encoding mode, performance is hampered. In this mode, the blocks don’t utilize
any dictionary metadata and end up wasting the codeword bitset. That isn’t to
say all is lost: the decoders can still evaluate a predicate via string
comparison, and the fact that evaluation can still occur at the decoder-level
means the eager buffering can still be avoided.&lt;/p&gt;
&lt;p&gt;Dictionary encoding is an ideal case in that the decoders can completely
evaluate the predicates. This is not the case for most other encoding types,
but having decoders support evaluation leaves the door open for other encoding
types to extend this idea.&lt;/p&gt;
&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;
&lt;p&gt;Depending on the dataset and query, predicate pushdown can lead to significant
improvements. Tablet scans were timed with datasets consisting of repeated
string patterns of tunable length and tunable cardinality.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/predicate-pushdown/pushdown-10.png&quot; alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;
&lt;img src=&quot;/img/predicate-pushdown/pushdown-10M.png&quot; alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The above plots show the time taken to completely scan a single tablet, recorded
using a dataset of ten million rows of strings with length ten. Predicates were
designed to select values out of bounds (Empty), select a single value (Equal,
i.e. for cardinality &lt;em&gt;k&lt;/em&gt;, this would select 1/&lt;em&gt;k&lt;/em&gt; of the dataset), select half
of the full range (Half), and select the full range of values (All).&lt;/p&gt;
&lt;p&gt;With the original evaluation implementation, the tablet must copy and scan
through the entire column to determine whether any values match. This means that
even when the result set is small, the full column is still copied. Pushing down
predicates avoids this by copying only as needed, as can be seen in the
above queries: those with near-empty result sets (Empty and Equal) have shorter
scan times than those with larger result sets (Half and All).&lt;/p&gt;
&lt;p&gt;Note that for dictionary encoding, given a low cardinality, Kudu can completely
rely on the dictionary codewords to evaluate, making the query significantly
faster. At higher cardinalities, the dictionaries completely fill up and the
blocks fall back on plain encoding. The slower, albeit still improved,
performance on the dataset containing 10M unique values reflects this.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/predicate-pushdown/pushdown-tpch.png&quot; alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Similar predicates were run with the TPC-H dataset, querying on the shipdate
column. The full path of a query includes not only the tablet scanning itself,
but also RPCs and batched data transfer to the caller as the scan progresses.
As such, the times plotted above refer to the average end-to-end time required
to scan and return a batch of rows. Regardless of this additional overhead,
significant improvements on the scan path still yield substantial improvements
to the query performance as a whole.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Pushing down predicate evaluation in Kudu yielded substantial improvements to
the scan path. For dictionary encoding, pushdown can be particularly powerful,
and other encoding types are either unaffected or also improved. This change has
been pushed to the main branch of Kudu, and relevant commits can be found
&lt;a href=&quot;https://github.com/cloudera/kudu/commit/c0f37278cb09a7781d9073279ea54b08db6e2010&quot;&gt;here&lt;/a&gt;
and
&lt;a href=&quot;https://github.com/cloudera/kudu/commit/ec80fdb37be44d380046a823b5e6d8e2241ec3da&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This summer has been a phenomenal learning experience for me, in terms of the
tools, the workflow, the datasets, and the thought processes that go into building
something at Kudu’s scale. I am extremely thankful for all of the mentoring and
support I received, and that I got to be a part of Kudu’s journey from
incubating to a Top Level Apache project. I can’t express enough how grateful I
am for the amount of support I got from the Kudu team, from the intern
coordinators, and from the Cloudera community as a whole.&lt;/p&gt;</content><author><name>Andrew Wong</name></author><summary>I had the pleasure of interning with the Apache Kudu team at Cloudera this
summer. This project was my summer contribution to Kudu: a restructuring of the
scan path to speed up queries.</summary></entry></feed>