 <?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-04-19T10:10:47-07:00</updated><id>/</id><entry><title>Apache Kudu 1.3.1 released</title><link href="/2017/04/19/apache-kudu-1-3-1-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.1 released" /><published>2017-04-19T00:00:00-07:00</published><updated>2017-04-19T00:00:00-07:00</updated><id>/2017/04/19/apache-kudu-1-3-1-released</id><content type="html" xml:base="/2017/04/19/apache-kudu-1-3-1-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.3.1!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
 in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
 incorrectly deleted after certain sequences of node failures. Several other
 bugs are also fixed. See the release notes for details.&lt;/p&gt;

 &lt;p&gt;Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.1/&quot;&gt;Kudu 1.3.1 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.1!

 Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
 in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
 incorrectly deleted after certain sequences of node failures. Several other
 bugs are also fixed. See the release notes for details.

 Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.


   Download the Kudu 1.3.1 source release
   Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.</summary></entry><entry><title>Apache Kudu 1.3.0 released</title><link href="/2017/03/20/apache-kudu-1-3-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.0 released" /><published>2017-03-20T00:00:00-07:00</published><updated>2017-03-20T00:00:00-07:00</updated><id>/2017/03/20/apache-kudu-1-3-0-released</id><content type="html" xml:base="/2017/03/20/apache-kudu-1-3-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.3.0!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.3 is a minor release which adds various new features,
 improvements, bug fixes, and optimizations on top of Kudu
 1.2. Highlights include:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;significantly improved support for security, including Kerberos
 authentication, TLS encryption, and coarse-grained (cluster-level)
 authorization&lt;/li&gt;
   &lt;li&gt;automatic garbage collection of historical versions of data&lt;/li&gt;
   &lt;li&gt;lower space consumption and better performance in default
 configurations.&lt;/li&gt;
 &lt;/ul&gt;
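 &lt;p&gt;To illustrate the general idea behind garbage-collecting historical versions, the sketch below models a single row as a list of (timestamp, value) versions. It drops any version that no snapshot read at or after a retention cutoff could still observe. The integer clock, the hypothetical &lt;code&gt;max_age&lt;/code&gt; retention window, and the function names are assumptions for illustration. This is not Kudu’s implementation.&lt;/p&gt;

 &lt;pre&gt;&lt;code&gt;
```python
# Toy model of history garbage collection for one row (hypothetical, not Kudu code).
# Each version is (commit_timestamp, value); versions are sorted by timestamp.


def read_at(versions, snapshot_ts):
    """Return the value visible at snapshot_ts, or None if no version is visible."""
    visible = None
    for ts, value in versions:
        if ts > snapshot_ts:
            break
        visible = value
    return visible


def gc_history(versions, now, max_age):
    """Drop versions that no snapshot at or after (now - max_age) can observe.

    The newest version at or before the cutoff is kept, because a snapshot read
    at the cutoff still needs it. Everything older than that can be discarded.
    """
    cutoff = now - max_age
    keep_from = 0
    for i, (ts, _) in enumerate(versions):
        if ts > cutoff:
            break
        keep_from = i  # newest version committed at or before the cutoff
    return versions[keep_from:]


history = [(10, "a"), (20, "b"), (30, "c"), (95, "d")]
pruned = gc_history(history, now=100, max_age=60)  # cutoff = 40
print(pruned)               # [(30, 'c'), (95, 'd')]
print(read_at(pruned, 40))  # c
```
 &lt;/code&gt;&lt;/pre&gt;

 &lt;p&gt;Keeping the newest version at or before the cutoff is the important detail: without it, a snapshot read exactly at the cutoff would find no value for the row.&lt;/p&gt;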

 &lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
 &lt;a href=&quot;/releases/1.3.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
 for an expanded list of important improvements, bug fixes, and
 incompatible changes before upgrading.&lt;/p&gt;

 &lt;p&gt;Thanks to the 25 developers who contributed code or documentation to
 this release!&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.0/&quot;&gt;Kudu 1.3.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.0!

 Apache Kudu 1.3 is a minor release which adds various new features,
 improvements, bug fixes, and optimizations on top of Kudu
 1.2. Highlights include:</summary></entry><entry><title>Apache Kudu 1.2.0 released</title><link href="/2017/01/20/apache-kudu-1-2-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.2.0 released" /><published>2017-01-20T00:00:00-08:00</published><updated>2017-01-20T00:00:00-08:00</updated><id>/2017/01/20/apache-kudu-1-2-0-released</id><content type="html" xml:base="/2017/01/20/apache-kudu-1-2-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.2.0!&lt;/p&gt;

 &lt;p&gt;The new release adds several new features and improvements, including:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;User data such as row contents is now redacted from logging statements.&lt;/li&gt;
   &lt;li&gt;Kudu’s ability to provide strong consistency guarantees has been substantially improved.&lt;/li&gt;
   &lt;li&gt;Various performance improvements in metadata management as well as optimizations for BITSHUFFLE encoding on AVX2-capable hosts.&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;Additionally, 1.2.0 fixes a number of important bugs, including:&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Kudu now automatically limits its usage of file descriptors, preventing crashes due to ulimit exhaustion.&lt;/li&gt;
   &lt;li&gt;Fixed a long-standing issue which could cause ext4 file system corruption on RHEL 6.&lt;/li&gt;
   &lt;li&gt;Fixed a disk space leak.&lt;/li&gt;
   &lt;li&gt;Several fixes for correctness in various edge cases.&lt;/li&gt;
 &lt;/ul&gt;
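 &lt;p&gt;One common way to keep a process under its file-descriptor limit is to cache open files and close the least-recently-used one once a budget is reached. The sketch below assumes that approach. It also assumes a hypothetical budget of 40% of the soft &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; limit. The class and method names are illustrative, and this is not necessarily how Kudu implements its limit.&lt;/p&gt;

 &lt;pre&gt;&lt;code&gt;
```python
import os
import resource
import tempfile
from collections import OrderedDict


class OpenFileCache:
    """LRU cache of open file descriptors with a fixed capacity (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._fds = OrderedDict()  # path -> fd, least recently used first

    def get_fd(self, path):
        if path in self._fds:
            self._fds.move_to_end(path)  # mark as most recently used
            return self._fds[path]
        if len(self._fds) >= self.capacity:
            _, old_fd = self._fds.popitem(last=False)  # evict the LRU entry
            os.close(old_fd)
        fd = os.open(path, os.O_RDONLY)
        self._fds[path] = fd
        return fd

    def open_count(self):
        return len(self._fds)

    def close_all(self):
        while self._fds:
            _, fd = self._fds.popitem()
            os.close(fd)


# Hypothetical budget: 40% of this process's soft limit on open files.
soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft_limit == resource.RLIM_INFINITY:
    soft_limit = 1024  # hypothetical fallback when the limit is unlimited
budget = max(1, int(soft_limit * 0.4))
print("fd budget:", budget)

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = os.path.join(tmp, "block_%d" % i)
        with open(path, "w") as f:
            f.write("data")
        paths.append(path)

    cache = OpenFileCache(capacity=3)  # tiny capacity so eviction is visible
    for path in paths:
        cache.get_fd(path)
    still_open = cache.open_count()
    print(still_open)  # 3
    cache.close_all()
```
 &lt;/code&gt;&lt;/pre&gt;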

 &lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
 &lt;a href=&quot;/releases/1.2.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
 for an expanded list of important improvements, bug fixes, and
 incompatible changes before upgrading.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.2.0/&quot;&gt;Kudu 1.2.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.2.0!

 The new release adds several new features and improvements, including:</summary></entry><entry><title>Apache Kudu Weekly Update November 15th, 2016</title><link href="/2016/11/15/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 15th, 2016" /><published>2016-11-15T00:00:00-08:00</published><updated>2016-11-15T00:00:00-08:00</updated><id>/2016/11/15/weekly-update</id><content type="html" xml:base="/2016/11/15/weekly-update.html">&lt;p&gt;Welcome to the twenty-fourth edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;The first release candidate for Kudu 1.1.0 is &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCADY20s7ZKZkPmUEcTexW%3D%2B_%2BLnDY2hABZg0-UZD3jvWAs9-pog%40mail.gmail.com%3E&quot;&gt;now available&lt;/a&gt;.&lt;/p&gt;

     &lt;dl&gt;
       &lt;dt&gt;Noteworthy new features/improvements:&lt;/dt&gt;
       &lt;dd&gt;
         &lt;ul&gt;
           &lt;li&gt;The Python client has been brought to feature parity with the C++ and Java clients.&lt;/li&gt;
           &lt;li&gt;Support for IN LIST predicates.&lt;/li&gt;
           &lt;li&gt;Java client now features client-side tracing.&lt;/li&gt;
           &lt;li&gt;Kudu now publishes jar files for Spark 2.0 compiled with Scala 2.11.&lt;/li&gt;
           &lt;li&gt;Kudu’s Raft implementation now features pre-elections. In our tests this has greatly improved stability.&lt;/li&gt;
         &lt;/ul&gt;
       &lt;/dd&gt;
     &lt;/dl&gt;

     &lt;p&gt;Community developers and users are encouraged to download the source
 tarball and vote on the release.&lt;/p&gt;

     &lt;p&gt;For more information on what’s new, check out the
 &lt;a href=&quot;https://github.com/apache/kudu/blob/branch-1.1.x/docs/release_notes.adoc&quot;&gt;release notes&lt;/a&gt;.
 &lt;em&gt;Note:&lt;/em&gt; some links from these in-progress release notes will not be live until the
 release itself is published.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;On November 7th, the Kudu PMC announced that Jordan Birdsell, from State Farm, had been voted
 in as a new committer and PMC member.&lt;/p&gt;

     &lt;p&gt;Jordan’s contributions include extensive work on the Python client, giving it some much-needed
 love and bringing it to feature parity with the other clients.&lt;/p&gt;

     &lt;p&gt;Besides his extensive code contributions, Jordan has also been active in reviewing other
 developers’ patches and helping the community in general, on Slack and other channels.&lt;/p&gt;

     &lt;p&gt;Jordan has been doing great work and the Kudu PMC was pleased to recognize his contributions
 with committership.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Mike Percy will be presenting Kudu on Wednesday, November 16th, at &lt;a href=&quot;https://apachebigdataeu2016.sched.org/&quot;&gt;Apache Big Data Europe, in Seville&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Congratulations to Haijie Hong for his &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4822/&quot;&gt;first contribution to Kudu&lt;/a&gt;!
 Haijie fixed some edge cases in BitWriter that were blocking RLE usage for 64-bit types.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Congratulations to Maxim Smyatkin for his &lt;a href=&quot;https://gerrit.cloudera.org/#/q/Maxim&quot;&gt;first contributions to Kudu&lt;/a&gt;!
 Maxim has contributed several patches helping with debugging and cleanup.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;
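 &lt;p&gt;For readers unfamiliar with run-length encoding (RLE), the sketch below shows the basic idea on 64-bit integer values: each run of consecutive repeats is stored once, together with a repeat count. It uses a plain (value, count) record layout chosen for illustration. Kudu’s own encoder, which uses a BitWriter as noted above, packs values at the bit level and is more involved. The function names are hypothetical.&lt;/p&gt;

 &lt;pre&gt;&lt;code&gt;
```python
import struct

RECORD = struct.Struct("!qI")  # big-endian: signed 64-bit value, unsigned 32-bit run length


def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def pack_runs(runs):
    """Serialize runs as fixed-size records; struct raises if a value does not fit in int64."""
    return b"".join(RECORD.pack(v, n) for v, n in runs)


def unpack_runs(data):
    return [list(RECORD.unpack_from(data, off)) for off in range(0, len(data), RECORD.size)]


# Values at both ends of the signed 64-bit range, where edge cases tend to hide.
values = [2**63 - 1] * 4 + [-(2**63)] * 2 + [7]
runs = rle_encode(values)
data = pack_runs(runs)
print(runs)       # [[9223372036854775807, 4], [-9223372036854775808, 2], [7, 1]]
print(len(data))  # 36, versus 7 * 8 = 56 bytes for the raw values
print(rle_decode(unpack_runs(data)) == values)  # True
```
 &lt;/code&gt;&lt;/pre&gt;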

 &lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;A lot of progress has been made towards the goals that were set in the scope docs introduced in
 the last couple of posts. Specifically:&lt;/p&gt;

     &lt;ul&gt;
       &lt;li&gt;
         &lt;p&gt;Dan Burkert, Todd Lipcon and Alexey Serbin have doubled down on the security effort. They have
 been working on enabling Kerberos authentication and RPC encryption. The &lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;security scope doc&lt;/a&gt;
 has been updated with the latest plans for security and many patches have been merged already.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;David Alves has continued the work on &lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;consistency&lt;/a&gt;. Up for review
 and partially pushed is a patch series to address row history loss if a row is deleted and then
 re-inserted. Also in progress is work to make sure that scans at a snapshot from followers
 always return the same data as if they were executed on the leader. This helps with Read-Your-Writes
 when reading from lagging replicas.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Adar Dembo has been making good progress &lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;addressing issues seen with the LogBlockManager&lt;/a&gt;.
 A series of patches have been merged with various fixes to block managers in general and to the
 log block manager in particular.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Dinesh Bhat has been working on improving the manual recovery tools for Kudu. Namely, he has
 added a tool to force a remote replica copy to a destination server, and a tool to delete a
 local replica of a tablet. The latter is useful when a tablet cannot come up due to bad state.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Jean-Daniel Cryans has implemented RPC tracing for the Java client, greatly improving
 debuggability. JD has also added ReplicaSelection to the Java client, allowing scans to be performed
 on replicas other than the leader, which should be of great help for load balancing.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Besides the feature parity contributions, Jordan Birdsell has laid out a
 &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCAGaaj_VKfB4mhu6eExHCWo0%3D6Qd0HFWy7bg9e39JgOaFPGJ1nQ%40mail.gmail.com%3E&quot;&gt;roadmap for Python client work&lt;/a&gt;
 for the 1.2 release. Feedback from other Python client users is certainly appreciated.&lt;/p&gt;
       &lt;/li&gt;
     &lt;/ul&gt;
   &lt;/li&gt;
 &lt;/ul&gt;
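 &lt;p&gt;A follower can return the same snapshot-scan results as the leader if it serves a scan at snapshot timestamp T only after it has applied every write with a timestamp at or below T. The toy sketch below models that rule with a per-replica “safe time”. The class, its methods, and the assumption that writes arrive in timestamp order are hypothetical simplifications, not Kudu’s code.&lt;/p&gt;

 &lt;pre&gt;&lt;code&gt;
```python
class Replica:
    """Toy replica that applies (timestamp, key, value) writes in timestamp order."""

    def __init__(self, name):
        self.name = name
        self.writes = []    # applied writes, sorted by timestamp
        self.safe_time = 0  # every write with a timestamp at or below this has been applied

    def apply(self, ts, key, value):
        # Assumes writes arrive in timestamp order, as they would from a replicated log.
        self.writes.append((ts, key, value))
        self.safe_time = ts

    def snapshot_scan(self, snapshot_ts):
        """Return rows as of snapshot_ts, or None if writes up to it may still be missing."""
        if snapshot_ts > self.safe_time:
            return None  # the caller should wait or retry instead of reading stale data
        rows = {}
        for ts, key, value in self.writes:
            if ts > snapshot_ts:
                break
            rows[key] = value
        return rows


log = [(10, "k1", "v1"), (20, "k2", "v2"), (30, "k1", "v3")]
leader, follower = Replica("leader"), Replica("follower")
for write in log:
    leader.apply(*write)
follower.apply(*log[0])  # the follower lags: only the first write has arrived

print(leader.snapshot_scan(25))    # {'k1': 'v1', 'k2': 'v2'}
print(follower.snapshot_scan(25))  # None
for write in log[1:]:
    follower.apply(*write)
print(follower.snapshot_scan(25) == leader.snapshot_scan(25))  # True
```
 &lt;/code&gt;&lt;/pre&gt;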

 &lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
 &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
 tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
 aware of some Kudu news we missed, let us know so we can cover it in
 a future post.&lt;/p&gt;</content><author><name>David Alves</name></author><summary>Welcome to the twenty-fourth edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update November 1st, 2016</title><link href="/2016/11/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 1st, 2016" /><published>2016-11-01T00:00:00-07:00</published><updated>2016-11-01T00:00:00-07:00</updated><id>/2016/11/01/weekly-update</id><content type="html" xml:base="/2016/11/01/weekly-update.html">&lt;p&gt;Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;Dan Burkert committed a piece of test infrastructure
 called “MiniKDC” for both Java and C++. The MiniKDC sets up a short-lived
 Kerberos environment in the context of a single test case, making it
 easy to build tests of security features without requiring any special
 infrastructure on the part of the developer.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Todd Lipcon added Kerberos (GSSAPI) support to Kudu’s
 RPC system, allowing servers to authenticate the user principal of
 any inbound RPC connection. He also updated Kudu’s C++ “MiniCluster”
 test infrastructure to allow starting a Kerberized cluster in the
 context of a test.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Dan, Todd, and Alexey Serbin have been iterating on a more detailed
 &lt;a href=&quot;https://docs.google.com/document/d/1Yu4iuIhaERwug1vS95yWDd_WzrNRIKvvVGUb31y-_mY/edit#&quot;&gt;design doc&lt;/a&gt;
 for authentication in Kudu. This doc outlines the various non-Kerberos
 methods that Kudu will use for authentication as well as how TLS will
 be used to encrypt and authenticate some types of connections.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Part of the above design document involves Kudu servers generating and
 signing X509 certificates on the fly to use for authenticated TLS.
 Alexey has been working on a large &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4799/&quot;&gt;patch&lt;/a&gt;
 which uses OpenSSL to provide key generation and signing functionality.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Sailesh Mukil has been working on adding support for
 &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4789/&quot;&gt;TLS in Kudu’s RPC system&lt;/a&gt;. The TLS
 support is a critical part of the overall design for security. This patch
 has gone through several rounds of review and is nearing completion.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;JD Cryans has been continuing to improve the Java client, including adding
 the ability to specify that the client would like to read the “closest”
 replica (e.g. reading from a local copy if possible). Additionally,
 JD has been working on some basic &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4781/&quot;&gt;tracing support&lt;/a&gt;
 within the Java client. This tracing aims to make timeouts easier to understand
 and diagnose.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Jordan Birdsell committed 9 more patches to the Python client, bringing it
 very close to feature parity with C++. Jordan has a few more patches in flight
 which should complete this long-running effort.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Congrats to new contributor Haijie Hong who committed his first patch this week.
 Haijie added support for &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4822/&quot;&gt;run-length encoding 64-bit integers&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Will Berkeley picked back up work on &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4310/&quot;&gt;improving the capability of ALTER
 TABLE&lt;/a&gt;. His in-flight patch adds support
 for changing the default value of a column as well as changing storage attributes
 such as desired block size, encoding, and compression.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Adar Dembo has been working on a series of patches for the Block Manager, the
 component of Kudu which is responsible for laying out blocks on the local
 file system. His patch series consists of a number of refactors to clean up
 and improve the code structure, followed by an &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4848/&quot;&gt;improvement to reduce file system
 fragmentation&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;David Alves has been working on a &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4819/&quot;&gt;patch series&lt;/a&gt;
 which adds support for storing ‘REINSERT’ deltas on disk. These records are
 generated if a user inserts a row, deletes it, and inserts a new row with the
 same primary key. Current versions of Kudu lose track of the history of the
 prior version of the row in this scenario, which prevents correct snapshot reads.
 David’s patch series fixes this.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;
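 &lt;p&gt;The REINSERT scenario above involves three steps: insert a row, delete it, then insert a new row with the same primary key. The toy model below shows why tracking the re-insert matters. It contrasts two stores. The first overwrites a key’s history when a deleted key is re-inserted. The second appends a REINSERT entry, so snapshot reads at earlier timestamps still see the right data. Both classes are hypothetical simplifications, not Kudu’s on-disk delta format.&lt;/p&gt;

 &lt;pre&gt;&lt;code&gt;
```python
# Toy model (hypothetical, not Kudu's delta format): each key maps to a
# timestamp-ordered list of (timestamp, op, value) entries.


class OverwritingStore:
    """Re-inserting a deleted key throws away that key's earlier history."""

    def __init__(self):
        self.history = {}

    def _check_absent(self, key):
        entries = self.history.get(key)
        if entries and entries[-1][1] != "DELETE":
            raise KeyError("duplicate primary key: %r" % (key,))

    def insert(self, ts, key, value):
        self._check_absent(key)
        if key in self.history:
            self.history[key] = []  # the previous versions are lost here
        self.history.setdefault(key, []).append((ts, "INSERT", value))

    def delete(self, ts, key):
        self.history[key].append((ts, "DELETE", None))

    def read(self, snapshot_ts, key):
        """Return the value visible at snapshot_ts, or None if the row is absent."""
        value = None
        for ts, op, v in self.history.get(key, []):
            if ts > snapshot_ts:
                break
            value = None if op == "DELETE" else v
        return value


class ReinsertStore(OverwritingStore):
    """Keeps the full history by appending a REINSERT entry instead."""

    def insert(self, ts, key, value):
        self._check_absent(key)
        entries = self.history.setdefault(key, [])
        entries.append((ts, "REINSERT" if entries else "INSERT", value))


results = {}
for store in (OverwritingStore(), ReinsertStore()):
    store.insert(10, "pk1", "first")
    store.delete(20, "pk1")
    store.insert(30, "pk1", "second")  # same primary key as the deleted row
    results[type(store).__name__] = [store.read(t, "pk1") for t in (15, 25, 35)]

print(results)
# {'OverwritingStore': [None, None, 'second'], 'ReinsertStore': ['first', None, 'second']}
```
 &lt;/code&gt;&lt;/pre&gt;

 &lt;p&gt;In the overwriting store, a snapshot read at timestamp 15 wrongly reports that the row did not exist. This is the kind of incorrect snapshot read the patch series addresses.&lt;/p&gt;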

 &lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
 &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
 tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
 aware of some Kudu news we missed, let us know so we can cover it in
 a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update October 20th, 2016</title><link href="/2016/10/20/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update October 20th, 2016" /><published>2016-10-20T00:00:00-07:00</published><updated>2016-10-20T00:00:00-07:00</updated><id>/2016/10/20/weekly-update</id><content type="html" xml:base="/2016/10/20/weekly-update.html">&lt;p&gt;Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;Kudu 1.0.1 was &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-user/201610.mbox/%3CCALo2W-UgTa%2BX15_q_9FQpRUPWN53eyqFS10C5MXK1KpsFgqcyQ%40mail.gmail.com%3E&quot;&gt;released&lt;/a&gt;
 on October 11th. This is a bug fix release which fixes several bugs found
 in 1.0.0. See the &lt;a href=&quot;http://kudu.apache.org/releases/1.0.1/docs/release_notes.html&quot;&gt;Kudu 1.0.1 release notes&lt;/a&gt;
 for more details.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Todd Lipcon has proposed a &lt;a href=&quot;https://lists.apache.org/thread.html/4c94d313e28381bb107682ffaf43adfd38bd7fb3b03c98e3c86c52e2@%3Cdev.kudu.apache.org%3E&quot;&gt;release plan&lt;/a&gt;
 for the next few months. The proposal is to have a 1.1 release in mid-November and
 a 1.2 release in mid-January. These would be time-based releases rather than
 gated on any particular feature scope; however, it’s anticipated that several
 new features and improvements will be ready in time for these releases.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Happy fourth birthday to the Kudu project! The initial commit was made
 on October 11th, 2012! Since then we’ve had 4888 more commits by 60
 authors!&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;As mentioned last week, a lot of contributors have been collaborating on
 design documents for upcoming work. Here’s the complete list of in-flight
 documents, along with the primary authors of these docs:
     &lt;ul&gt;
       &lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;Security features&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://goo.gl/wP5BJb&quot;&gt;Improved disk-failure handling&lt;/a&gt; (Dinesh Bhat)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/7K48&quot;&gt;Tools for manual recovery from corruption&lt;/a&gt; (Mike Percy and Dinesh Bhat)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;Addressing issues seen with the LogBlockManager&lt;/a&gt; (Adar Dembo)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;Providing proper snapshot/serializable consistency&lt;/a&gt; (David Alves)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/ARUP&quot;&gt;Improving re-replication of under-replicated tablets&lt;/a&gt; (Mike Percy)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit&quot;&gt;Avoiding Raft election storms&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/kudu-backup-scope&quot;&gt;Backup and bulk load&lt;/a&gt; (Dan Burkert)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/SM6V&quot;&gt;Improving diagnosability of client errors&lt;/a&gt; (Alexey Serbin)&lt;/li&gt;
     &lt;/ul&gt;

     &lt;p&gt;In many cases, work is now progressing on implementation of these ideas,
 but these are considered living documents. It’s not too late to add your
 comments or volunteer to help out.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;JD Cryans has been working on cleaning up the Java client. Several complex pieces
 of code were completely removed, and other parts were refactored into new
 standalone classes for better modularity. Along the way, JD also
 &lt;a href=&quot;http://gerrit.cloudera.org:8080/4706&quot;&gt;reduced lock contention&lt;/a&gt; on a frequently-accessed
 data structure.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Todd Lipcon implemented and committed Raft “pre-elections” as described in the
 &lt;a href=&quot;https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit&quot;&gt;election storm mitigation design document&lt;/a&gt;.
 Initial experiments, detailed in the document, indicate that this will substantially
 improve leader stability on clusters with overloaded disks and lots of tablets.&lt;/p&gt;

     &lt;p&gt;Following this patch, Todd worked on some cleanup and refactoring of the Consensus
 implementation, removing a bunch of dead code and splitting some classes up
 into smaller pieces. This is preparing for some improvements in locking
 granularity also described in the same document.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Dan Burkert and Todd Lipcon have started submitting patches to integrate Kerberos
 authentication with Kudu’s RPC system. Dan posted a
 &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4752/&quot;&gt;patch&lt;/a&gt; which adds “MiniKDC”, some test
 infrastructure for starting and stopping a standalone Kerberos service in
 the context of a test. Todd worked on adding
 &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4763/&quot;&gt;support for Kerberos authentication&lt;/a&gt;
 during RPC negotiation.&lt;/p&gt;

     &lt;p&gt;These patches are just the beginning of the security work, but form an important
 base to build on top of. The design uses Kerberos both as a mechanism to authenticate
 clients as well as a way to mutually authenticate tablet servers with the master.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;
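 &lt;p&gt;Raft pre-elections, as described above, work in two stages. A node that believes the leader has failed first asks its peers whether they would vote for it, without incrementing its term. It starts a real election only if a majority would agree. The toy sketch below shows why this helps: a lagging node that cannot win never bumps the term, so it cannot force a healthy leader to step down. The vote rules are simplified, the names are hypothetical, and this is not Kudu’s consensus code.&lt;/p&gt;

 &lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    term: int
    last_log_index: int  # simplification: real Raft compares (last term, last index)

    def would_grant(self, cand_term, cand_last_index):
        """Grant a vote only to a candidate whose term and log are at least as recent."""
        return cand_term >= self.term and cand_last_index >= self.last_log_index


def run_election(candidate, peers, use_pre_election):
    """Return True if candidate wins; mutates terms the way a real vote request would."""
    majority = (len(peers) + 1) // 2 + 1
    next_term = candidate.term + 1
    if use_pre_election:
        # Pre-election: ask for hypothetical votes without changing anyone's term.
        pre_votes = 1 + sum(p.would_grant(next_term, candidate.last_log_index) for p in peers)
        if majority > pre_votes:
            return False
    candidate.term = next_term
    votes = 1  # the candidate votes for itself
    for p in peers:
        granted = p.would_grant(next_term, candidate.last_log_index)
        if next_term > p.term:
            # In Raft, seeing a higher term makes a node adopt it; a leader steps down.
            p.term = next_term
        votes += granted
    return votes >= majority


results = {}
for use_pre in (False, True):
    leader = Node("leader", term=5, last_log_index=100)
    follower = Node("follower", term=5, last_log_index=100)
    laggard = Node("laggard", term=5, last_log_index=90)  # e.g. behind on an overloaded disk
    won = run_election(laggard, [leader, follower], use_pre)
    results[use_pre] = (won, leader.term)

print(results)  # {False: (False, 6), True: (False, 5)}
```
 &lt;/code&gt;&lt;/pre&gt;

 &lt;p&gt;In both runs the lagging node loses. Without a pre-election, however, it still pushes the leader to term 6, which in Raft forces the leader to step down.&lt;/p&gt;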

 &lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
 &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
 tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
 aware of some Kudu news we missed, let us know so we can cover it in
 a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update October 11th, 2016</title><link href="/2016/10/11/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update October 11th, 2016" /><published>2016-10-11T00:00:00-07:00</published><updated>2016-10-11T00:00:00-07:00</updated><id>/2016/10/11/weekly-update</id><content type="html" xml:base="/2016/10/11/weekly-update.html">&lt;p&gt;Welcome to the twenty-first edition of the Kudu Weekly Update. Astute
 readers will notice that the weekly blog posts have been not-so-weekly
 of late – in fact, it has been nearly two months since the previous post
 as I and others have focused on releases, conferences, etc.&lt;/p&gt;

 &lt;p&gt;So, rather than covering just this past week, this post will cover highlights
 of the progress since the 1.0 release in mid-September. If you’re interested
 in learning about progress prior to that release, check the
 &lt;a href=&quot;http://kudu.apache.org/releases/1.0.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;On September 12th, the Kudu PMC announced that Alexey Serbin and Will
 Berkeley had been voted in as new committers and PMC members.&lt;/p&gt;

     &lt;p&gt;Alexey’s contributions prior to committership included
 &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3952/&quot;&gt;AUTO_FLUSH_BACKGROUND&lt;/a&gt; support
 in C++ as well as &lt;a href=&quot;http://kudu.apache.org/apidocs/&quot;&gt;API documentation&lt;/a&gt;
 for the C++ client API.&lt;/p&gt;

     &lt;p&gt;Will’s contributions include several fixes to the web UIs, large
 improvements to the Flume integration, and a lot of good work
 burning down long-standing bugs.&lt;/p&gt;

     &lt;p&gt;Both contributors were “acting the part” and the PMC was pleased to
 recognize their contributions with committership.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Kudu 1.0.0 was &lt;a href=&quot;https://kudu.apache.org/2016/09/20/apache-kudu-1-0-0-released.html&quot;&gt;released&lt;/a&gt;
 on September 19th. Most community members have upgraded by this point
 and have been reporting improved stability and performance.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Dan Burkert has been managing a Kudu 1.0.1 release to address a few
 important bugs discovered since 1.0.0. The vote passed on Monday
 afternoon, so the release should be made officially available
 later this week.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;After the 1.0 release, many contributors have gone into a design phase
 for upcoming work. Over the last couple of weeks, developers have posted
 scoping and design documents for topics including:
     &lt;ul&gt;
       &lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;Security features&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://goo.gl/wP5BJb&quot;&gt;Improved disk-failure handling&lt;/a&gt; (Dinesh Bhat)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/7K48&quot;&gt;Tools for manual recovery from corruption&lt;/a&gt; (Mike Percy and Dinesh Bhat)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;Addressing issues seen with the LogBlockManager&lt;/a&gt; (Adar Dembo)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;Providing proper snapshot/serializable consistency&lt;/a&gt; (David Alves)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://s.apache.org/ARUP&quot;&gt;Improving re-replication of under-replicated tablets&lt;/a&gt; (Mike Percy)&lt;/li&gt;
       &lt;li&gt;&lt;a href=&quot;https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit&quot;&gt;Avoiding Raft election storms&lt;/a&gt; (Todd Lipcon)&lt;/li&gt;
     &lt;/ul&gt;

     &lt;p&gt;The development community has no particular rule that all work must be
 accompanied by such a document, but in the past they have proven useful
 for fleshing out ideas around a design before beginning implementation.
 As Kudu matures, we can probably expect to see more of this kind of planning
 and design discussion.&lt;/p&gt;

     &lt;p&gt;If any of the above work areas sounds interesting to you, please take a
 look and leave your comments! Similarly, if you are interested in contributing
 in any of these areas, please feel free to volunteer on the mailing list.
 Help of all kinds (coding, documentation, testing, etc.) is welcome.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;Adar Dembo spent a chunk of time re-working the &lt;code&gt;thirdparty&lt;/code&gt; directory
 that contains most of Kudu’s native dependencies. The major resulting
 changes are:
     &lt;ul&gt;
       &lt;li&gt;Build directories are now cleanly isolated from source directories,
 improving cleanliness of re-builds.&lt;/li&gt;
       &lt;li&gt;ThreadSanitizer (TSAN) builds now use &lt;code&gt;libc++&lt;/code&gt; instead of &lt;code&gt;libstdcxx&lt;/code&gt;
 for C++ library support. The &lt;code&gt;libc++&lt;/code&gt; library has better support for
 sanitizers, is easier to build in isolation, and solves some compatibility
 issues that Adar was facing with GCC 5 on Ubuntu Xenial.&lt;/li&gt;
       &lt;li&gt;All of the thirdparty dependencies now build with TSAN instrumentation,
 which improves our coverage of this very effective tooling.&lt;/li&gt;
     &lt;/ul&gt;

     &lt;p&gt;The main impact on most developers is that, if you have an old source
 checkout, you will very likely need to clean and rebuild the thirdparty
 directory.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;Many contributors spent time in recent weeks trying to address the
 flakiness of various test cases. The Kudu project uses a
 &lt;a href=&quot;http://dist-test.cloudera.org:8080/&quot;&gt;dashboard&lt;/a&gt; to track the flakiness
 of each test case, and &lt;a href=&quot;http://dist-test.cloudera.org/&quot;&gt;distributed test infrastructure&lt;/a&gt;
 to facilitate reproducing test flakes.
 As might be expected, some of the flaky tests were due to bugs or
 timing assumptions in the tests themselves. However, this effort
 also identified several real bugs:
     &lt;ul&gt;
       &lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4570&quot;&gt;tight retry loop&lt;/a&gt; in the
 Java client.&lt;/li&gt;
       &lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4395&quot;&gt;memory leak&lt;/a&gt; due to circular
 references in the C++ client.&lt;/li&gt;
       &lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4551&quot;&gt;crash&lt;/a&gt; which could affect
 tools used for problem diagnosis.&lt;/li&gt;
       &lt;li&gt;A &lt;a href=&quot;http://gerrit.cloudera.org:8080/4409&quot;&gt;divergence bug&lt;/a&gt; in Raft consensus
 under particularly torturous scenarios.&lt;/li&gt;
       &lt;li&gt;A potential &lt;a href=&quot;http://gerrit.cloudera.org:8080/4394&quot;&gt;crash during tablet server startup&lt;/a&gt;.&lt;/li&gt;
       &lt;li&gt;A case in which &lt;a href=&quot;http://gerrit.cloudera.org:8080/4626&quot;&gt;thread startup could be delayed&lt;/a&gt;
 by built-in monitoring code.&lt;/li&gt;
     &lt;/ul&gt;

     &lt;p&gt;As a result of these efforts, the failure rate of these flaky tests has
 decreased significantly and the stability of Kudu releases continues
 to increase.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Dan Burkert picked up work originally started by Sameer Abhyankar on
 &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1363&quot;&gt;KUDU-1363&lt;/a&gt;, which adds
 support for &lt;code&gt;IN (...)&lt;/code&gt; predicates in scanners. Dan committed the
 &lt;a href=&quot;http://gerrit.cloudera.org:8080/2986&quot;&gt;main patch&lt;/a&gt; as well as corresponding
 &lt;a href=&quot;http://gerrit.cloudera.org:8080/4530&quot;&gt;support in the Java client&lt;/a&gt;.
 Jordan Birdsell quickly added corresponding support in &lt;a href=&quot;http://gerrit.cloudera.org:8080/4548&quot;&gt;Python&lt;/a&gt;.
 This new feature will be available in an upcoming release.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Work continues on the &lt;code&gt;kudu&lt;/code&gt; command line tool. Dinesh Bhat added
 the ability to ask a tablet’s leader to &lt;a href=&quot;http://gerrit.cloudera.org:8080/4533&quot;&gt;step down&lt;/a&gt;
 and Alexey Serbin added a &lt;a href=&quot;http://gerrit.cloudera.org:8080/4412&quot;&gt;tool to insert random data into a
 table&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Jordan Birdsell continues to be on a tear improving the Python client.
 The patches are too numerous to mention, but highlights include Python 3
 support as well as near feature parity with the C++ client.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Todd Lipcon has been doing some refactoring and cleanup in the Raft
 consensus implementation. In addition to simplifying and removing code,
 he committed &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1567&quot;&gt;KUDU-1567&lt;/a&gt;,
 which improves write performance in many cases by a factor of three
 or more while also improving stability.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Brock Noland is working on support for &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4491/&quot;&gt;INSERT IGNORE&lt;/a&gt;
 as a first-class part of the Kudu API. This functionality can already be
 emulated by performing normal inserts and ignoring any resulting errors, but
 handling it on the server means such operations are no longer counted as
 errors.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;Congratulations to Ninad Shringarpure for contributing his first patches
 to Kudu. Ninad contributed two documentation fixes and improved
 formatting on the Kudu web UI.&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
 &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
 tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
 aware of some Kudu news we missed, let us know so we can cover it in
 a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-first edition of the Kudu Weekly Update. Astute
 readers will notice that the weekly blog posts have been not-so-weekly
 of late &amp;#8211; in fact, it has been nearly two months since the previous post
 as I and others have focused on releases, conferences, etc.

 So, rather than covering just this past week, this post will cover highlights
 of the progress since the 1.0 release in mid-September. If you&amp;#8217;re interested
 in learning about progress prior to that release, check the
 release notes.</summary></entry><entry><title>Apache Kudu at Strata+Hadoop World NYC 2016</title><link href="/2016/09/26/strata-nyc-kudu-talks.html" rel="alternate" type="text/html" title="Apache Kudu at Strata+Hadoop World NYC 2016" /><published>2016-09-26T00:00:00-07:00</published><updated>2016-09-26T00:00:00-07:00</updated><id>/2016/09/26/strata-nyc-kudu-talks</id><content type="html" xml:base="/2016/09/26/strata-nyc-kudu-talks.html">&lt;p&gt;This week in New York, O’Reilly and Cloudera will be hosting Strata+Hadoop World
 2016. If you’re interested in Kudu, there will be several opportunities to
 learn more, both from the open source development team and from some companies
 who are already adopting Kudu for their use cases.
 &lt;!--more--&gt;
 Here are some of the sessions to check out:&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52146&quot;&gt;Powering real-time analytics on Xfinity using Kudu&lt;/a&gt; (Wednesday, 11:20am)&lt;/p&gt;

     &lt;p&gt;Sridhar Alla and Kiran Muglurmath from Comcast will talk about how they’re using
 Kudu to store hundreds of billions of Set-Top Box (STB) events, performing
 analytics concurrently with real-time streaming ingest of thousands of events
 per second.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52248&quot;&gt;Creating real-time, data-centric applications with Impala and Kudu&lt;/a&gt; (Wednesday, 2:05pm)&lt;/p&gt;

     &lt;p&gt;Marcel Kornacker and Todd Lipcon will introduce how Impala and Kudu together
 allow users to build real-time applications that support streaming ingest,
 random-access updates and deletes, and high-performance analytic SQL in
 a single system.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52168&quot;&gt;Streaming cybersecurity into Graph: Accelerating data into Datastax Graph and Blazegraph&lt;/a&gt; (Thursday, 1:15pm)&lt;/p&gt;

     &lt;p&gt;Joshua Patterson, Michael Wendt, and Keith Kraus from Accenture Labs will discuss
 how they have built cybersecurity solutions using graph analytics on top of open
 source technology like Apache Kafka, Spark, and Flink. They will also touch on
 why Kudu is becoming an integral part of Accenture’s technology stack.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52050&quot;&gt;How GE analyzes billions of mission-critical events in real time using Apache Apex, Spark, and Kudu&lt;/a&gt; (Thursday, 2:05pm)&lt;/p&gt;

     &lt;p&gt;Venkatesh Sivasubramanian and Luis Ramos from GE Digital will discuss how they
 collect and process real-time IoT data using Apache Apex and Apache Spark, and
 how they’ve been experimenting with Apache Kudu for time series data storage.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;&lt;a href=&quot;http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/51887&quot;&gt;Apache Kudu: 1.0 and Beyond&lt;/a&gt; (Thursday, 4:35pm)&lt;/p&gt;

     &lt;p&gt;Todd Lipcon from Cloudera will review the new features that were developed between Kudu 0.5
 (the first public release one year ago) and Kudu 1.0, released just last week. Additionally,
 this talk will provide some insight into the project roadmap for the coming year.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;Aside from these organized sessions, word has it that there will be various demos
 featuring Apache Kudu at the Cloudera and ZoomData vendor booths.&lt;/p&gt;

 &lt;p&gt;If you’re not attending the conference, but still based in NYC, all hope is
 not lost. Michael Crutcher from Cloudera will be presenting an introduction
 to Apache Kudu at the &lt;a href=&quot;http://www.meetup.com/mysqlnyc/events/233599664/&quot;&gt;SQL NYC Meetup&lt;/a&gt;.
 Be sure to RSVP as spots are filling up fast.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>This week in New York, O&amp;#8217;Reilly and Cloudera will be hosting Strata+Hadoop World
 2016. If you&amp;#8217;re interested in Kudu, there will be several opportunities to
 learn more, both from the open source development team as well as some companies
 who are already adopting Kudu for their use cases.</summary></entry><entry><title>Apache Kudu 1.0.0 released</title><link href="/2016/09/20/apache-kudu-1-0-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.0.0 released" /><published>2016-09-20T00:00:00-07:00</published><updated>2016-09-20T00:00:00-07:00</updated><id>/2016/09/20/apache-kudu-1-0-0-released</id><content type="html" xml:base="/2016/09/20/apache-kudu-1-0-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.0.0!&lt;/p&gt;

 &lt;p&gt;This latest version adds several new features, including:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;Removal of multiversion concurrency control (MVCC) history is now supported.
 This allows Kudu to reclaim disk space; previously, Kudu kept a full
 history of all changes made to a given table since the beginning of time.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Most of Kudu’s command line tools have been consolidated under a new
 top-level &lt;code&gt;kudu&lt;/code&gt; tool. This reduces the number of large binaries distributed
 with Kudu and also includes much-improved help output.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Administrative tools including &lt;code&gt;kudu cluster ksck&lt;/code&gt; now support running
 against multi-master Kudu clusters.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;The C++ client API now supports writing data in &lt;code&gt;AUTO_FLUSH_BACKGROUND&lt;/code&gt; mode.
 This can provide higher throughput for ingest workloads.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;This release also includes many bug fixes, optimizations, and other
 improvements, detailed in the &lt;a href=&quot;/releases/1.0.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.0.0/&quot;&gt;Kudu 1.0.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.0.0!

 This latest version adds several new features, including:</summary></entry><entry><title>Pushing Down Predicate Evaluation in Apache Kudu</title><link href="/2016/09/16/predicate-pushdown.html" rel="alternate" type="text/html" title="Pushing Down Predicate Evaluation in Apache Kudu" /><published>2016-09-16T00:00:00-07:00</published><updated>2016-09-16T00:00:00-07:00</updated><id>/2016/09/16/predicate-pushdown</id><content type="html" xml:base="/2016/09/16/predicate-pushdown.html">&lt;p&gt;I had the pleasure of interning with the Apache Kudu team at Cloudera this
 summer. This project was my summer contribution to Kudu: a restructuring of the
 scan path to speed up queries.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

 &lt;p&gt;In Kudu, &lt;em&gt;predicate pushdown&lt;/em&gt; refers to the way in which predicates are
 handled. When a scan is requested, its predicates are passed through the
 different layers of Kudu’s storage hierarchy, allowing for pruning and other
 optimizations to happen at each level before reaching the underlying data.&lt;/p&gt;

 &lt;p&gt;While predicates are pushed down, predicate evaluation itself occurs at a fairly
 high level, which prevents the evaluation process from using certain data-specific
 optimizations. These optimizations can make tablet scans an order of magnitude
 faster, if not more.&lt;/p&gt;

 &lt;h2 id=&quot;a-day-in-the-life-of-a-query&quot;&gt;A Day in the Life of a Query&lt;/h2&gt;

 &lt;p&gt;Because Kudu is a columnar storage engine, its scan path has a number of
 optimizations to avoid extraneous reads, copies, and computation. When a query
 is sent to a tablet server, the server prunes tablets based on the
 primary key, directing the request to only the tablets that contain the key
 range of interest. Once at a tablet, only the columns relevant to the query are
 scanned. Further pruning is done over the primary key, and if the query is
 predicated on non-key columns, the entire column is scanned. The columns in a
 tablet are stored as &lt;em&gt;cfiles&lt;/em&gt;, which are split into encoded &lt;em&gt;blocks&lt;/em&gt;. Once the
 relevant cfiles are determined, the data are materialized by the block
 decoders, i.e. their underlying data are decoded and copied into a buffer,
 which is passed back to the tablet layer. The tablet can then evaluate the
 predicate on the batch of data and mark which rows should be returned to the
 client.&lt;/p&gt;
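 &lt;p&gt;The original, encoding-independent path described above can be modeled in a few lines of Python. This is a toy sketch, not Kudu’s C++ code: a block is assumed to be a plain list of strings, a predicate is any Python callable, and the names &lt;code&gt;materialize&lt;/code&gt; and &lt;code&gt;scan_block_eagerly&lt;/code&gt; are hypothetical. What it illustrates is the ordering: every value is decoded and copied first, and the predicate only runs afterwards on the buffered batch.&lt;/p&gt;

```python
from typing import Callable, List


def materialize(block: List[str]) -> List[str]:
    """Stand-in for a block decoder: decode and copy every value in the block."""
    return list(block)


def scan_block_eagerly(block: List[str],
                       predicate: Callable[[str], bool]) -> List[bool]:
    """Pre-pushdown path: copy the whole block, then evaluate the predicate.

    Returns a selection vector marking which rows should go back to the client.
    """
    buffer = materialize(block)  # every row is copied, even rows that will not match
    return [predicate(value) for value in buffer]


if __name__ == "__main__":
    rows = ["apple", "banana", "apple", "cherry"]
    print(scan_block_eagerly(rows, lambda v: v == "apple"))
    # prints [True, False, True, False]
```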

 &lt;p&gt;One of the encoding types I worked very closely with is &lt;em&gt;dictionary encoding&lt;/em&gt;,
 an encoding type for strings that performs particularly well for cfiles that
 have repeating values. Rather than storing every row’s string, each unique
 string is assigned a numeric codeword, and the rows are stored numerically on
 disk. When materializing a dictionary block, all of the numeric data are scanned
 and all of the corresponding strings are copied and buffered for evaluation.
 When the vocabulary of a dictionary-encoded cfile gets too large, the blocks
 begin switching to &lt;em&gt;plain encoding mode&lt;/em&gt; to act like &lt;em&gt;plain-encoded&lt;/em&gt; blocks.&lt;/p&gt;
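 &lt;p&gt;Dictionary encoding, as described above, can be sketched as follows. This is an illustrative model under stated assumptions: the &lt;code&gt;EncodedBlock&lt;/code&gt; type, the &lt;code&gt;dict_encode&lt;/code&gt; and &lt;code&gt;decode&lt;/code&gt; helpers, and the &lt;code&gt;MAX_VOCAB&lt;/code&gt; limit are all hypothetical. In particular, the switch to plain mode is modeled as a simple count of distinct values per block, which is only a stand-in for whatever threshold Kudu actually uses.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_VOCAB = 4  # hypothetical limit on distinct values, for illustration only


@dataclass
class EncodedBlock:
    mode: str                                            # "dict" or "plain"
    dictionary: List[str] = field(default_factory=list)  # index = codeword
    codewords: List[int] = field(default_factory=list)   # one codeword per row
    plain_values: Optional[List[str]] = None             # set only in plain mode


def dict_encode(values: List[str]) -> EncodedBlock:
    """Assign each distinct string a numeric codeword and store rows as codewords.

    If the vocabulary would grow past MAX_VOCAB, store the strings as-is instead.
    """
    index: Dict[str, int] = {}
    codewords: List[int] = []
    for value in values:
        if value not in index:
            if len(index) == MAX_VOCAB:
                # Vocabulary too large: fall back to plain encoding mode.
                return EncodedBlock(mode="plain", plain_values=list(values))
            index[value] = len(index)  # next unused codeword
        codewords.append(index[value])
    # Dicts preserve insertion order, so position i holds the string for codeword i.
    return EncodedBlock(mode="dict", dictionary=list(index), codewords=codewords)


def decode(block: EncodedBlock) -> List[str]:
    """Materialize every row, as the pre-pushdown scan path does."""
    if block.mode == "plain":
        return list(block.plain_values or [])
    return [block.dictionary[code] for code in block.codewords]


if __name__ == "__main__":
    enc = dict_encode(["NY", "SF", "NY", "LA", "NY"])
    print(enc.mode, enc.dictionary, enc.codewords)
    # prints dict ['NY', 'SF', 'LA'] [0, 1, 0, 2, 0]
```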

 &lt;p&gt;In a plain-encoded block, strings are stored contiguously and the character
 offsets to the start of each string are stored as a list of integers. When
 materializing, all of the strings are copied to a buffer for evaluation.&lt;/p&gt;
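 &lt;p&gt;The plain-encoded layout can likewise be sketched in Python. Assumptions: strings are stored as UTF-8 bytes, so the recorded offsets are byte offsets (the post describes character offsets, which coincide for ASCII data), and the helper names are hypothetical. The end of each string is implied by the next string’s offset, or by the end of the block for the last row.&lt;/p&gt;

```python
from typing import List, Tuple


def plain_encode(values: List[str]) -> Tuple[bytes, List[int]]:
    """Store strings back to back and record where each one starts."""
    data = bytearray()
    offsets: List[int] = []
    for value in values:
        offsets.append(len(data))          # start of this row's string
        data.extend(value.encode("utf-8"))
    return bytes(data), offsets


def get_string(data: bytes, offsets: List[int], i: int) -> str:
    """Row i runs from its own offset to the next row's offset (or the end)."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 != len(offsets) else len(data)
    return data[start:end].decode("utf-8")


def materialize_all(data: bytes, offsets: List[int]) -> List[str]:
    """Copy every string out of the block, as the original scan path does."""
    return [get_string(data, offsets, i) for i in range(len(offsets))]


if __name__ == "__main__":
    data, offsets = plain_encode(["kudu", "raft", "tablet"])
    print(data, offsets)  # prints b'kudurafttablet' [0, 4, 8]
```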

 &lt;p&gt;Therein lies room for improvement: this predicate evaluation path is the same
 for all data types and encoding types. Within the tablet, the correct cfiles
 are determined, the cfiles’ decoders are opened, all of the data are copied to
 a buffer, and the predicates are evaluated on this buffered data via
 type-specific comparators. This path is extremely flexible, but because it was
 designed to be encoding-independent, it cannot exploit the properties of any
 particular encoding.&lt;/p&gt;

 &lt;h2 id=&quot;trimming-the-fat&quot;&gt;Trimming the Fat&lt;/h2&gt;

 &lt;p&gt;The first step is to give the decoders access to the predicate. In doing so,
 each encoding type can specialize its evaluation. Additionally, this puts the
 decoder in a position to determine whether a given row satisfies the
 query, which in turn allows the decoder to decide which data get copied
 instead of eagerly copying all of its data for evaluation.&lt;/p&gt;

 &lt;p&gt;Take the case of dictionary-encoded strings as an example. With the existing
 scan path, not only are all of the strings in a column copied into a buffer, but
 string comparisons are done on every row. By taking advantage of the fact that
 the data can be represented as integers, the cost of determining the query
 results can be greatly reduced. The string comparisons can be swapped out with
 evaluation based on the codewords, in which case the room for improvement boils
 down to how to most quickly determine whether or not a given codeword
 corresponds to a string that satisfies the predicate. Dictionary columns now
 use a bitset to store the codewords that match the predicates. The decoder then
 scans through the integer-valued data and checks the bitset to determine whether
 it should copy the corresponding string over.&lt;/p&gt;
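 &lt;p&gt;A minimal sketch of the codeword-based evaluation described above, assuming a block is given as its dictionary (codeword to string) plus one integer codeword per row. A list of booleans stands in for the bitset, and &lt;code&gt;evaluate_dict_block&lt;/code&gt; is a hypothetical name. The predicate runs once per distinct string rather than once per row, and a string is copied out only when its codeword’s bit is set.&lt;/p&gt;

```python
from typing import Callable, List


def evaluate_dict_block(dictionary: List[str],
                        codewords: List[int],
                        predicate: Callable[[str], bool]) -> List[str]:
    """Evaluate a predicate inside a dictionary-encoded block.

    Step 1: run the predicate once per distinct string and record the result
            per codeword (a list of bools stands in for a real bitset).
    Step 2: scan the integer codewords and copy out a string only when its
            codeword's bit is set.
    """
    matches = [predicate(s) for s in dictionary]  # index = codeword
    result: List[str] = []
    for code in codewords:
        if matches[code]:                    # integer lookup, no string comparison
            result.append(dictionary[code])  # copy only rows that satisfy the predicate
    return result


if __name__ == "__main__":
    dictionary = ["LA", "NY", "SF"]
    codewords = [1, 2, 1, 0, 1]
    print(evaluate_dict_block(dictionary, codewords, lambda s: s == "NY"))
    # prints ['NY', 'NY', 'NY']
```

 &lt;p&gt;For a block whose dictionary is &lt;code&gt;["LA", "NY", "SF"]&lt;/code&gt;, an equality predicate costs three string comparisons no matter how many rows the block holds; the per-row work is just an integer lookup.&lt;/p&gt;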

 &lt;p&gt;This is great in the best case scenario where a cfile’s vocabulary is small,
 but when the vocabulary gets too large and the dictionary blocks switch to plain
 encoding mode, performance is hampered. In this mode, the blocks don’t utilize
 any dictionary metadata and end up wasting the codeword bitset. That isn’t to
 say all is lost: the decoders can still evaluate a predicate via string
 comparison, and because evaluation still occurs at the decoder level,
 the eager buffering can still be avoided.&lt;/p&gt;
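 &lt;p&gt;When a block has fallen back to plain encoding mode, the decoder can still evaluate the predicate itself. The sketch below assumes the plain layout described earlier (concatenated strings plus a list of start offsets) and, to keep it short, supports only an equality predicate; the function name is hypothetical. Each row is compared in place through a &lt;code&gt;memoryview&lt;/code&gt; slice, which does not copy, and only matching rows are decoded into the result.&lt;/p&gt;

```python
from typing import List


def evaluate_plain_block_eq(data: bytes, offsets: List[int],
                            target: str) -> List[str]:
    """Return the rows equal to target, evaluated inside a plain-encoded block.

    Rows are compared through zero-copy memoryview slices; only matching rows
    are decoded and copied into the output.
    """
    view = memoryview(data)
    needle = target.encode("utf-8")
    result: List[str] = []
    for i, start in enumerate(offsets):
        end = offsets[i + 1] if i + 1 != len(offsets) else len(data)
        row = view[start:end]           # slicing a memoryview does not copy
        if row == needle:               # in-place comparison against the target
            result.append(bytes(row).decode("utf-8"))  # copy only the match
    return result


if __name__ == "__main__":
    data = b"kudurafttabletraft"    # rows: "kudu", "raft", "tablet", "raft"
    offsets = [0, 4, 8, 14]
    print(evaluate_plain_block_eq(data, offsets, "raft"))  # prints ['raft', 'raft']
```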

 &lt;p&gt;Dictionary encoding is an ideal case in that the decoders can completely
 evaluate the predicates. This is not the case for most other encoding types,
 but having decoders support evaluation leaves the door open for other encoding
 types to extend this idea.&lt;/p&gt;

 &lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;
 &lt;p&gt;Depending on the dataset and query, predicate pushdown can lead to significant
 improvements. Tablet scans were timed with datasets consisting of repeated
 string patterns of tunable length and tunable cardinality.&lt;/p&gt;

 &lt;p&gt;&lt;img src=&quot;/img/predicate-pushdown/pushdown-10.png&quot; alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;
 &lt;img src=&quot;/img/predicate-pushdown/pushdown-10M.png&quot; alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

 &lt;p&gt;The above plots show the time taken to completely scan a single tablet, recorded
 using a dataset of ten million rows of strings with length ten. Predicates were
 designed to select values out of bounds (Empty), select a single value (Equal,
 i.e. for cardinality &lt;em&gt;k&lt;/em&gt;, this would select 1/&lt;em&gt;k&lt;/em&gt; of the dataset), select half
 of the full range (Half), and select the full range of values (All).&lt;/p&gt;

 &lt;p&gt;With the original evaluation implementation, the tablet must copy and scan
 through the entire column to determine whether any values match. This means that even
 when the result set is small, the full column is still copied. Pushing down
 predicates avoids this by copying only as needed, as the above queries show:
 those with near-empty result sets (Empty and Equal) have shorter
 scan times than those with larger result sets (Half and All).&lt;/p&gt;

 &lt;p&gt;Note that for dictionary encoding, given a low cardinality, Kudu can completely
 rely on the dictionary codewords to evaluate, making the query significantly
 faster. At higher cardinalities, the dictionaries completely fill up and the
 blocks fall back on plain encoding. The slower, albeit still improved,
 performance on the dataset containing 10M unique values reflects this.&lt;/p&gt;

 &lt;p&gt;&lt;img src=&quot;/img/predicate-pushdown/pushdown-tpch.png&quot; alt=&quot;png&quot; class=&quot;img-responsive&quot; /&gt;&lt;/p&gt;

 &lt;p&gt;Similar predicates were run with the TPC-H dataset, querying on the shipdate
 column. The full path of a query includes not only the tablet scanning itself,
 but also RPCs and batched data transfer to the caller as the scan progresses.
 As such, the times plotted above refer to the average end-to-end time required
 to scan and return a batch of rows. Despite this additional overhead,
 the improvements to the scan path still yield substantial improvements
 to query performance as a whole.&lt;/p&gt;

 &lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

 &lt;p&gt;Pushing down predicate evaluation in Kudu yielded substantial improvements to
 the scan path. For dictionary encoding, pushdown can be particularly powerful,
 and other encoding types are either unaffected or also improved. This change has
 been pushed to the main branch of Kudu, and relevant commits can be found
 &lt;a href=&quot;https://github.com/cloudera/kudu/commit/c0f37278cb09a7781d9073279ea54b08db6e2010&quot;&gt;here&lt;/a&gt;
 and
 &lt;a href=&quot;https://github.com/cloudera/kudu/commit/ec80fdb37be44d380046a823b5e6d8e2241ec3da&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

 &lt;p&gt;This summer has been a phenomenal learning experience for me, in terms of the
 tools, the workflow, the datasets, and the thought processes that go into building
 something at Kudu’s scale. I am extremely thankful for all of the mentoring and
 support I received, and that I got to be a part of Kudu’s journey from an
 incubating project to a top-level Apache project. I can’t express enough how grateful I
 am for the amount of support I got from the Kudu team, from the intern
 coordinators, and from the Cloudera community as a whole.&lt;/p&gt;</content><author><name>Andrew Wong</name></author><summary>I had the pleasure of interning with the Apache Kudu team at Cloudera this
 summer. This project was my summer contribution to Kudu: a restructuring of the
 scan path to speed up queries.</summary></entry></feed>