| <?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-04-19T10:10:47-07:00</updated><id>/</id><entry><title>Apache Kudu 1.3.1 released</title><link href="/2017/04/19/apache-kudu-1-3-1-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.1 released" /><published>2017-04-19T00:00:00-07:00</published><updated>2017-04-19T00:00:00-07:00</updated><id>/2017/04/19/apache-kudu-1-3-1-released</id><content type="html" xml:base="/2017/04/19/apache-kudu-1-3-1-released.html"><p>The Apache Kudu team is happy to announce the release of Kudu 1.3.1!</p> |
| |
| <p>Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered |
| in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be |
| incorrectly deleted after certain sequences of node failures. Several other |
| bugs are also fixed. See the release notes for details.</p> |
| |
| <p>Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.</p> |
| |
| <ul> |
| <li>Download the <a href="/releases/1.3.1/">Kudu 1.3.1 source release</a></li> |
| <li>Convenience binary artifacts for the Java client and various Java |
| integrations (e.g. Spark, Flume) are also now available via the ASF Maven |
| repository.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.1! |
| |
| Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered |
| in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be |
| incorrectly deleted after certain sequences of node failures. Several other |
| bugs are also fixed. See the release notes for details. |
| |
| Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately. |
| |
| |
| Download the Kudu 1.3.1 source release |
| Convenience binary artifacts for the Java client and various Java |
| integrations (e.g. Spark, Flume) are also now available via the ASF Maven |
| repository.</summary></entry><entry><title>Apache Kudu 1.3.0 released</title><link href="/2017/03/20/apache-kudu-1-3-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.0 released" /><published>2017-03-20T00:00:00-07:00</published><updated>2017-03-20T00:00:00-07:00</updated><id>/2017/03/20/apache-kudu-1-3-0-released</id><content type="html" xml:base="/2017/03/20/apache-kudu-1-3-0-released.html"><p>The Apache Kudu team is happy to announce the release of Kudu 1.3.0!</p> |
| |
| <p>Apache Kudu 1.3 is a minor release which adds various new features, |
| improvements, bug fixes, and optimizations on top of Kudu |
| 1.2. Highlights include:</p> |
| |
| <!--more--> |
| |
| <ul> |
| <li>significantly improved support for security, including Kerberos |
| authentication, TLS encryption, and coarse-grained (cluster-level) |
| authorization</li> |
| <li>automatic garbage collection of historical versions of data</li> |
| <li>lower space consumption and better performance in default |
| configurations.</li> |
| </ul> |
| |
| <p>The above list of changes is non-exhaustive. Please refer to the |
| <a href="/releases/1.3.0/docs/release_notes.html">release notes</a> |
| for an expanded list of important improvements, bug fixes, and |
| incompatible changes before upgrading.</p> |
| |
| <p>Thanks to the 25 developers who contributed code or documentation to |
| this release!</p> |
| |
| <ul> |
| <li>Download the <a href="/releases/1.3.0/">Kudu 1.3.0 source release</a></li> |
| <li>Convenience binary artifacts for the Java client and various Java |
| integrations (e.g. Spark, Flume) are also now available via the ASF Maven |
| repository.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.0! |
| |
| Apache Kudu 1.3 is a minor release which adds various new features, |
| improvements, bug fixes, and optimizations on top of Kudu |
| 1.2. Highlights include:</summary></entry><entry><title>Apache Kudu 1.2.0 released</title><link href="/2017/01/20/apache-kudu-1-2-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.2.0 released" /><published>2017-01-20T00:00:00-08:00</published><updated>2017-01-20T00:00:00-08:00</updated><id>/2017/01/20/apache-kudu-1-2-0-released</id><content type="html" xml:base="/2017/01/20/apache-kudu-1-2-0-released.html"><p>The Apache Kudu team is happy to announce the release of Kudu 1.2.0!</p> |
| |
| <p>The new release adds several new features and improvements, including:</p> |
| |
| <!--more--> |
| |
| <ul> |
| <li>User data such as row contents is now redacted from logging statements.</li> |
| <li>Kudu’s ability to provide strong consistency guarantees has been substantially improved.</li> |
| <li>Various performance improvements in metadata management as well as optimizations for BITSHUFFLE encoding on AVX2-capable hosts.</li> |
| </ul> |
| |
| <p>Additionally, 1.2.0 fixes a number of important bugs, including:</p> |
| |
| <ul> |
| <li>Kudu now automatically limits its usage of file descriptors, preventing crashes due to ulimit exhaustion.</li> |
| <li>Fixed a long-standing issue which could cause ext4 file system corruption on RHEL 6.</li> |
| <li>Fixed a disk space leak.</li> |
| <li>Several fixes for correctness in various edge cases.</li> |
| </ul> |
| |
| <p>The above list of changes is non-exhaustive. Please refer to the |
| <a href="/releases/1.2.0/docs/release_notes.html">release notes</a> |
| for an expanded list of important improvements, bug fixes, and |
| incompatible changes before upgrading.</p> |
| |
| <ul> |
| <li>Download the <a href="/releases/1.2.0/">Kudu 1.2.0 source release</a></li> |
| <li>Convenience binary artifacts for the Java client and various Java |
| integrations (e.g. Spark, Flume) are also now available via the ASF Maven |
| repository.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.2.0! |
| |
| The new release adds several new features and improvements, including:</summary></entry><entry><title>Apache Kudu Weekly Update November 15th, 2016</title><link href="/2016/11/15/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 15th, 2016" /><published>2016-11-15T00:00:00-08:00</published><updated>2016-11-15T00:00:00-08:00</updated><id>/2016/11/15/weekly-update</id><content type="html" xml:base="/2016/11/15/weekly-update.html"><p>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu project.</p> |
| |
| <!--more--> |
| |
| <h2 id="project-news">Project news</h2> |
| |
| <ul> |
| <li> |
| <p>The first release candidate for Kudu 1.1.0 is <a href="http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCADY20s7ZKZkPmUEcTexW%3D%2B_%2BLnDY2hABZg0-UZD3jvWAs9-pog%40mail.gmail.com%3E">now available</a>.</p> |
| |
| <dl> |
| <dt>Noteworthy new features/improvements:</dt> |
| <dd> |
| <ul> |
| <li>The Python client has been brought to feature parity with the C++ and Java clients.</li> |
| <li>Support for IN LIST predicates.</li> |
| <li>Java client now features client-side tracing.</li> |
| <li>Kudu now publishes jar files for Spark 2.0 compiled with Scala 2.11.</li> |
| <li>Kudu’s Raft implementation now features pre-elections. In our tests this has greatly improved stability.</li> |
| </ul> |
| </dd> |
| </dl> |
| |
| <p>Community developers and users are encouraged to download the source |
| tarball and vote on the release.</p> |
| |
| <p>For more information on what’s new, check out the |
| <a href="https://github.com/apache/kudu/blob/branch-1.1.x/docs/release_notes.adoc">release notes</a>. |
| <em>Note:</em> some links from these in-progress release notes will not be live until the |
| release itself is published.</p> |
| </li> |
| <li> |
| <p>On November 7th, the Kudu PMC announced that Jordan Birdsell, from State Farm, had been voted |
| in as a new committer and PMC member.</p> |
| |
| <p>Jordan’s contributions include extensive work on the Python client, giving it some much-needed |
| love and bringing it to feature parity with the other clients.</p> |
| |
| <p>Besides his extensive code contributions, Jordan has also been active in reviewing other |
| developers’ patches and helping the community in general, on Slack and other channels.</p> |
| |
| <p>Jordan has been doing great work and the Kudu PMC was pleased to recognize his contributions |
| with committership.</p> |
| </li> |
| <li> |
| <p>Mike Percy will be presenting Kudu on Wednesday, November 16th at <a href="https://apachebigdataeu2016.sched.org/">Apache Big Data Europe, in Seville</a>.</p> |
| </li> |
| <li> |
| <p>Congratulations to Haijie Hong for his <a href="https://gerrit.cloudera.org/#/c/4822/">first contribution to Kudu</a>! |
| Haijie fixed some edge cases in BitWriter that were blocking RLE usage for 64 bit types.</p> |
| </li> |
| <li> |
| <p>Congratulations to Maxim Smyatkin for his <a href="https://gerrit.cloudera.org/#/q/Maxim">first contributions to Kudu</a>! |
| Maxim has contributed several patches helping with debug and cleanup.</p> |
| </li> |
| </ul> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>A lot of progress has been made towards the goals that were set in the scope docs introduced in |
| the last couple of posts. Specifically:</p> |
| |
| <ul> |
| <li> |
| <p>Dan Burkert, Todd Lipcon and Alexey Serbin have doubled down on the security effort. They have |
| been working on enabling Kerberos authentication and RPC encryption. The <a href="https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem">security scope doc</a> |
| has been updated with the latest plans for security and many patches have been merged already.</p> |
| </li> |
| <li> |
| <p>David Alves has continued the work on <a href="https://s.apache.org/7VCo">consistency</a>. Up for review |
| and partially pushed is a patch series to address row history loss if a row is deleted and then |
| re-inserted. Also in progress is work to make sure that scans at a snapshot from followers |
| always return the same data as if they were executed on the leader. This helps with Read-Your-Writes |
| when reading from lagging replicas.</p> |
| </li> |
| <li> |
| <p>Adar Dembo has been making good progress <a href="https://s.apache.org/uOOt">addressing issues seen with the LogBlockManager</a>. |
| A series of patches have been merged with various fixes to block managers in general and to the |
| log block manager in particular.</p> |
| </li> |
| <li> |
| <p>Dinesh Bhat has been working on improving the manual recovery tools for Kudu. Namely, he has |
| added a tool to force a remote replica copy to a destination server, and a tool to delete a |
| local replica of a tablet. The latter is useful when a tablet cannot come up due to bad state.</p> |
| </li> |
| <li> |
| <p>Jean-Daniel Cryans has implemented RPC tracing for the Java client, greatly improving |
| debuggability. JD has also added ReplicaSelection to the Java client, allowing scans to be |
| performed on replicas other than the leader, which should be of great help for load-balancing.</p> |
| </li> |
| <li> |
| <p>Besides the feature parity contributions, Jordan Birdsell has laid out a |
| <a href="http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCAGaaj_VKfB4mhu6eExHCWo0%3D6Qd0HFWy7bg9e39JgOaFPGJ1nQ%40mail.gmail.com%3E">roadmap for Python client work</a> |
| for the 1.2 release. Feedback from other Python client users is certainly appreciated.</p> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| |
| <p>Want to learn more about a specific topic from this blog post? Shoot an email to the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweet at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p></content><author><name>David Alves</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update November 1st, 2016</title><link href="/2016/11/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 1st, 2016" /><published>2016-11-01T00:00:00-07:00</published><updated>2016-11-01T00:00:00-07:00</updated><id>/2016/11/01/weekly-update</id><content type="html" xml:base="/2016/11/01/weekly-update.html"><p>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu project.</p> |
| |
| <!--more--> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Dan Burkert committed a piece of test infrastructure |
| called “MiniKDC” for both Java and C++. The MiniKDC sets up a short-lived |
| Kerberos environment in the context of a single test case, making it |
| easy to build tests of security features without requiring any special |
| infrastructure on the part of the developer.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon added Kerberos (GSSAPI) support to Kudu’s |
| RPC system, allowing servers to authenticate the user principal of |
| any inbound RPC connection. He also integrated Kudu’s C++ “MiniCluster” |
| test infrastructure to allow starting a Kerberized cluster in the |
| context of a test.</p> |
| </li> |
| <li> |
| <p>Dan, Todd, and Alexey Serbin have been iterating on a more detailed |
| <a href="https://docs.google.com/document/d/1Yu4iuIhaERwug1vS95yWDd_WzrNRIKvvVGUb31y-_mY/edit#">design doc</a> |
| for authentication in Kudu. This doc outlines the various non-Kerberos |
| methods that Kudu will use for authentication as well as how TLS will |
| be used to encrypt and authenticate some types of connections.</p> |
| </li> |
| <li> |
| <p>Part of the above design document involves Kudu servers generating and |
| signing X509 certificates on the fly to use for authenticated TLS. |
| Alexey has been working on a large <a href="https://gerrit.cloudera.org/#/c/4799/">patch</a> |
| which uses OpenSSL to provide key generation and signing functionality.</p> |
| </li> |
| <li> |
| <p>Sailesh Mukil has been working on adding support for |
| <a href="https://gerrit.cloudera.org/#/c/4789/">TLS in Kudu’s RPC system</a>. The TLS |
| support is a critical part of the overall design for security. This patch |
| has gone through several rounds of review and is nearing completion.</p> |
| </li> |
| <li> |
| <p>JD Cryans has been continuing to improve the Java client, including adding |
| the ability to specify that the client would like to read the “closest” |
| replica (e.g. reading from a local copy if possible). Additionally, |
| JD has been working on some basic <a href="https://gerrit.cloudera.org/#/c/4781/">tracing support</a> |
| within the Java client. This tracing aims to make timeouts easier to understand |
| and diagnose.</p> |
| </li> |
| <li> |
| <p>Jordan Birdsell committed 9 more patches to the Python client, bringing it |
| very close to feature parity with C++. Jordan has a few more patches in flight |
| which should complete this long-running effort.</p> |
| </li> |
| <li> |
| <p>Congrats to new contributor Haijie Hong who committed his first patch this week. |
| Haijie added support for <a href="https://gerrit.cloudera.org/#/c/4822/">run-length encoding 64-bit integers</a>.</p> |
| </li> |
| <li> |
| <p>Will Berkeley picked back up work on <a href="https://gerrit.cloudera.org/#/c/4310/">improving the capability of ALTER |
| TABLE</a>. His in-flight patch adds support |
| for changing the default value of a column as well as changing storage attributes |
| such as desired block size, encoding, and compression.</p> |
| </li> |
| <li> |
| <p>Adar Dembo has been working on a series of patches for the Block Manager, the |
| component of Kudu which is responsible for laying out blocks on the local |
| file system. His patch series consists of a number of refactors to clean up |
| and improve the code structure, followed by an <a href="https://gerrit.cloudera.org/#/c/4848/">improvement to reduce file system |
| fragmentation</a>.</p> |
| </li> |
| <li> |
| <p>David Alves has been working on a <a href="https://gerrit.cloudera.org/#/c/4819/">patch series</a> |
| which adds support for storing ‘REINSERT’ deltas on disk. These records are |
| generated if a user inserts a row, deletes it, and inserts a new row with the |
| same primary key. Current versions of Kudu lose track of the history of the |
| prior version of the row in this scenario, which prevents correct snapshot reads. |
| David’s patch series fixes this.</p> |
| </li> |
| </ul> |
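
<p>The REINSERT item above can be illustrated with a small toy model. This is only a sketch under
stated assumptions: <code>RowHistory</code> and <code>forget_prior_incarnation</code> are hypothetical
names invented for this example and do not reflect Kudu's on-disk format or David's patches. It shows
why a snapshot read needs the history from before a delete and re-insert.</p>

```python
# Toy model only: hypothetical names, not Kudu's storage format or API.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RowHistory:
    """Keeps every mutation of one primary key, in timestamp order."""
    # Each entry is (timestamp, value); a value of None records a delete.
    versions: List[Tuple[int, Optional[str]]] = field(default_factory=list)

    def insert(self, ts: int, value: str) -> None:
        # An insert that follows a delete plays the role of a REINSERT record.
        self.versions.append((ts, value))

    def delete(self, ts: int) -> None:
        self.versions.append((ts, None))

    def read_at(self, snapshot_ts: int) -> Optional[str]:
        """Return the value visible at snapshot_ts, or None if the row is absent."""
        visible = None
        for ts, value in self.versions:
            if ts > snapshot_ts:
                break
            visible = value
        return visible


def forget_prior_incarnation(history: RowHistory) -> RowHistory:
    """Mimic the lossy behaviour: keep only what follows the last delete."""
    last_delete = max(
        (i for i, (_, value) in enumerate(history.versions) if value is None),
        default=-1,
    )
    return RowHistory(versions=history.versions[last_delete + 1:])


history = RowHistory()
history.insert(1, "v1")   # first incarnation of the row
history.delete(2)
history.insert(3, "v2")   # same primary key, new incarnation

print(history.read_at(1))                            # full history kept
print(forget_prior_incarnation(history).read_at(1))  # prior incarnation lost
```

<p>With the full history, a snapshot read at timestamp 1 still sees the first incarnation of the row.
Once the prior incarnation is dropped, the same read finds no row at all, which is the kind of
incorrect snapshot read described above.</p>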
| |
| <p>Want to learn more about a specific topic from this blog post? Shoot an email to the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweet at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p></content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update October 20th, 2016</title><link href="/2016/10/20/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update October 20th, 2016" /><published>2016-10-20T00:00:00-07:00</published><updated>2016-10-20T00:00:00-07:00</updated><id>/2016/10/20/weekly-update</id><content type="html" xml:base="/2016/10/20/weekly-update.html"><p>Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu project.</p> |
| |
| <!--more--> |
| |
| <h2 id="project-news">Project news</h2> |
| |
| <ul> |
| <li> |
| <p>Kudu 1.0.1 was <a href="http://mail-archives.apache.org/mod_mbox/kudu-user/201610.mbox/%3CCALo2W-UgTa%2BX15_q_9FQpRUPWN53eyqFS10C5MXK1KpsFgqcyQ%40mail.gmail.com%3E">released</a> |
| on October 11th. This is a bug fix release which fixes several bugs found |
| in 1.0.0. See the <a href="http://kudu.apache.org/releases/1.0.1/docs/release_notes.html">Kudu 1.0.1 release notes</a> |
| for more details.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon has proposed a <a href="https://lists.apache.org/thread.html/4c94d313e28381bb107682ffaf43adfd38bd7fb3b03c98e3c86c52e2@%3Cdev.kudu.apache.org%3E">release plan</a> |
| for the next few months. The proposal is to have a 1.1 release in mid-November and |
| a 1.2 release in mid-January. These would be time-based releases rather than |
| gated on any particular feature scope; however, it’s anticipated that several |
| new features and improvements will be ready in time for these releases.</p> |
| </li> |
| <li> |
| <p>Happy fourth birthday to the Kudu project! The initial commit was made |
| on October 11th, 2012! Since then we’ve had 4888 more commits by 60 |
| authors!</p> |
| </li> |
| </ul> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li>As mentioned last week, a lot of contributors have been collaborating on |
| design documents for upcoming work. Here’s the complete list of in-flight |
| documents, along with the primary authors of these docs: |
| <ul> |
| <li><a href="https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem">Security features</a> (Todd Lipcon)</li> |
| <li><a href="https://goo.gl/wP5BJb">Improved disk-failure handling</a> (Dinesh Bhat)</li> |
| <li><a href="https://s.apache.org/7K48">Tools for manual recovery from corruption</a> (Mike Percy and Dinesh Bhat)</li> |
| <li><a href="https://s.apache.org/uOOt">Addressing issues seen with the LogBlockManager</a> (Adar Dembo)</li> |
| <li><a href="https://s.apache.org/7VCo">Providing proper snapshot/serializable consistency</a> (David Alves)</li> |
| <li><a href="https://s.apache.org/ARUP">Improving re-replication of under-replicated tablets</a> (Mike Percy)</li> |
| <li><a href="https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit">Avoiding Raft election storms</a> (Todd Lipcon)</li> |
| <li><a href="https://s.apache.org/kudu-backup-scope">Backup and bulk load</a> (Dan Burkert)</li> |
| <li><a href="https://s.apache.org/SM6V">Improving diagnosability of client errors</a> (Alexey Serbin)</li> |
| </ul> |
| |
| <p>In many cases, work is now progressing on implementation of these ideas, |
| but these are considered living documents. It’s not too late to add your |
| comments or volunteer to help out.</p> |
| </li> |
| <li> |
| <p>JD Cryans has been working on cleaning up the Java client. Several complex pieces |
| of code were completely removed, and other parts were refactored into new |
| standalone classes for better modularity. Along the way, JD also |
| <a href="http://gerrit.cloudera.org:8080/4706">reduced lock contention</a> on a frequently-accessed |
| data structure.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon implemented and committed Raft “pre-elections” as described in the |
| <a href="https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit">election storm mitigation design document</a>. |
| Initial experiments, detailed in the document, indicate that this will substantially |
| improve leader stability on clusters with overloaded disks and lots of tablets.</p> |
| |
| <p>Following this patch, Todd worked on cleaning up and refactoring the Consensus |
| implementation, removing a bunch of dead code and splitting some classes up |
| into smaller pieces. This is preparing for some improvements in locking |
| granularity also described in the same document.</p> |
| </li> |
| <li> |
| <p>Dan Burkert and Todd Lipcon have started submitting patches to integrate Kerberos |
| authentication with Kudu’s RPC system. Dan posted a |
| <a href="https://gerrit.cloudera.org/#/c/4752/">patch</a> which adds “MiniKDC”, some test |
| infrastructure for starting and stopping a standalone Kerberos service in |
| the context of a test. Todd worked on adding |
| <a href="https://gerrit.cloudera.org/#/c/4763/">support for Kerberos authentication</a> |
| during RPC negotiation.</p> |
| |
| <p>These patches are just the beginning of the security work, but form an important |
| base to build on top of. The design uses Kerberos both as a mechanism to authenticate |
| clients as well as a way to mutually authenticate tablet servers with the master.</p> |
| </li> |
| </ul> |
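
<p>For readers unfamiliar with Raft pre-elections, the general idea can be sketched as follows. This
is a simplified, generic illustration and not Kudu's consensus code: the <code>Peer</code> class, its
fields, and the exact voting rules are assumptions made for the example, and the design document
linked above remains the reference for what Kudu actually does.</p>

```python
# Toy model only: hypothetical names and simplified rules, not Kudu's implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    term: int                # highest term this peer has seen
    last_log_index: int      # index of the last entry in this peer's log
    leader_is_healthy: bool  # True if it heard from a leader recently


def grants_pre_vote(peer: Peer, proposed_term: int, candidate_last_index: int) -> bool:
    """A peer answers a pre-vote request without changing any of its own state."""
    if peer.leader_is_healthy:
        return False  # no reason to disturb a working leader
    if peer.term >= proposed_term:
        return False  # the candidate is behind on terms
    return candidate_last_index >= peer.last_log_index


def run_pre_election(candidate: Peer, others: List[Peer]) -> bool:
    """Return True if the candidate may go on to start a real election.

    The candidate's term is bumped only after a majority says yes, so a node
    that cannot win never forces the rest of the cluster into a new term.
    """
    proposed_term = candidate.term + 1
    yes_votes = 1  # the candidate counts itself
    for peer in others:
        if grants_pre_vote(peer, proposed_term, candidate.last_log_index):
            yes_votes += 1
    cluster_size = len(others) + 1
    won = yes_votes > cluster_size // 2
    if won:
        candidate.term = proposed_term
    return won
```

<p>In this sketch, a replica that merely missed a few heartbeats loses the pre-election while its
peers still see a healthy leader, so nobody's term changes and the existing leader is left
undisturbed.</p>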
| |
| <p>Want to learn more about a specific topic from this blog post? Shoot an email to the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweet at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p></content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-second edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update October 11th, 2016</title><link href="/2016/10/11/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update October 11th, 2016" /><published>2016-10-11T00:00:00-07:00</published><updated>2016-10-11T00:00:00-07:00</updated><id>/2016/10/11/weekly-update</id><content type="html" xml:base="/2016/10/11/weekly-update.html"><p>Welcome to the twenty-first edition of the Kudu Weekly Update. Astute |
| readers will notice that the weekly blog posts have been not-so-weekly |
| of late – in fact, it has been nearly two months since the previous post |
| as I and others have focused on releases, conferences, etc.</p> |
| |
| <p>So, rather than covering just this past week, this post will cover highlights |
| of the progress since the 1.0 release in mid-September. If you’re interested |
| in learning about progress prior to that release, check the |
| <a href="http://kudu.apache.org/releases/1.0.0/docs/release_notes.html">release notes</a>.</p> |
| |
| <!--more--> |
| |
| <h2 id="project-news">Project news</h2> |
| |
| <ul> |
| <li> |
| <p>On September 12th, the Kudu PMC announced that Alexey Serbin and Will |
| Berkeley had been voted in as new committers and PMC members.</p> |
| |
| <p>Alexey’s contributions prior to committership included |
| <a href="https://gerrit.cloudera.org/#/c/3952/">AUTO_FLUSH_BACKGROUND</a> support |
| in C++ as well as <a href="http://kudu.apache.org/apidocs/">API documentation</a> |
| for the C++ client API.</p> |
| |
| <p>Will’s contributions include several fixes to the web UIs, large |
| improvements to the Flume integration, and a lot of good work |
| burning down long-standing bugs.</p> |
| |
| <p>Both contributors were “acting the part” and the PMC was pleased to |
| recognize their contributions with committership.</p> |
| </li> |
| <li> |
| <p>Kudu 1.0.0 was <a href="https://kudu.apache.org/2016/09/20/apache-kudu-1-0-0-released.html">released</a> |
| on September 19th. Most community members have upgraded by this point |
| and have been reporting improved stability and performance.</p> |
| </li> |
| <li> |
| <p>Dan Burkert has been managing a Kudu 1.0.1 release to address a few |
| important bugs discovered since 1.0.0. The vote passed on Monday |
| afternoon, so the release should be made officially available |
| later this week.</p> |
| </li> |
| </ul> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li>After the 1.0 release, many contributors have gone into a design phase |
| for upcoming work. Over the last couple of weeks, developers have posted |
| scoping and design documents for topics including: |
| <ul> |
| <li><a href="https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem">Security features</a> (Todd Lipcon)</li> |
| <li><a href="https://goo.gl/wP5BJb">Improved disk-failure handling</a> (Dinesh Bhat)</li> |
| <li><a href="https://s.apache.org/7K48">Tools for manual recovery from corruption</a> (Mike Percy and Dinesh Bhat)</li> |
| <li><a href="https://s.apache.org/uOOt">Addressing issues seen with the LogBlockManager</a> (Adar Dembo)</li> |
| <li><a href="https://s.apache.org/7VCo">Providing proper snapshot/serializable consistency</a> (David Alves)</li> |
| <li><a href="https://s.apache.org/ARUP">Improving re-replication of under-replicated tablets</a> (Mike Percy)</li> |
| <li><a href="https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit">Avoiding Raft election storms</a> (Todd Lipcon)</li> |
| </ul> |
| |
| <p>The development community has no particular rule that all work must be |
| accompanied by such a document, but in the past they have proven useful |
| for fleshing out ideas around a design before beginning implementation. |
| As Kudu matures, we can probably expect to see more of this kind of planning |
| and design discussion.</p> |
| |
| <p>If any of the above work areas sounds interesting to you, please take a |
| look and leave your comments! Similarly, if you are interested in contributing |
| in any of these areas, please feel free to volunteer on the mailing list. |
| Help of all kinds (coding, documentation, testing, etc.) is welcomed.</p> |
| </li> |
| <li>Adar Dembo spent a chunk of time re-working the <code>thirdparty</code> directory |
| that contains most of Kudu’s native dependencies. The major resulting |
| changes are: |
| <ul> |
| <li>Build directories are now cleanly isolated from source directories, |
| improving cleanliness of re-builds.</li> |
| <li>ThreadSanitizer (TSAN) builds now use <code>libc++</code> instead of <code>libstdcxx</code> |
| for C++ library support. The <code>libc++</code> library has better support for |
| sanitizers, is easier to build in isolation, and solves some compatibility |
| issues that Adar was facing with GCC 5 on Ubuntu Xenial.</li> |
| <li>All of the thirdparty dependencies now build with TSAN instrumentation, |
| which improves our coverage of this very effective tooling.</li> |
| </ul> |
| |
| <p>The impact to most developers is that, if you have an old source checkout, |
| it’s highly likely you will need to clean and re-build the thirdparty |
| directory.</p> |
| </li> |
| <li>Many contributors spent time in recent weeks trying to address the |
| flakiness of various test cases. The Kudu project uses a |
| <a href="http://dist-test.cloudera.org:8080/">dashboard</a> to track the flakiness |
| of each test case, and <a href="http://dist-test.cloudera.org/">distributed test infrastructure</a> |
| to facilitate reproducing test flakes. <!-- spaces cause line break --> |
| As might be expected, some of the flaky tests were due to bugs or |
| timing assumptions in the tests themselves. However, this effort |
| also identified several real bugs: |
| <ul> |
| <li>A <a href="http://gerrit.cloudera.org:8080/4570">tight retry loop</a> in the |
| Java client.</li> |
| <li>A <a href="http://gerrit.cloudera.org:8080/4395">memory leak</a> due to circular |
| references in the C++ client.</li> |
| <li>A <a href="http://gerrit.cloudera.org:8080/4551">crash</a> which could affect |
| tools used for problem diagnosis.</li> |
| <li>A <a href="http://gerrit.cloudera.org:8080/4409">divergence bug</a> in Raft consensus |
| under particularly torturous scenarios.</li> |
| <li>A potential <a href="http://gerrit.cloudera.org:8080/4394">crash during tablet server startup</a>.</li> |
| <li>A case in which <a href="http://gerrit.cloudera.org:8080/4626">thread startup could be delayed</a> |
| by built-in monitoring code.</li> |
| </ul> |
| |
| <p>As a result of these efforts, the failure rate of these flaky tests has |
| decreased significantly and the stability of Kudu releases continues |
| to increase.</p> |
| </li> |
| <li> |
| <p>Dan Burkert picked up work originally started by Sameer Abhyankar on |
| <a href="https://issues.apache.org/jira/browse/KUDU-1363">KUDU-1363</a>, which adds |
| support for adding <code>IN (...)</code> predicates to scanners. Dan committed the |
| <a href="http://gerrit.cloudera.org:8080/2986">main patch</a> as well as corresponding |
| <a href="http://gerrit.cloudera.org:8080/4530">support in the Java client</a>. |
| Jordan Birdsell quickly added corresponding support in <a href="http://gerrit.cloudera.org:8080/4548">Python</a>. |
| This new feature will be available in an upcoming release.</p> |
| </li> |
| <li> |
| <p>Work continues on the <code>kudu</code> command line tool. Dinesh Bhat added |
| the ability to ask a tablet’s leader to <a href="http://gerrit.cloudera.org:8080/4533">step down</a> |
| and Alexey Serbin added a <a href="http://gerrit.cloudera.org:8080/4412">tool to insert random data into a |
| table</a>.</p> |
| </li> |
| <li> |
| <p>Jordan Birdsell continues to be on a tear improving the Python client. |
| The patches are too numerous to mention, but highlights include Python 3 |
| support as well as near feature parity with the C++ client.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon has been doing some refactoring and cleanup in the Raft |
| consensus implementation. In addition to simplifying and removing code, |
| he committed <a href="https://issues.apache.org/jira/browse/KUDU-1567">KUDU-1567</a>, |
| which improves write performance in many cases by a factor of three |
| or more while also improving stability.</p> |
| </li> |
| <li> |
| <p>Brock Noland is working on support for <a href="https://gerrit.cloudera.org/#/c/4491/">INSERT IGNORE</a> |
| as a first-class part of the Kudu API. Of course, this behavior |
| can already be achieved by performing normal inserts and ignoring any |
| resulting errors, but pushing it to the server prevents the server |
| from counting such operations as errors.</p> |
| </li> |
| <li>Congratulations to Ninad Shringarpure for contributing his first patches |
| to Kudu. Ninad contributed two documentation fixes and improved |
| formatting on the Kudu web UI.</li> |
| </ul> |
| |
| <p>Want to learn more about a specific topic from this blog post? Shoot an email to the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweet at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p></content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-first edition of the Kudu Weekly Update. Astute |
| readers will notice that the weekly blog posts have been not-so-weekly |
| of late &#8211; in fact, it has been nearly two months since the previous post |
| as I and others have focused on releases, conferences, etc. |
| |
| So, rather than covering just this past week, this post will cover highlights |
| of the progress since the 1.0 release in mid-September. If you&#8217;re interested |
| in learning about progress prior to that release, check the |
| release notes.</summary></entry><entry><title>Apache Kudu at Strata+Hadoop World NYC 2016</title><link href="/2016/09/26/strata-nyc-kudu-talks.html" rel="alternate" type="text/html" title="Apache Kudu at Strata+Hadoop World NYC 2016" /><published>2016-09-26T00:00:00-07:00</published><updated>2016-09-26T00:00:00-07:00</updated><id>/2016/09/26/strata-nyc-kudu-talks</id><content type="html" xml:base="/2016/09/26/strata-nyc-kudu-talks.html"><p>This week in New York, O’Reilly and Cloudera will be hosting Strata+Hadoop World |
| 2016. If you’re interested in Kudu, there will be several opportunities to |
| learn more, both from the open source development team as well as some companies |
| who are already adopting Kudu for their use cases. |
| <!--more--> |
| Here are some of the sessions to check out:</p> |
| |
| <ul> |
| <li> |
| <p><a href="http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52146">Powering real-time analytics on Xfinity using Kudu</a> (Wednesday, 11:20am)</p> |
| |
| <p>Sridhar Alla and Kiran Muglurmath from Comcast will talk about how they’re using |
| Kudu to store hundreds of billions of Set-Top Box (STB) events, performing |
| analytics concurrently with real-time streaming ingest of thousands of events |
| per second.</p> |
| </li> |
| <li> |
| <p><a href="http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52248">Creating real-time, data-centric applications with Impala and Kudu</a> (Wednesday, 2:05pm)</p> |
| |
| <p>Marcel Kornacker and Todd Lipcon will introduce how Impala and Kudu together |
| allow users to build real-time applications that support streaming ingest, |
| random access updates and deletes, and high performance analytic SQL in |
| a single system.</p> |
| </li> |
| <li> |
| <p><a href="http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52168">Streaming cybersecurity into Graph: Accelerating data into Datastax Graph and Blazegraph</a> (Thursday, 1:15pm)</p> |
| |
| <p>Joshua Patterson, Michael Wendt, and Keith Kraus from Accenture Labs will discuss |
| how they have built cybersecurity solutions using graph analytics on top of open |
| source technology like Apache Kafka, Spark, and Flink. They will also touch on |
| why Kudu is becoming an integral part of Accenture’s technology stack.</p> |
| </li> |
| <li> |
| <p><a href="http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52050">How GE analyzes billions of mission-critical events in real time using Apache Apex, Spark, and Kudu</a> (Thursday, 2:05pm)</p> |
| |
| <p>Venkatesh Sivasubramanian and Luis Ramos from GE Digital will discuss how they |
| collect and process real-time IoT data using Apache Apex and Apache Spark, and |
| how they’ve been experimenting with Apache Kudu for time series data storage.</p> |
| </li> |
| <li> |
| <p><a href="http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/51887">Apache Kudu: 1.0 and Beyond</a> (Thursday, 4:35pm)</p> |
| |
| <p>Todd Lipcon from Cloudera will review the new features that were developed between Kudu 0.5 |
| (the first public release one year ago) and Kudu 1.0, released just last week. Additionally, |
| this talk will provide some insight into the project roadmap for the coming year.</p> |
| </li> |
| </ul> |
| |
| <p>Aside from these organized sessions, word has it that there will be various demos |
| featuring Apache Kudu at the Cloudera and ZoomData vendor booths.</p> |
| |
| <p>If you’re not attending the conference, but still based in NYC, all hope is |
| not lost. Michael Crutcher from Cloudera will be presenting an introduction |
| to Apache Kudu at the <a href="http://www.meetup.com/mysqlnyc/events/233599664/">SQL NYC Meetup</a>. |
| Be sure to RSVP as spots are filling up fast.</p></content><author><name>Todd Lipcon</name></author><summary>This week in New York, O&#8217;Reilly and Cloudera will be hosting Strata+Hadoop World |
| 2016. If you&#8217;re interested in Kudu, there will be several opportunities to |
| learn more, both from the open source development team as well as some companies |
| who are already adopting Kudu for their use cases.</summary></entry><entry><title>Apache Kudu 1.0.0 released</title><link href="/2016/09/20/apache-kudu-1-0-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.0.0 released" /><published>2016-09-20T00:00:00-07:00</published><updated>2016-09-20T00:00:00-07:00</updated><id>/2016/09/20/apache-kudu-1-0-0-released</id><content type="html" xml:base="/2016/09/20/apache-kudu-1-0-0-released.html"><p>The Apache Kudu team is happy to announce the release of Kudu 1.0.0!</p> |
| |
| <p>This latest version adds several new features, including:</p> |
| |
| <!--more--> |
| |
| <ul> |
| <li> |
| <p>Removal of multiversion concurrency control (MVCC) history is now supported. |
| This allows Kudu to reclaim disk space, where previously Kudu would keep a full |
| history of all changes made to a given table since the beginning of time.</p> |
| </li> |
| <li> |
| <p>Most of Kudu’s command line tools have been consolidated under a new |
| top-level <code>kudu</code> tool. This reduces the number of large binaries distributed |
| with Kudu and also includes much-improved help output.</p> |
| </li> |
| <li> |
| <p>Administrative tools including <code>kudu cluster ksck</code> now support running |
| against multi-master Kudu clusters.</p> |
| </li> |
| <li> |
| <p>The C++ client API now supports writing data in <code>AUTO_FLUSH_BACKGROUND</code> mode. |
| This can provide higher throughput for ingest workloads.</p> |
| </li> |
| </ul> |
| |
| <p>This release also includes many bug fixes, optimizations, and other |
| improvements, detailed in the <a href="/releases/1.0.0/docs/release_notes.html">release notes</a>.</p> |
| |
| <ul> |
| <li>Download the <a href="/releases/1.0.0/">Kudu 1.0.0 source release</a></li> |
| <li>Convenience binary artifacts for the Java client and various Java |
| integrations (e.g. Spark, Flume) are also now available via the ASF Maven |
| repository.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.0.0! |
| |
| This latest version adds several new features, including:</summary></entry><entry><title>Pushing Down Predicate Evaluation in Apache Kudu</title><link href="/2016/09/16/predicate-pushdown.html" rel="alternate" type="text/html" title="Pushing Down Predicate Evaluation in Apache Kudu" /><published>2016-09-16T00:00:00-07:00</published><updated>2016-09-16T00:00:00-07:00</updated><id>/2016/09/16/predicate-pushdown</id><content type="html" xml:base="/2016/09/16/predicate-pushdown.html"><p>I had the pleasure of interning with the Apache Kudu team at Cloudera this |
| summer. This project was my summer contribution to Kudu: a restructuring of the |
| scan path to speed up queries.</p> |
| |
| <!--more--> |
| |
| <h2 id="introduction">Introduction</h2> |
| |
| <p>In Kudu, <em>predicate pushdown</em> refers to the way in which predicates are |
| handled. When a scan is requested, its predicates are passed through the |
| different layers of Kudu’s storage hierarchy, allowing for pruning and other |
| optimizations to happen at each level before reaching the underlying data.</p> |
| |
| <p>While predicates are pushed down, predicate evaluation itself occurs at a fairly |
| high level, precluding the evaluation process from certain data-specific |
| optimizations. These optimizations can make tablet scans an order of magnitude |
| faster, if not more.</p> |
| |
| <h2 id="a-day-in-the-life-of-a-query">A Day in the Life of a Query</h2> |
| |
| <p>Because Kudu is a columnar storage engine, its scan path has a number of |
| optimizations to avoid extraneous reads, copies, and computation. When a query |
| is sent to a tablet server, the server prunes tablets based on the |
| primary key, directing the request to only the tablets that contain the key |
| range of interest. Once at a tablet, only the columns relevant to the query are |
| scanned. Further pruning is done over the primary key, and if the query is |
| predicated on non-key columns, the entire column is scanned. The columns in a |
| tablet are stored as <em>cfiles</em>, which are split into encoded <em>blocks</em>. Once the |
| relevant cfiles are determined, the data are materialized by the block |
| decoders, i.e. their underlying data are decoded and copied into a buffer, |
| which is passed back to the tablet layer. The tablet can then evaluate the |
| predicate on the batch of data and mark which rows should be returned to the |
| client.</p> |
| |
| <p>One of the encoding types I worked very closely with is <em>dictionary encoding</em>, |
| an encoding type for strings that performs particularly well for cfiles that |
| have repeating values. Rather than storing every row’s string, each unique |
| string is assigned a numeric codeword, and the rows are stored numerically on |
| disk. When materializing a dictionary block, all of the numeric data are scanned |
| and all of the corresponding strings are copied and buffered for evaluation. |
| When the vocabulary of a dictionary-encoded cfile gets too large, the blocks |
| begin switching to <em>plain encoding mode</em> to act like <em>plain-encoded</em> blocks.</p> |
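| |
| <p>The dictionary encoding described above can be sketched in a few lines of Python. |
| This is only an illustration of the idea, not Kudu’s C++ implementation: the function |
| names <code>dictionary_encode</code> and <code>materialize</code> are hypothetical, and |
| codewords are assumed to be assigned in first-seen order.</p> |

```python
def dictionary_encode(values):
    """Assign each distinct string a numeric codeword (first-seen order)."""
    dictionary = []    # codeword -> string
    codeword_of = {}   # string -> codeword
    codewords = []     # one codeword per row, as stored in the block
    for value in values:
        if value not in codeword_of:
            codeword_of[value] = len(dictionary)
            dictionary.append(value)
        codewords.append(codeword_of[value])
    return dictionary, codewords


def materialize(dictionary, codewords):
    """Decode every row back to its string, as the eager scan path does."""
    return [dictionary[code] for code in codewords]


rows = ["NY", "CA", "NY", "TX", "CA", "NY"]
dictionary, codewords = dictionary_encode(rows)
decoded = materialize(dictionary, codewords)
```

| <p>Here six rows collapse to three distinct strings plus six small integers, and |
| <code>materialize</code> copies a string for every row regardless of any predicate.</p> |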
| |
| <p>In a plain-encoded block, strings are stored contiguously and the character |
| offsets to the start of each string are stored as a list of integers. When |
| materializing, all of the strings are copied to a buffer for evaluation.</p> |
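| |
| <p>A rough Python sketch of that layout follows. It is an assumption-laden illustration |
| rather than Kudu’s on-disk format: the names <code>plain_encode</code> and |
| <code>plain_decode</code> are hypothetical, and offsets are counted in UTF-8 bytes.</p> |

```python
def plain_encode(values):
    """Store strings back to back and record where each one starts."""
    data = bytearray()
    offsets = []
    for value in values:
        offsets.append(len(data))          # start of this string
        data.extend(value.encode("utf-8"))
    return bytes(data), offsets


def plain_decode(data, offsets):
    """Copy every string out of the block, as eager materialization does."""
    # Each string ends where the next one starts; the last ends at len(data).
    ends = offsets[1:] + [len(data)]
    return [data[start:end].decode("utf-8") for start, end in zip(offsets, ends)]


data, offsets = plain_encode(["ab", "", "cde"])
strings = plain_decode(data, offsets)
```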
| |
| <p>Therein lies room for improvement: this predicate evaluation path is the same |
| for all data types and encoding types. Within the tablet, the correct cfiles |
| are determined, the cfiles’ decoders are opened, all of the data are copied to |
| a buffer, and the predicates are evaluated on this buffered data via |
| type-specific comparators. This path is extremely flexible, but because it was |
| designed to be encoding-independent, it cannot exploit what each encoding |
| already knows about its data.</p> |
| |
| <h2 id="trimming-the-fat">Trimming the Fat</h2> |
| |
| <p>The first step is to allow the decoders access to the predicate. In doing so, |
| each encoding type can specialize its evaluation. Additionally, this puts the |
| decoder in a position where it can determine whether a given row satisfies the |
| query, which in turn allows the decoder to decide what data gets copied |
| instead of eagerly copying all of its data for evaluation.</p> |
| |
| <p>Take the case of dictionary-encoded strings as an example. With the existing |
| scan path, not only are all of the strings in a column copied into a buffer, but |
| string comparisons are done on every row. By taking advantage of the fact that |
| the data can be represented as integers, the cost of determining the query |
| results can be greatly reduced. The string comparisons can be swapped out with |
| evaluation based on the codewords, in which case the room for improvement boils |
| down to how to most quickly determine whether or not a given codeword |
| corresponds to a string that satisfies the predicate. Dictionary columns |
| now use a bitset to store the codewords that match the predicates. The decoder |
| then scans through the integer-valued data and checks the bitset to determine |
| whether it should copy the corresponding string over.</p> |
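| |
| <p>The codeword-based evaluation can be sketched as follows. This is a simplified Python |
| illustration, not the committed C++ code: <code>build_match_flags</code> and |
| <code>scan_with_pushdown</code> are hypothetical names, and a <code>bytearray</code> with |
| one flag per codeword stands in for the bitset.</p> |

```python
def build_match_flags(dictionary, predicate):
    """Evaluate the predicate once per distinct string, not once per row."""
    return bytearray(1 if predicate(word) else 0 for word in dictionary)


def scan_with_pushdown(dictionary, codewords, predicate):
    """Return matching row indexes and copy only the strings that match."""
    flags = build_match_flags(dictionary, predicate)
    selected_rows = []
    copied = []
    for row, code in enumerate(codewords):
        if flags[code]:                      # integer lookup, no string compare
            selected_rows.append(row)
            copied.append(dictionary[code])  # copy only on a match
    return selected_rows, copied


dictionary = ["NY", "CA", "TX"]
codewords = [0, 1, 0, 2, 1, 0]
matching_rows, matching_strings = scan_with_pushdown(
    dictionary, codewords, lambda s: s == "NY"
)
```

| <p>With three distinct values and six rows, the predicate runs three times instead of |
| six, and the per-row work is reduced to a flag lookup.</p> |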
| |
| <p>This is great in the best case scenario where a cfile’s vocabulary is small, |
| but when the vocabulary gets too large and the dictionary blocks switch to plain |
| encoding mode, performance is hampered. In this mode, the blocks don’t utilize |
| any dictionary metadata and end up wasting the codeword bitset. That isn’t to |
| say all is lost: the decoders can still evaluate a predicate via string |
| comparison, and the fact that evaluation can still occur at the decoder-level |
| means the eager buffering can still be avoided.</p> |
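| |
| <p>A sketch of that fallback, under the same caveats as before: the name |
| <code>scan_plain_block</code> is hypothetical, the block is assumed to hold contiguous |
| UTF-8 strings with byte offsets, and Python’s slicing only approximates an in-place |
| comparison.</p> |

```python
def scan_plain_block(data, offsets, predicate):
    """Compare each string inside the decoder; keep only the rows that match."""
    # Each string ends where the next one starts; the last ends at len(data).
    ends = offsets[1:] + [len(data)]
    selected = []
    for row, (start, end) in enumerate(zip(offsets, ends)):
        value = data[start:end].decode("utf-8")
        if predicate(value):
            selected.append((row, value))  # only matching rows reach the output
    return selected


block = b"abcde"          # the strings "ab", "" and "cde" stored back to back
block_offsets = [0, 2, 2]
matches = scan_plain_block(block, block_offsets, lambda s: s.startswith("c"))
```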
| |
| <p>Dictionary encoding is a perfect storm in that the decoders can completely |
| evaluate the predicates. This is not the case for most other encoding types, |
| but having decoders support evaluation leaves the door open for other encoding |
| types to extend this idea.</p> |
| |
| <h2 id="performance">Performance</h2> |
| <p>Depending on the dataset and query, predicate pushdown can lead to significant |
| improvements. Tablet scans were timed with datasets consisting of repeated |
| string patterns of tunable length and tunable cardinality.</p> |
| |
| <p><img src="/img/predicate-pushdown/pushdown-10.png" alt="png" class="img-responsive" /> |
| <img src="/img/predicate-pushdown/pushdown-10M.png" alt="png" class="img-responsive" /></p> |
| |
| <p>The above plots show the time taken to completely scan a single tablet, recorded |
| using a dataset of ten million rows of strings with length ten. Predicates were |
| designed to select values out of bounds (Empty), select a single value (Equal, |
| i.e. for cardinality <em>k</em>, this would select 1/<em>k</em> of the dataset), select half |
| of the full range (Half), and select the full range of values (All).</p> |
| |
| <p>With the original evaluation implementation, the tablet must copy and scan |
| through the entire column to determine whether any values match. This means that even |
| when the result set is small, the full column is still copied. This is avoided |
| by pushing down predicates, which only copies as needed, and can be seen in the |
| above queries: those with near-empty result sets (Empty and Equal) have shorter |
| scan times than those with larger result sets (Half and All).</p> |
| |
| <p>Note that for dictionary encoding, given a low cardinality, Kudu can completely |
| rely on the dictionary codewords to evaluate, making the query significantly |
| faster. At higher cardinalities, the dictionaries completely fill up and the |
| blocks fall back on plain encoding. The slower, albeit still improved, |
| performance on the dataset containing 10M unique values reflects this.</p> |
| |
| <p><img src="/img/predicate-pushdown/pushdown-tpch.png" alt="png" class="img-responsive" /></p> |
| |
| <p>Similar predicates were run with the TPC-H dataset, querying on the shipdate |
| column. The full path of a query includes not only the tablet scanning itself, |
| but also RPCs and batched data transfer to the caller as the scan progresses. |
| As such, the times plotted above refer to the average end-to-end time required |
| to scan and return a batch of rows. Regardless of this additional overhead, |
| significant improvements on the scan path still yield substantial improvements |
| to the query performance as a whole.</p> |
| |
| <h2 id="conclusion">Conclusion</h2> |
| |
| <p>Pushing down predicate evaluation in Kudu yielded substantial improvements to |
| the scan path. For dictionary encoding, pushdown can be particularly powerful, |
| and other encoding types are either unaffected or also improved. This change has |
| been pushed to the main branch of Kudu, and relevant commits can be found |
| <a href="https://github.com/cloudera/kudu/commit/c0f37278cb09a7781d9073279ea54b08db6e2010">here</a> |
| and |
| <a href="https://github.com/cloudera/kudu/commit/ec80fdb37be44d380046a823b5e6d8e2241ec3da">here</a>.</p> |
| |
| <p>This summer has been a phenomenal learning experience for me, in terms of the |
| tools, the workflow, the datasets, and the thought processes that go into building |
| something at Kudu’s scale. I am extremely thankful for all of the mentoring and |
| support I received, and that I got to be a part of Kudu’s journey from |
| incubating to a Top Level Apache project. I can’t express enough how grateful I |
| am for the amount of support I got from the Kudu team, from the intern |
| coordinators, and from the Cloudera community as a whole.</p></content><author><name>Andrew Wong</name></author><summary>I had the pleasure of interning with the Apache Kudu team at Cloudera this |
| summer. This project was my summer contribution to Kudu: a restructuring of the |
| scan path to speed up queries.</summary></entry></feed> |