| <?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2016-06-22T20:12:13-07:00</updated><id>/</id><entry><title>Apache Kudu (incubating) Weekly Update June 21, 2016</title><link href="/2016/06/21/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 21, 2016" /><published>2016-06-21T00:00:00-07:00</published><updated>2016-06-21T00:00:00-07:00</updated><id>/2016/06/21/weekly-update</id><content type="html" xml:base="/2016/06/21/weekly-update.html"><p>Welcome to the fourteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Dan Burkert posted a series of patches to <a href="https://gerrit.cloudera.org/#/c/3388/">add support in the Java client</a> |
| for non-covering range partitions. At the same time he improved how that client locates tables by |
| leveraging the tablets cache.</p> |
| </li> |
| <li> |
| <p>In the context of making multi-master reliable in 1.0, Adar Dembo posted a <a href="https://gerrit.cloudera.org/#/c/3393/">design document</a> |
on how to handle permanent master failures. Currently, the master’s code is missing some features,
such as <code>remote bootstrap</code>, which allows a new replica to download a snapshot of the data
from the leader replica.</p>
| </li> |
| <li> |
| <p>Tsuyoshi Ozawa refreshed <a href="https://gerrit.cloudera.org/#/c/2162/">a patch</a> posted in February that |
| makes it easier to get started contributing to Kudu by providing a Dockerfile with the right |
| environment.</p> |
| </li> |
| </ul> |
| |
| <h2 id="on-the-blog">On the blog</h2> |
| |
| <ul> |
| <li>Mike Percy <a href="http://getkudu.io/2016/06/17/raft-consensus-single-node.html">wrote</a> about how Kudu |
| uses Raft consensus on a single node, and some changes we’re making as Kudu is getting more mature.</li> |
| </ul> |
| |
| <!--more--> |
| |
| <p>Want to learn more about a specific topic from this blog post? Shoot an email to the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweet at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the fourteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project. |
| |
| Development discussions and code in progress |
| |
| |
| |
| Dan Burkert posted a series of patches to add support in the Java client |
| for non-covering range partitions. At the same time he improved how that client locates tables by |
| leveraging the tablets cache. |
| |
| |
| In the context of making multi-master reliable in 1.0, Adar Dembo posted a design document |
on how to handle permanent master failures. Currently, the master&#8217;s code is missing some features,
such as remote bootstrap, which allows a new replica to download a snapshot of the data
from the leader replica.
| |
| |
| Tsuyoshi Ozawa refreshed a patch posted in February that |
| makes it easier to get started contributing to Kudu by providing a Dockerfile with the right |
| environment. |
| |
| |
| |
| On the blog |
| |
| |
| Mike Percy wrote about how Kudu |
| uses Raft consensus on a single node, and some changes we&#8217;re making as Kudu is getting more mature.</summary></entry><entry><title>Using Raft Consensus on a Single Node</title><link href="/2016/06/17/raft-consensus-single-node.html" rel="alternate" type="text/html" title="Using Raft Consensus on a Single Node" /><published>2016-06-17T00:00:00-07:00</published><updated>2016-06-17T00:00:00-07:00</updated><id>/2016/06/17/raft-consensus-single-node</id><content type="html" xml:base="/2016/06/17/raft-consensus-single-node.html"><p>As Kudu marches toward its 1.0 release, which will include support for |
| multi-master operation, we are working on removing old code that is no longer |
| needed. One such piece of code is called LocalConsensus. Once LocalConsensus is |
| removed, we will be using Raft consensus even on Kudu tables that have a |
| replication factor of 1.</p> |
| |
| <!--more--> |
| |
| <p>Using Raft consensus in single-node cases is important for multi-master |
| support because it will allow people to dynamically increase their Kudu |
| cluster’s existing master server replication factor from 1 to many (3 or 5 are |
| typical).</p> |
| |
| <h1 id="the-consensus-interface">The Consensus interface</h1> |
| |
| <p>In Kudu, the |
| <a href="https://github.com/apache/incubator-kudu/blob/branch-0.9.x/src/kudu/consensus/consensus.h">Consensus</a> |
| interface was created as an abstraction to allow us to build the plumbing |
| around how a consensus implementation would interact with the underlying |
| tablet. We were able to build out this “scaffolding” long before our Raft |
| implementation was complete.</p> |
| |
| <p>The Consensus API has the following main responsibilities:</p> |
| |
| <ol> |
<li>Support acting as a Raft <code>LEADER</code> and replicating writes to a local
write-ahead log (WAL) as well as to followers in the Raft configuration. For
| each operation written to the leader, a Raft implementation must keep track |
| of how many nodes have written a copy of the operation being replicated, and |
| whether or not that constitutes a majority. Once a majority of the nodes |
| have written a copy of the data, it is considered committed.</li> |
| <li>Support acting as a Raft <code>FOLLOWER</code> by accepting writes from the leader and |
| preparing them to be eventually committed.</li> |
| <li>Support voting in and initiating leader elections.</li> |
| <li>Support participating in and initiating configuration changes (such as going |
| from a replication factor of 3 to 4).</li> |
| </ol> |
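<p>As a rough illustration of the majority rule in item 1, the C++ sketch below computes the highest log index that a majority of voters have written, given the last index each voter (the leader included) has persisted. It is a simplified model, not code from Kudu’s RaftConsensus; the function names <code>MajoritySize</code> and <code>CommittedIndex</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest number of voters that forms a strict majority.
int MajoritySize(int num_voters) {
  assert(num_voters > 0);
  return num_voters / 2 + 1;
}

// Given the highest log index each voter (leader included) has written,
// returns the highest index that a majority of voters have written.
// Operations at or below that index are considered committed.
int64_t CommittedIndex(std::vector<int64_t> written_indexes) {
  assert(!written_indexes.empty());
  const int n = static_cast<int>(written_indexes.size());
  // After sorting ascending, the voters at positions (n - majority) .. (n - 1)
  // have all written at least written_indexes[n - majority], and there are
  // exactly `majority` of them.
  std::sort(written_indexes.begin(), written_indexes.end());
  return written_indexes[n - MajoritySize(n)];
}
```

<p>For example, with three voters that have written up to indexes 5, 7, and 9, <code>CommittedIndex({5, 7, 9})</code> returns 7, and a lone voter’s own index is always committed. A complete Raft leader also only advances its commit index by counting replicas for entries from its current term, a check this sketch omits.</p>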
| |
| <p>The first implementation of the Consensus interface was called LocalConsensus. |
| LocalConsensus only supported acting as a leader of a single-node configuration |
| (hence the name “local”). It could not replicate to followers, participate in |
| elections, or change configurations. These limitations have led us to |
| <a href="https://gerrit.cloudera.org/3350">remove</a> LocalConsensus from the code base |
| entirely.</p> |
| |
| <p>Because Kudu has a full-featured Raft implementation, Kudu’s RaftConsensus |
| supports all of the above functions of the Consensus interface.</p> |
| |
| <h1 id="using-a-single-node-raft-configuration">Using a Single-node Raft configuration</h1> |
| |
| <p>A common question on the Raft mailing lists is: “Is it even possible to use |
| Raft on a single node?” The answer is yes.</p> |
| |
| <p>Fundamentally, Raft works by first electing a leader that is responsible for |
| replicating write operations to the other members of the configuration. In |
| order to elect a leader, Raft requires a (strict) majority of the voters to |
| vote “yes” in an election. When there is only a single eligible node in the |
| configuration, there is no chance of losing the election. Raft specifies that |
| when starting an election, a node must first vote for itself and then contact |
| the rest of the voters to tally their votes. If there is only a single node, no |
| communication is required and an election succeeds instantaneously.</p> |
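<p>The election logic above can be sketched in a few lines of C++. This is a hypothetical simplification, not Kudu’s implementation: <code>RunElection</code> and the <code>request_vote</code> callback (standing in for the vote request sent to each peer) are illustrative names, and terms and log up-to-dateness checks are left out. It shows that when the candidate is the only voter, its self-vote already meets the majority, so no peer is ever contacted.</p>

```cpp
#include <cassert>
#include <functional>

// Simplified Raft vote tally. The candidate is voter 0 and peers are
// numbered 1..num_voters-1. `request_vote` stands in for the vote request
// to a peer and returns whether that peer granted its vote.
bool RunElection(int num_voters,
                 const std::function<bool(int peer)>& request_vote) {
  assert(num_voters > 0);
  const int majority = num_voters / 2 + 1;
  int votes = 1;  // A candidate always votes for itself first.
  if (votes >= majority) {
    return true;  // Single voter: the election succeeds with no communication.
  }
  for (int peer = 1; peer < num_voters; ++peer) {
    if (request_vote(peer) && ++votes >= majority) {
      return true;  // Stop asking once a majority is reached.
    }
  }
  return false;
}
```

<p>With <code>num_voters</code> equal to 1, the function returns <code>true</code> before the callback is ever invoked; with three voters, a single granted vote from one peer is enough to win.</p>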
| |
| <p>So, when does it make sense to use Raft for a single node?</p> |
| |
| <p>It makes sense to do this when you want to allow growing the replication factor |
| in the future. This is something that Kudu needs to support. When deploying |
| Kudu, someone may wish to test it out with limited resources in a small |
| environment. Eventually, they may wish to transition that cluster to be a |
| staging or production environment, which would typically require the fault |
| tolerance achievable with multi-node Raft. Without a consensus implementation |
| that supports configuration changes, there would be no way to gracefully |
| support this. Because single-node Raft supports dynamically adding an |
| additional node to its configuration, it is possible to go from one replica to |
| 2 and then 3 replicas and end up with a fault-tolerant cluster without |
| incurring downtime.</p> |
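<p>The same majority arithmetic shows why growing from one replica to three pays off, and why two replicas is only an intermediate step. The sketch below uses hypothetical helper names to count how many replicas can fail while the rest still form a majority.</p>

```cpp
#include <cassert>

// Votes needed for a strict majority of `replicas` voters.
int Majority(int replicas) {
  assert(replicas > 0);
  return replicas / 2 + 1;
}

// How many replicas can fail while the remaining ones still form a
// majority, so a leader can still be elected and writes committed.
int ToleratedFailures(int replicas) {
  return replicas - Majority(replicas);
}
```

<p>This gives 0 tolerated failures for 1 or 2 replicas, 1 for 3 replicas, and 2 for 5 replicas, which is why 3 or 5 replicas are the typical targets.</p>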
| |
| <h1 id="more-about-raft">More about Raft</h1> |
| |
| <p>To learn more about how Kudu uses Raft consensus, you may find the relevant |
| <a href="https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/README.md">design docs</a> |
| interesting. In the future, we may also post more articles on the Kudu blog |
| about how Kudu uses Raft to achieve fault tolerance.</p> |
| |
| <p>To learn more about the Raft protocol itself, please see the <a href="https://raft.github.io/">Raft consensus |
| home page</a>. The design of Kudu’s Raft implementation |
| is based on the extended protocol described in Diego Ongaro’s Ph.D. |
| dissertation, which you can find linked from the above web site.</p></content><author><name>Mike Percy</name></author><summary>As Kudu marches toward its 1.0 release, which will include support for |
| multi-master operation, we are working on removing old code that is no longer |
| needed. One such piece of code is called LocalConsensus. Once LocalConsensus is |
| removed, we will be using Raft consensus even on Kudu tables that have a |
| replication factor of 1.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 13, 2016</title><link href="/2016/06/13/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 13, 2016" /><published>2016-06-13T00:00:00-07:00</published><updated>2016-06-13T00:00:00-07:00</updated><id>/2016/06/13/weekly-update</id><content type="html" xml:base="/2016/06/13/weekly-update.html"><p>Welcome to the thirteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>The IPMC vote for 0.9.0 RC1 passed and Kudu 0.9.0 is now |
| <a href="http://getkudu.io/2016/06/10/apache-kudu-0-9-0-released.html">officially released</a>. Per the |
| lazily agreed-upon <a href="http://mail-archives.apache.org/mod_mbox/kudu-dev/201602.mbox/%3CCAGpTDNcMBWwX8p+yGKzHfL2xcmKTScU-rhLcQFSns1UVSbrXhw@mail.gmail.com%3E">plan</a>, |
| the next release will be 1.0.0 in about two months.</p> |
| </li> |
| <li> |
| <p>Adar Dembo has been cleaning up and improving the Master process’s code. Last week he |
| <a href="https://gerrit.cloudera.org/#/c/2887/">finished</a> removing the per-tablet replica locations cache.</p> |
| </li> |
| <li> |
| <p>Alexey Serbin contributed his first patch last week by <a href="https://gerrit.cloudera.org/#/c/3360/">fixing</a> |
| most of the unit tests that were failing on OSX.</p> |
| </li> |
| <li> |
<p>Sameer Abhyankar is nearly finished adding support for “in-list” predicates;
follow <a href="https://gerrit.cloudera.org/#/c/2986/">this link</a> to the Gerrit
review. This will enable specifying predicates in the style of “column IN (list, of, values)”.</p>
| </li> |
| <li> |
| <p>Mike Percy posted a few patches that remove LocalConsensus for single-node tablets, with the actual |
| removal happening in this <a href="https://gerrit.cloudera.org/#/c/3350/">patch</a>.</p> |
| </li> |
| </ul> |
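<p>Conceptually, a predicate such as <code>column IN (list, of, values)</code> matches a row when the column equals any of the listed values. The C++ sketch below shows one straightforward way to evaluate it: build a hash set of the values once, then probe it for each cell. It is a hypothetical illustration; <code>InListPredicate</code> is not a Kudu class, and the patch under review may work differently.</p>

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical evaluator for `column IN (v1, v2, ...)` on string cells.
class InListPredicate {
 public:
  explicit InListPredicate(const std::vector<std::string>& values)
      : values_(values.begin(), values.end()) {
    assert(!values_.empty());
  }

  // True if `cell` equals any of the listed values.
  bool Matches(const std::string& cell) const {
    return values_.count(cell) > 0;
  }

 private:
  std::unordered_set<std::string> values_;
};
```

<p>Probing the set takes expected constant time per row, independent of how many values are in the list.</p>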
| |
| <h2 id="slides-and-recordings">Slides and recordings</h2> |
| |
| <ul> |
| <li>Todd Lipcon presented Kudu at Berlin Buzzwords earlier this month. The recording is available |
| <a href="https://berlinbuzzwords.de/session/apache-kudu-incubating-fast-analytics-fast-data">here</a>.</li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the thirteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) 0.9.0 released</title><link href="/2016/06/10/apache-kudu-0-9-0-released.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) 0.9.0 released" /><published>2016-06-10T00:00:00-07:00</published><updated>2016-06-10T00:00:00-07:00</updated><id>/2016/06/10/apache-kudu-0-9-0-released</id><content type="html" xml:base="/2016/06/10/apache-kudu-0-9-0-released.html"><p>The Apache Kudu (incubating) team is happy to announce the release of Kudu |
| 0.9.0!</p> |
| |
| <p>This latest version adds basic UPSERT functionality and an improved Apache Spark Data Source |
| that doesn’t rely on the MapReduce I/O formats. It also improves Tablet Server |
| restart time as well as write performance under high load. Finally, Kudu now enforces |
| the specification of a partitioning scheme for new tables.</p> |
| |
| <ul> |
| <li>Read the detailed <a href="http://getkudu.io/releases/0.9.0/docs/release_notes.html">Kudu 0.9.0 release notes</a></li> |
| <li>Download the <a href="http://getkudu.io/releases/0.9.0/">Kudu 0.9.0 source release</a></li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>The Apache Kudu (incubating) team is happy to announce the release of Kudu |
| 0.9.0! |
| |
| This latest version adds basic UPSERT functionality and an improved Apache Spark Data Source |
| that doesn&#8217;t rely on the MapReduce I/O formats. It also improves Tablet Server |
| restart time as well as write performance under high load. Finally, Kudu now enforces |
| the specification of a partitioning scheme for new tables. |
| |
| |
| Read the detailed Kudu 0.9.0 release notes |
| Download the Kudu 0.9.0 source release</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 6, 2016</title><link href="/2016/06/06/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 6, 2016" /><published>2016-06-06T00:00:00-07:00</published><updated>2016-06-06T00:00:00-07:00</updated><id>/2016/06/06/weekly-update</id><content type="html" xml:base="/2016/06/06/weekly-update.html"><p>Welcome to the twelfth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
<p>Jean-Daniel Cryans put up <a href="http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201606.mbox/%3CCAGpTDNduoQM0ktuZc1eW1XeXCcXhvPGftJ%3DLRB8Er5c2dZptvw%40mail.gmail.com%3E">0.9.0 RC1</a>
for a vote on the dev mailing list, and it passed. The Incubator PMC (IPMC) will also need
to vote on it before it can officially be released.</p>
| </li> |
| <li> |
<p>Mike Percy is working on removing LocalConsensus, which is currently used for
single-node Kudu deployments. We will instead use the Raft consensus implementation
with a replication factor of 1. This will simplify development, since we currently need to maintain two
consensus implementations. It will also provide a way to migrate from single-node to multi-node
deployments. See the discussion in this <a href="http://mail-archives.apache.org/mod_mbox/kudu-dev/201605.mbox/%3CCADXBggeE6RUYchv5fa=J2geHGE8Mw4SOeoi=LjXjdfmYYSqyhQ@mail.gmail.com%3E">dev thread</a>.</p>
| </li> |
| <li> |
<p>Zhen Zhang got a patch in for <a href="https://issues.apache.org/jira/browse/KUDU-1444">KUDU-1444</a>
that adds resource usage monitoring to scanners in the C++ client. In the future, this could
be leveraged by systems like Impala to augment their query profiles.</p>
| </li> |
| <li> |
<p>Longer-term efforts for 1.0 are making good progress. Dan Burkert <a href="https://gerrit.cloudera.org/#/c/3255/">added support</a>
| in the C++ client for non-covering range partitioned tables, and David Alves has a few |
| patches in for the <a href="https://gerrit.cloudera.org/#/c/2642/">Replay Cache</a>.</p> |
| </li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the twelfth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Default Partitioning Changes Coming in Kudu 0.9</title><link href="/2016/06/02/no-default-partitioning.html" rel="alternate" type="text/html" title="Default Partitioning Changes Coming in Kudu 0.9" /><published>2016-06-02T00:00:00-07:00</published><updated>2016-06-02T00:00:00-07:00</updated><id>/2016/06/02/no-default-partitioning</id><content type="html" xml:base="/2016/06/02/no-default-partitioning.html"><p>The upcoming Apache Kudu (incubating) 0.9 release is changing the default |
| partitioning configuration for new tables. This post will introduce the change, |
| explain the motivations, and show examples of how code can be updated to work |
| with the new release.</p> |
| |
| <!--more--> |
| |
<p>The most common source of frustration for new Kudu users is the default
partitioning behavior when creating new tables. If partitioning is not
specified, the Kudu client prior to 0.9 creates tables with a <em>single tablet</em>.
Single-tablet tables are a Kudu anti-pattern, since they are unable to get the
scalability benefit of distributing data across the cluster and instead keep
all data on a single machine.</p>
| |
| <p>Unfortunately, automatically choosing a better default partitioning |
| configuration for new tables is not simple. In most cases, hash partitioning on |
| the primary key is a better default, but this approach can have its own |
| drawbacks. In particular, it is not clear how many buckets should be used for |
| the new table.</p> |
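<p>To illustrate what hash partitioning on the primary key means, the sketch below maps a row’s key column values to one of <code>num_buckets</code> buckets. It is a simplified model: <code>HashBucket</code> is a hypothetical name, and <code>std::hash</code> with a Boost-style combine step stands in for Kudu’s own hash function and key encoding, which differ. The bucket count is a required input, and it is exactly the value that is hard to choose automatically.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Assigns a row to a bucket in [0, num_buckets) from its key column values.
// std::hash plus a Boost-style combine is an illustrative stand-in, not
// the hash Kudu uses.
int HashBucket(const std::vector<std::string>& key_columns, int num_buckets) {
  assert(num_buckets > 0);
  std::size_t h = 0;
  for (const std::string& value : key_columns) {
    h ^= std::hash<std::string>{}(value) + 0x9e3779b9 + (h << 6) + (h >> 2);
  }
  return static_cast<int>(h % static_cast<std::size_t>(num_buckets));
}
```

<p>The same key values always map to the same bucket, so rows spread across buckets while any single key is found in exactly one of them.</p>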
| |
| <p>Since there is no bullet-proof default and changing the partitioning |
| configuration after table creation is impossible, <a href="https://lists.apache.org/thread.html/ca8972620839109334493424a1022fc08c77c315d9d623f5caaa815f@1463699013@%3Cuser.kudu.apache.org%3E">we |
| decided</a> |
| to remove the default altogether. Removing the default is a backwards |
| incompatible change, so it must be done before the 1.0 release. If we later find |
| a better way to create a default partitioning configuration, it should be |
| possible to adopt it in a backwards compatible way. The result of removing the |
| default is that new tables created with the 0.9 client must specify a |
| partitioning configuration, or table creation will fail. You can still create a |
| table with a single tablet, but it must be configured explicitly. These changes |
| only affect new table creation; existing tables, including tables created with |
| default partitioning before the 0.9 release, will continue to work.</p> |
| |
| <p>In most cases updating existing code to explicitly set a partitioning |
| configuration should be simple. The examples below add hash partitioning, but |
| you can also specify range partitioning or a combination of range and hash |
| partitioning. See the <a href="http://getkudu.io/docs/schema_design.html#data-distribution">schema design |
| guide</a> for more |
| advanced configurations.</p> |
| |
| <h1 id="c-client">C++ Client</h1> |
| |
| <p>With the C++ client, creating a new table with hash partitions is as simple as |
calling <code>KuduTableCreator::add_hash_partitions</code> with the columns to hash and the
| number of buckets to use:</p> |
| |
<pre><code class="language-cpp">// Create "my-table" with 16 hash buckets over two key columns.
unique_ptr&lt;KuduTableCreator&gt; table_creator(my_client-&gt;NewTableCreator());
Status create_status = table_creator-&gt;table_name("my-table")
    .schema(my_schema)
    .add_hash_partitions({ "key_column_a", "key_column_b" }, 16)
    .Create();
if (!create_status.ok()) { /* handle error */ }
</code></pre>
| |
| <h1 id="java-client">Java Client</h1> |
| |
| <p>And similarly, in Java:</p> |
| |
<pre><code class="language-java">// Hash partition "my-table" into 16 buckets over two key columns.
List&lt;String&gt; hashColumns = new ArrayList&lt;&gt;();
hashColumns.add("key_column_a");
hashColumns.add("key_column_b");
CreateTableOptions options = new CreateTableOptions().addHashPartitions(hashColumns, 16);
myClient.createTable("my-table", mySchema, options);
</code></pre>
| |
<p>In the examples above, if the hash partition configuration is omitted, the create
table operation will fail with the error <code>Table partitioning must be specified
using setRangePartitionColumns or addHashPartitions</code>. In the Java client this
manifests as a thrown <code>IllegalArgumentException</code>, while in the C++ client it is
returned as a <code>Status::InvalidArgument</code>.</p>
| |
| <h1 id="impala">Impala</h1> |
| |
| <p>When creating Kudu tables with Impala, the formerly optional <code>DISTRIBUTE BY</code> |
| clause is now required:</p> |
| |
<pre><code class="language-sql">CREATE TABLE my_table (key_column_a STRING, key_column_b STRING, other_column STRING)
DISTRIBUTE BY HASH (key_column_a, key_column_b) INTO 16 BUCKETS
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'my_table',
  'kudu.master_addresses' = 'kudu-master.example.com:7051',
  'kudu.key_columns' = 'key_column_a,key_column_b'
);
</code></pre></content><author><name>Dan Burkert</name></author><summary>The upcoming Apache Kudu (incubating) 0.9 release is changing the default
| partitioning configuration for new tables. This post will introduce the change, |
| explain the motivations, and show examples of how code can be updated to work |
| with the new release.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 1, 2016</title><link href="/2016/06/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 1, 2016" /><published>2016-06-01T00:00:00-07:00</published><updated>2016-06-01T00:00:00-07:00</updated><id>/2016/06/01/weekly-update</id><content type="html" xml:base="/2016/06/01/weekly-update.html"><p>Welcome to the eleventh edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Jean-Daniel Cryans, the release manager for 0.9.0, <a href="http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201605.mbox/%3CCAGpTDNe_gV5TTsJQSjx_Q-hSGjK9TesWkyP-k9rnhd0mBtYAYg%40mail.gmail.com%3E">indicated</a> |
| that the release is almost ready and the first release candidate will be put up for vote this |
| week.</p> |
| </li> |
| <li> |
| <p>Dan Burkert pushed <a href="http://gerrit.cloudera.org:8080/3131">a change</a> that disallows default |
| partitioning when creating a new table. This is due to many reports from users experiencing bad |
| performance because their table was created with only one tablet. Kudu will now force users to |
| partition their tables.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon ran YCSB stress tests on a cluster and discovered that compactions were taking hours |
| instead of seconds. He pushed <a href="http://gerrit.cloudera.org:8080/#/c/3221/">a change</a> that solves |
| the issue as part of our <a href="https://issues.apache.org/jira/browse/KUDU-749">general effort</a> to |
| improve performance for zipfian update workloads.</p> |
| </li> |
| <li> |
| <p>Todd also <a href="http://gerrit.cloudera.org:8080/#/c/3186/">changed</a> some flush-related defaults to |
| encourage parallel IO and larger flushes. This is based on his previous work that he documented |
| in this <a href="http://getkudu.io/2016/04/26/ycsb.html">blog post</a>.</p> |
| </li> |
| <li> |
<p>Will Berkeley made a few improvements last week, but <a href="http://gerrit.cloudera.org:8080/3199">one</a>
we’d like to call out is that he removed the Java kudu-mapreduce module’s dependency on Hadoop’s
hadoop-common test jar. This solved build issues while also removing a nasty dependency.</p>
| </li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the eleventh edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update May 23, 2016</title><link href="/2016/05/23/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update May 23, 2016" /><published>2016-05-23T00:00:00-07:00</published><updated>2016-05-23T00:00:00-07:00</updated><id>/2016/05/23/weekly-update</id><content type="html" xml:base="/2016/05/23/weekly-update.html"><p>Welcome to the tenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="kudu-related-podcast">Kudu related podcast</h2> |
| |
| <ul> |
| <li>Two committers, Mike Percy and Dan Burkert, appeared on the |
| <a href="https://developer.ibm.com/tv/apachecon-apache-projects/">IBM New Builders podcast</a> |
| to talk about Apache Kudu, how they got involved, and what sort of |
| workloads it is best suited for.</li> |
| </ul> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Jean-Daniel Cryans is again acting as the release manager for the upcoming |
| 0.9.0 release. The git branch for 0.9 has now been cut, and only bug fixes |
| or small improvements will be committed to that branch between now and the |
| first release candidate.</p> |
| </li> |
| <li> |
<p>Since Kudu’s initial release, one of the most commonly requested features
has been support for the <code>UPSERT</code> operation. <code>UPSERT</code> is known in some other
databases as <code>INSERT ... ON DUPLICATE KEY UPDATE</code>. It has the
semantics of an <code>INSERT</code> if no row with the provided primary key
already exists. Otherwise, it replaces the existing row with the new values.</p>
| |
| <p>This week, several developers collaborated to add support for this operation. |
| Todd Lipcon implemented |
| <a href="http://gerrit.cloudera.org:8080/#/c/3101/">support on the server side</a>, |
| C++ client, and <a href="http://gerrit.cloudera.org:8080/#/c/3128/">Python client</a>. |
| Jean-Daniel Cryans added support in the |
| <a href="http://gerrit.cloudera.org:8080/#/c/3123/">Java client</a>. Ara Ebrahimi |
| and Will Berkeley have started working on |
| <a href="http://gerrit.cloudera.org:8080/#/c/3145/">integrating upsert support into the Flume sink</a>.</p> |
| </li> |
| <li> |
| <p>Mike Percy started working on support for <a href="http://gerrit.cloudera.org:8080/#/c/3135/">basic disk |
| space reservations</a> |
| in the tablet server. This feature will cause the tablet server to stop |
| writing to a disk before it’s full, preventing crashes due to running |
| out of space.</p> |
| </li> |
| <li> |
| <p>Chris George and Andy Grove collaborated on support for <a href="http://gerrit.cloudera.org:8080/#/c/2992/">insertions and |
| updates in the Spark DataSource</a>, |
| and the patch was committed towards the end of the week. Brent Gardner |
| has also been helping with the Spark integration, and fixed an important |
| <a href="https://issues.apache.org/jira/browse/KUDU-1453">connection leak bug</a> |
| in the initial implementation.</p> |
| </li> |
| <li> |
<p>David Alves worked on reviving a 7-month-old patch by Jingkai Yuan which
implements an <a href="http://gerrit.cloudera.org:8080/#/c/1210/">integer delta encoding scheme</a>
that is meant to be efficient in terms of both CPU and disk space. This
| encoding scheme is also designed to take advantage of modern CPU instruction sets |
| such as AVX and AVX2.</p> |
| </li> |
| </ul> |
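<p>The <code>UPSERT</code> semantics described above can be modeled with a small in-memory table keyed by primary key. This is a hypothetical sketch for illustration only (<code>InMemoryTable</code> is not part of any Kudu client); it contrasts an <code>INSERT</code> that fails on a duplicate key with an <code>UPSERT</code> that inserts or replaces the whole row.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// A row as column name -> value (non-key columns only, for brevity).
using Row = std::map<std::string, std::string>;

// Hypothetical in-memory table keyed by primary key, used only to
// illustrate INSERT versus UPSERT semantics.
class InMemoryTable {
 public:
  // INSERT: fails (returns false) if a row with this key already exists.
  bool Insert(const std::string& key, const Row& row) {
    return rows_.emplace(key, row).second;
  }

  // UPSERT: inserts the row if the key is absent; otherwise replaces the
  // existing row with the new values.
  void Upsert(const std::string& key, const Row& row) {
    rows_[key] = row;
  }

  // Returns the row for `key`, or nullptr if there is none.
  const Row* Find(const std::string& key) const {
    auto it = rows_.find(key);
    return it == rows_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, Row> rows_;
};
```

<p>A second <code>Insert</code> with an existing key leaves the stored row untouched, while <code>Upsert</code> with the same key overwrites it.</p>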
| |
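<p>For readers unfamiliar with delta encoding, the C++ sketch below shows the basic idea behind the integer delta encoding item above: store the first value, then the difference between each value and its predecessor. For sorted or slowly changing data the differences stay small, so a later bit-packing step can store them in fewer bits. It is a generic illustration with hypothetical function names; the scheme in Jingkai Yuan’s patch, including its use of AVX and AVX2, is not reproduced here.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes values as deltas from the previous value (the first value is a
// delta from 0). Assumes each difference fits in int64_t.
std::vector<int64_t> DeltaEncode(const std::vector<int64_t>& values) {
  std::vector<int64_t> deltas;
  deltas.reserve(values.size());
  int64_t prev = 0;
  for (int64_t v : values) {
    deltas.push_back(v - prev);
    prev = v;
  }
  return deltas;
}

// Inverts DeltaEncode by accumulating a running sum of the deltas.
std::vector<int64_t> DeltaDecode(const std::vector<int64_t>& deltas) {
  std::vector<int64_t> values;
  values.reserve(deltas.size());
  int64_t running = 0;
  for (int64_t d : deltas) {
    running += d;
    values.push_back(running);
  }
  return values;
}
```

<p>For example, the sequence 100, 101, 103, 106 encodes to 100, 1, 2, 3: after the first value, every delta is small.</p>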
| <h2 id="upcoming-talks-and-meetups">Upcoming talks and meetups</h2> |
| |
| <ul> |
| <li>Ryan Bosshart will be presenting Kudu at the <a href="http://www.meetup.com/DFW-Cloudera-User-Group/events/230547045/">Dallas/Fort Worth |
| Cloudera User Group</a>.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>Welcome to the tenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update May 16, 2016</title><link href="/2016/05/16/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update May 16, 2016" /><published>2016-05-16T00:00:00-07:00</published><updated>2016-05-16T00:00:00-07:00</updated><id>/2016/05/16/weekly-update</id><content type="html" xml:base="/2016/05/16/weekly-update.html"><p>Welcome to the ninth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
<p>Development and code review continued on Sameer Abhyankar’s patch, which
adds support for pushing down <a href="http://gerrit.cloudera.org:8080/#/c/2986/">‘IN’ predicates</a>
to the Kudu tablet servers.</p>
| </li> |
| <li> |
<p>Todd Lipcon and Binglin Chang have continued working on improving performance
for a high-throughput random-read use case. Initial profiling indicated that the
RPC system was a bottleneck, and patches that improve its throughput have
started to land:</p>
| |
<p>The largest bottleneck was the queue that transfers RPC calls from the
libev “reactor” threads, which perform network I/O, to the “worker” threads,
which service the actual requests. Binglin borrowed some ideas from Facebook’s
| <a href="https://github.com/facebook/folly">folly</a> library, and implemented an |
| <a href="http://gerrit.cloudera.org:8080/#/c/2938/">improved queue</a> |
| which reduces context switches and lock contention while also |
| improving CPU cache locality of the worker threads.</p> |
| |
| <p>Todd identified that the hash function used to map connections to reactor |
| threads was poor, resulting in uneven load distribution across cores. |
| A <a href="http://gerrit.cloudera.org:8080/#/c/2939/">simple patch to change the hashcode implementation</a> |
| improved the distribution substantially.</p> |
| |
| <p>With just these patches, an RPC stress benchmark was improved from about 202K RPCs/second |
| to 768K RPCs/second on a 24-core machine. Further improvements are in flight |
| and under review this week.</p> |
| </li> |
| <li> |
<p>Zhen Zhang continues to focus on improving visibility into
performance and resource usage by making it possible to propagate various
per-operation metrics from the server side back to the client. His latest patch
| under review <a href="http://gerrit.cloudera.org:8080/#/c/3013/">exposes scanner cache hit rate metrics</a> |
| to the client.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon and Sarah Jelinek continue to make progress on the |
| implementation of a persistent-memory backed block cache. |
| This week a <a href="http://gerrit.cloudera.org:8080/#/c/2957/">substantial refactor to the block cache interface</a> |
| was committed in preparation for the <a href="http://gerrit.cloudera.org:8080/#/c/2593/">NVM cache itself</a>.</p> |
| </li> |
| <li> |
| <p>Congratulations to Will Berkeley, a new contributor who has been |
| contributing small fixes and improvements such as |
| <a href="http://gerrit.cloudera.org:8080/#/c/3022/">exposing table partitioning information in the master web UI</a>. |
| Thanks, Will!</p> |
| </li> |
| <li> |
| <p>David Alves has been continuing to make progress towards his implementation of |
| the <a href="http://gerrit.cloudera.org:8080/#/c/2642/">Replay Cache</a>. |
| This week, he refactored and cleaned up much of the client code involving |
error handling and retrying write operations, in preparation for inserting
| unique identifiers for these and other operations.</p> |
| </li> |
| <li> |
| <p>Chris George has continued to work on the Spark DataSource implementation. |
| In particular, work is progressing on support for <a href="http://gerrit.cloudera.org:8080/#/c/2992/">inserting and updating |
| rows via Spark</a>.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon and Mike Percy both committed improvements which will help speed up |
| startup. Measurements on a cluster where each node stores a few TB of data |
| showed a 3x improvement in startup time.</p> |
| </li> |
| </ul> |
| |
| <h2 id="upcoming-talks-and-meetups">Upcoming talks and meetups</h2> |
| |
| <ul> |
| <li>Mladen Kovacevi will be presenting Kudu at the |
| <a href="http://www.meetup.com/Big-Data-Montreal/events/230879277/?eventId=230879277">Big Data Montreal</a> |
| meetup.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>Welcome to the ninth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update May 9, 2016</title><link href="/2016/05/09/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update May 9, 2016" /><published>2016-05-09T00:00:00-07:00</published><updated>2016-05-09T00:00:00-07:00</updated><id>/2016/05/09/weekly-update</id><content type="html" xml:base="/2016/05/09/weekly-update.html"><p>Welcome to the eighth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Sameer Abhyankar posted a <a href="http://gerrit.cloudera.org:8080/#/c/2986/">patch</a> |
| for <a href="https://issues.apache.org/jira/browse/KUDU-1363">KUDU-1363</a> |
| that adds the ability to specify |
| <a href="http://www.w3schools.com/sql/sql_in.asp">IN</a>-like predicates on column values.</p> |
| </li> |
| <li> |
| <p>Chris George and Andy Grove have both been adding new features in Kudu’s |
| Spark module such as methods to <a href="http://gerrit.cloudera.org:8080/#/c/2981/">create/delete tables</a> |
| and <a href="http://gerrit.cloudera.org:8080/#/c/2992/">insert/update rows in the DataSource</a>.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon <a href="https://issues.apache.org/jira/browse/KUDU-1437">fixed a bug</a> in RLE |
encoding that was reported by Przemyslaw Maciolek. Thank you, Przemyslaw, for
| reporting it and providing an easy way to reproduce it!</p> |
| </li> |
| <li> |
<p>Adar Dembo is currently working on addressing the
<a href="https://github.com/cloudera/kudu/blob/master/docs/design-docs/multi-master-1.0.md">issues with multi-master</a>,
and early last week he committed <a href="http://gerrit.cloudera.org:8080/2879">a</a>
<a href="http://gerrit.cloudera.org:8080/2928">few</a> <a href="http://gerrit.cloudera.org:8080/2891">patches</a>
that address some race conditions.</p>
| </li> |
| <li> |
| <p>Zhen Zhang got <a href="http://gerrit.cloudera.org:8080/#/c/2858/">a first contribution</a> |
| in with a patch that adds statistics in the Java client. In 0.9.0 it will be |
| possible to query the client to get things like the number of bytes written or |
| how many write operations were sent.</p> |
| </li> |
| </ul> |
| |
| <h2 id="upcoming-talks-and-meetups">Upcoming talks and meetups</h2> |
| |
| <ul> |
| <li>Dan Burkert and Mike Percy will present Kudu at the |
| <a href="http://www.meetup.com/Vancouver-Spark/events/229692936/">Vancouver Spark Meetup</a> in Vancouver, BC, |
| on May 10.</li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the eighth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry></feed> |