| <?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2016-06-24T08:38:41-07:00</updated><id>/</id><entry><title>Master fault tolerance in Kudu 1.0</title><link href="/2016/06/24/multi-master-1-0-0.html" rel="alternate" type="text/html" title="Master fault tolerance in Kudu 1.0" /><published>2016-06-24T00:00:00-07:00</published><updated>2016-06-24T00:00:00-07:00</updated><id>/2016/06/24/multi-master-1-0-0</id><content type="html" xml:base="/2016/06/24/multi-master-1-0-0.html"><p>This blog post describes how the 1.0 release of Apache Kudu (incubating) will |
| support fault tolerance for the Kudu master, finally eliminating Kudu’s last |
| single point of failure.</p> |
| |
| <!--more--> |
| |
| <p>As those of you who follow this blog know by now, replication is a signature |
| feature in Kudu. Replication is used to provide fault tolerance for all loaded |
| data. By implementing the Raft consensus protocol, Kudu guarantees that a tablet |
| replicated <strong>2N+1</strong> times can tolerate up to <strong>N</strong> failures.</p> |
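<p>The <strong>2N+1</strong> arithmetic can be sketched in a few lines of C++. This is only an illustration of the majority rule that Raft relies on; the function names <code>MajoritySize</code> and <code>MaxToleratedFailures</code> are hypothetical and are not part of Kudu.</p>

```cpp
#include <cstddef>

// Smallest number of replicas that forms a strict majority.
constexpr std::size_t MajoritySize(std::size_t num_replicas) {
  return num_replicas / 2 + 1;
}

// Largest number of replicas that can fail while a majority is still alive.
// With 2N+1 replicas this evaluates to N.
constexpr std::size_t MaxToleratedFailures(std::size_t num_replicas) {
  return num_replicas == 0 ? 0 : num_replicas - MajoritySize(num_replicas);
}
```

<p>Under this rule, three replicas tolerate one failure and five tolerate two, while a fourth replica adds no tolerance over three, which is why odd replication factors are the usual choice.</p>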
| |
| <p>What you may not know is that Kudu replicates its metadata, too. That is, the |
| Kudu master stores all table and tablet metadata in a single “master” tablet. |
| As a regular Kudu tablet itself, this master tablet may be replicated with |
| Raft. As such, the Kudu master is a special kind of tablet server whose primary |
| job is to host a single replica of the master tablet.</p> |
| |
| <p>When we launched Kudu’s first beta, support for replicated masters had been |
| implemented but was too fragile to be anything but experimental. One of our |
| goals for Kudu’s 1.0 release is to improve replicated master support so that it |
| can be safely enabled in production clusters.</p> |
| |
| <h1 id="how-master-replication-works">How master replication works</h1> |
| |
| <p>To use replicated masters, a Kudu operator must deploy some number of Kudu |
| masters, providing the hostname and port number of each master in the group via |
| the <code>--master_addresses</code> command line option. For example, each master in a |
| three-node deployment should be started with |
| <code>--master_addresses=&lt;host1:port1&gt;,&lt;host2:port2&gt;,&lt;host3:port3&gt;</code>. In Raft parlance, |
| this group of masters is known as a <em>Raft configuration</em>.</p> |
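<p>To make the expected shape of that option concrete, here is a minimal sketch of how a comma-separated list of <code>host:port</code> pairs could be parsed and validated. The <code>HostPort</code> struct and <code>ParseMasterAddresses</code> function are hypothetical helpers written for this example; Kudu's own flag parsing is not shown in this post and may well differ.</p>

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical value type for one master's RPC address.
struct HostPort {
  std::string host;
  std::uint16_t port;
};

// Splits "host1:port1,host2:port2,..." into individual addresses.
// Throws std::invalid_argument on malformed entries, including two
// addresses that were accidentally run together without a comma.
std::vector<HostPort> ParseMasterAddresses(const std::string& flag_value) {
  std::vector<HostPort> masters;
  std::istringstream stream(flag_value);
  std::string entry;
  while (std::getline(stream, entry, ',')) {
    const std::size_t colon = entry.find(':');
    // Require exactly one colon, with text on both sides of it.
    if (colon == std::string::npos || colon != entry.rfind(':') ||
        colon == 0 || colon + 1 == entry.size()) {
      throw std::invalid_argument("expected host:port, got '" + entry + "'");
    }
    const std::string port_text = entry.substr(colon + 1);
    std::size_t consumed = 0;
    const int port = std::stoi(port_text, &consumed);
    if (consumed != port_text.size() || port < 1 || port > 65535) {
      throw std::invalid_argument("invalid port in '" + entry + "'");
    }
    masters.push_back({entry.substr(0, colon), static_cast<std::uint16_t>(port)});
  }
  return masters;
}
```

<p>Validating the list up front is a cheap way to catch a missing comma before the masters try to form a Raft configuration from it.</p>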
| |
| <p>At startup, a Raft configuration of masters will hold a leader election and |
| elect one master as the leader. The leader master is responsible for servicing |
| both tablet server heartbeats and client requests. The remaining masters |
| are followers: they participate in Raft consensus and replicate writes sent by |
| the leader, but are otherwise idle. Any client requests they receive are |
| rejected. Likewise, all tablet server heartbeats they receive are ignored. If |
| the leader master ever dies or steps down, the remaining replicas hold an |
| election to determine the new leader.</p> |
| |
| <p>All persistent master metadata is stored in the single replicated “master” |
| tablet. Every row in this tablet represents either a table or a tablet. Table |
| records include unique table identifiers, the table’s schema, and other bits of |
| information. Tablet records include a unique identifier, the tablet’s Raft |
| configuration, and other information.</p> |
| |
| <p>What master metadata is replicated?</p> |
| |
| <ol> |
| <li>Table and tablet existence, via <strong>CreateTable()</strong> and <strong>DeleteTable()</strong>. |
| Every new tablet record also includes an initial Raft configuration.</li> |
| <li>Schema changes, via <strong>AlterTable()</strong> and tablet server heartbeats.</li> |
| <li>Tablet server Raft configuration changes, via tablet server heartbeats. |
| These include both the list of Raft peers (which may have changed due to |
| under-replication) and the current leader (which may have changed due to |
| an election).</li> |
| </ol> |
| |
| <p>Scanning the master tablet to service every heartbeat or client request would be |
| slow, so the leader master caches all master metadata in memory. The caches are |
| only updated after a metadata change is successfully replicated; in this way |
| they are always consistent with the on-disk tablet. When a new leader master is |
| elected, it scans the entire master tablet and uses the metadata to rebuild its |
| in-memory caches.</p> |
| |
| <h1 id="communication-with-replicated-masters">Communication with replicated masters</h1> |
| |
| <p>All tablet servers start up with location information for the entire master Raft |
| configuration and will periodically heartbeat to every master. Similarly, |
| clients are also configured with the locations of all masters. Unlike tablet |
| servers, clients communicate only with the leader master, because follower masters |
| reject client requests. To do this, clients must determine which master is the |
| leader before sending the first request as well as whenever any request fails |
| with a <code>NOT_THE_LEADER</code> error.</p> |
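<p>The client-side behavior described above can be sketched roughly as follows. This is a simplified model, not the Kudu client's actual implementation: the <code>RpcStatus</code> enum, the <code>SendFn</code> callback (a stand-in for a real RPC), and the <code>LeaderTrackingClient</code> class are all assumptions made for the example. The sketch simply tries the last known leader first and falls back to the other masters when that attempt is rejected.</p>

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical outcome of sending one request to one master.
enum class RpcStatus { kOk, kNotTheLeader, kUnreachable };

// Stand-in for a real RPC: takes the index of the master to contact.
using SendFn = std::function<RpcStatus(std::size_t master_index)>;

class LeaderTrackingClient {
 public:
  LeaderTrackingClient(std::size_t num_masters, SendFn send)
      : num_masters_(num_masters), send_(std::move(send)) {}

  // Sends one request. Starts with the cached leader and, on rejection,
  // tries each remaining master once. Returns true if a master accepted.
  bool SendRequest() {
    for (std::size_t attempt = 0; attempt < num_masters_; ++attempt) {
      const std::size_t target = (leader_hint_ + attempt) % num_masters_;
      if (send_(target) == RpcStatus::kOk) {
        leader_hint_ = target;  // Remember the leader for later requests.
        return true;
      }
    }
    return false;  // No leader found, e.g. an election is in progress.
  }

  std::size_t leader_hint() const { return leader_hint_; }

 private:
  std::size_t num_masters_;
  SendFn send_;
  std::size_t leader_hint_ = 0;
};
```

<p>Caching the leader keeps the common case to a single round trip; the full scan only happens on the first request or after leadership changes.</p>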
| |
| <h1 id="remaining-work-for-kudu-10">Remaining work for Kudu 1.0</h1> |
| |
| <p><a href="https://issues.apache.org/jira/browse/KUDU-422">KUDU-422</a> tracks the remaining |
| master replication work. The guts of this feature have been implemented as far |
| back as early 2015; the remaining work has been focused on fixing bugs that |
| manifest only under specific conditions. For example, we’ve observed failures in |
| DDL operations (e.g. <strong>CreateTable()</strong>) that only materialize upon the |
| completion of a master leader election. These failures highlight some of the |
| gaps in our testing regimen: we need a robust stress test that repeatedly |
| performs such operations while holding master leader elections.</p> |
| |
| <p>That said, there is one remaining work item of larger scope: there’s no |
| mechanism with which to perform a Raft configuration change for replicated |
| masters. Such a mechanism would have multiple uses:</p> |
| |
| <ol> |
| <li>Migrating from a single-node master deployment to a fully replicated |
| three-node (or five-node) deployment.</li> |
| <li>Replacing a failed master with a new one.</li> |
| </ol> |
| |
| <p>This is being tracked by |
| <a href="https://issues.apache.org/jira/browse/KUDU-1474">KUDU-1474</a>, and there’s been |
| <a href="http://gerrit.cloudera.org:8080/3393">some discussion</a> around a design, but |
| nothing has been implemented yet. Stay tuned!</p></content><author><name>Adar Dembo</name></author><summary>This blog post describes how the 1.0 release of Apache Kudu (incubating) will |
| support fault tolerance for the Kudu master, finally eliminating Kudu&#8217;s last |
| single point of failure.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 21, 2016</title><link href="/2016/06/21/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 21, 2016" /><published>2016-06-21T00:00:00-07:00</published><updated>2016-06-21T00:00:00-07:00</updated><id>/2016/06/21/weekly-update</id><content type="html" xml:base="/2016/06/21/weekly-update.html"><p>Welcome to the fourteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Dan Burkert posted a series of patches to <a href="https://gerrit.cloudera.org/#/c/3388/">add support in the Java client</a> |
| for non-covering range partitions. At the same time he improved how that client locates tables by |
| leveraging the tablets cache.</p> |
| </li> |
| <li> |
| <p>In the context of making multi-master reliable in 1.0, Adar Dembo posted a <a href="https://gerrit.cloudera.org/#/c/3393/">design document</a> |
| on how to handle permanent master failures. Currently the master’s code is missing some features, |
| such as <code>remote bootstrap</code>, which makes it possible for a new replica to download a snapshot of the data |
| from the leader replica.</p> |
| </li> |
| <li> |
| <p>Tsuyoshi Ozawa refreshed <a href="https://gerrit.cloudera.org/#/c/2162/">a patch</a> posted in February that |
| makes it easier to get started contributing to Kudu by providing a Dockerfile with the right |
| environment.</p> |
| </li> |
| </ul> |
| |
| <h2 id="on-the-blog">On the blog</h2> |
| |
| <ul> |
| <li>Mike Percy <a href="http://getkudu.io/2016/06/17/raft-consensus-single-node.html">wrote</a> about how Kudu |
| uses Raft consensus on a single node, and some changes we’re making as Kudu is getting more mature.</li> |
| </ul> |
| |
| <p>Want to learn more about a specific topic from this blog post? Shoot an email to the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweet at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the fourteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Using Raft Consensus on a Single Node</title><link href="/2016/06/17/raft-consensus-single-node.html" rel="alternate" type="text/html" title="Using Raft Consensus on a Single Node" /><published>2016-06-17T00:00:00-07:00</published><updated>2016-06-17T00:00:00-07:00</updated><id>/2016/06/17/raft-consensus-single-node</id><content type="html" xml:base="/2016/06/17/raft-consensus-single-node.html"><p>As Kudu marches toward its 1.0 release, which will include support for |
| multi-master operation, we are working on removing old code that is no longer |
| needed. One such piece of code is called LocalConsensus. Once LocalConsensus is |
| removed, we will be using Raft consensus even on Kudu tables that have a |
| replication factor of 1.</p> |
| |
| <!--more--> |
| |
| <p>Using Raft consensus in single-node cases is important for multi-master |
| support because it will allow people to dynamically increase their Kudu |
| cluster’s existing master server replication factor from 1 to many (3 or 5 are |
| typical).</p> |
| |
| <h1 id="the-consensus-interface">The Consensus interface</h1> |
| |
| <p>In Kudu, the |
| <a href="https://github.com/apache/incubator-kudu/blob/branch-0.9.x/src/kudu/consensus/consensus.h">Consensus</a> |
| interface was created as an abstraction to allow us to build the plumbing |
| around how a consensus implementation would interact with the underlying |
| tablet. We were able to build out this “scaffolding” long before our Raft |
| implementation was complete.</p> |
| |
| <p>The Consensus API has the following main responsibilities:</p> |
| |
| <ol> |
| <li>Support acting as a Raft <code>LEADER</code> and replicate writes to a local |
| write-ahead log (WAL) as well as followers in the Raft configuration. For |
| each operation written to the leader, a Raft implementation must keep track |
| of how many nodes have written a copy of the operation being replicated, and |
| whether or not that constitutes a majority. Once a majority of the nodes |
| have written a copy of the data, it is considered committed.</li> |
| <li>Support acting as a Raft <code>FOLLOWER</code> by accepting writes from the leader and |
| preparing them to be eventually committed.</li> |
| <li>Support voting in and initiating leader elections.</li> |
| <li>Support participating in and initiating configuration changes (such as going |
| from a replication factor of 3 to 4).</li> |
| </ol> |
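<p>The bookkeeping in the first responsibility, tracking which nodes have written an operation and whether they form a majority, might look something like the sketch below. The <code>ReplicatedOperation</code> class and its method names are hypothetical and are not taken from Kudu's Consensus interface.</p>

```cpp
#include <cstddef>
#include <set>
#include <string>

// Tracks acknowledgements for a single operation being replicated.
class ReplicatedOperation {
 public:
  explicit ReplicatedOperation(std::size_t num_voters)
      : num_voters_(num_voters) {}

  // Records that the given peer has written the operation to its WAL.
  // Duplicate acknowledgements from the same peer are counted once.
  // Returns true once the operation is committed.
  bool AckFrom(const std::string& peer_uuid) {
    acked_peers_.insert(peer_uuid);
    return IsCommitted();
  }

  // Committed means a strict majority of voters hold a copy.
  bool IsCommitted() const {
    return acked_peers_.size() >= num_voters_ / 2 + 1;
  }

 private:
  std::size_t num_voters_;
  std::set<std::string> acked_peers_;
};
```

<p>In a three-node configuration the leader's own write plus one follower's acknowledgement is enough to commit; in a single-node configuration the leader's own write commits immediately.</p>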
| |
| <p>The first implementation of the Consensus interface was called LocalConsensus. |
| LocalConsensus only supported acting as a leader of a single-node configuration |
| (hence the name “local”). It could not replicate to followers, participate in |
| elections, or change configurations. These limitations have led us to |
| <a href="https://gerrit.cloudera.org/3350">remove</a> LocalConsensus from the code base |
| entirely.</p> |
| |
| <p>Because Kudu has a full-featured Raft implementation, Kudu’s RaftConsensus |
| supports all of the above functions of the Consensus interface.</p> |
| |
| <h1 id="using-a-single-node-raft-configuration">Using a Single-node Raft configuration</h1> |
| |
| <p>A common question on the Raft mailing lists is: “Is it even possible to use |
| Raft on a single node?” The answer is yes.</p> |
| |
| <p>Fundamentally, Raft works by first electing a leader that is responsible for |
| replicating write operations to the other members of the configuration. In |
| order to elect a leader, Raft requires a (strict) majority of the voters to |
| vote “yes” in an election. When there is only a single eligible node in the |
| configuration, there is no chance of losing the election. Raft specifies that |
| when starting an election, a node must first vote for itself and then contact |
| the rest of the voters to tally their votes. If there is only a single node, no |
| communication is required and an election succeeds instantaneously.</p> |
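<p>As a rough sketch of why the single-node case needs no communication, consider the vote tally below. The <code>RunElection</code> function and <code>ElectionResult</code> struct are hypothetical, and for simplicity the sketch contacts peers one at a time, whereas a real Raft implementation would typically send vote requests in parallel.</p>

```cpp
#include <cstddef>
#include <functional>

struct ElectionResult {
  bool won;
  std::size_t vote_requests_sent;
};

// Runs one election from the point of view of the candidate (voter 0).
// request_vote(peer) is a stand-in for a vote RPC and returns true if
// that peer grants its vote.
ElectionResult RunElection(
    std::size_t num_voters,
    const std::function<bool(std::size_t peer)>& request_vote) {
  const std::size_t majority = num_voters / 2 + 1;
  std::size_t yes_votes = 1;  // The candidate always votes for itself first.
  std::size_t sent = 0;
  // Stop contacting peers as soon as a strict majority is reached.
  for (std::size_t peer = 1; peer < num_voters && yes_votes < majority; ++peer) {
    ++sent;
    if (request_vote(peer)) {
      ++yes_votes;
    }
  }
  return {yes_votes >= majority, sent};
}
```

<p>With a single voter the candidate's own vote is already a majority, so the loop never runs and the election is won without sending a single request.</p>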
| |
| <p>So, when does it make sense to use Raft for a single node?</p> |
| |
| <p>It makes sense to do this when you want to allow growing the replication factor |
| in the future. This is something that Kudu needs to support. When deploying |
| Kudu, someone may wish to test it out with limited resources in a small |
| environment. Eventually, they may wish to transition that cluster to be a |
| staging or production environment, which would typically require the fault |
| tolerance achievable with multi-node Raft. Without a consensus implementation |
| that supports configuration changes, there would be no way to gracefully |
| support this. Because single-node Raft supports dynamically adding an |
| additional node to its configuration, it is possible to go from one replica to |
| 2 and then 3 replicas and end up with a fault-tolerant cluster without |
| incurring downtime.</p> |
| |
| <h1 id="more-about-raft">More about Raft</h1> |
| |
| <p>To learn more about how Kudu uses Raft consensus, you may find the relevant |
| <a href="https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/README.md">design docs</a> |
| interesting. In the future, we may also post more articles on the Kudu blog |
| about how Kudu uses Raft to achieve fault tolerance.</p> |
| |
| <p>To learn more about the Raft protocol itself, please see the <a href="https://raft.github.io/">Raft consensus |
| home page</a>. The design of Kudu’s Raft implementation |
| is based on the extended protocol described in Diego Ongaro’s Ph.D. |
| dissertation, which you can find linked from the above web site.</p></content><author><name>Mike Percy</name></author><summary>As Kudu marches toward its 1.0 release, which will include support for |
| multi-master operation, we are working on removing old code that is no longer |
| needed. One such piece of code is called LocalConsensus. Once LocalConsensus is |
| removed, we will be using Raft consensus even on Kudu tables that have a |
| replication factor of 1.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 13, 2016</title><link href="/2016/06/13/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 13, 2016" /><published>2016-06-13T00:00:00-07:00</published><updated>2016-06-13T00:00:00-07:00</updated><id>/2016/06/13/weekly-update</id><content type="html" xml:base="/2016/06/13/weekly-update.html"><p>Welcome to the thirteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>The IPMC vote for 0.9.0 RC1 passed and Kudu 0.9.0 is now |
| <a href="http://getkudu.io/2016/06/10/apache-kudu-0-9-0-released.html">officially released</a>. Per the |
| lazily agreed-upon <a href="http://mail-archives.apache.org/mod_mbox/kudu-dev/201602.mbox/%3CCAGpTDNcMBWwX8p+yGKzHfL2xcmKTScU-rhLcQFSns1UVSbrXhw@mail.gmail.com%3E">plan</a>, |
| the next release will be 1.0.0 in about two months.</p> |
| </li> |
| <li> |
| <p>Adar Dembo has been cleaning up and improving the Master process’s code. Last week he |
| <a href="https://gerrit.cloudera.org/#/c/2887/">finished</a> removing the per-tablet replica locations cache.</p> |
| </li> |
| <li> |
| <p>Alexey Serbin contributed his first patch last week by <a href="https://gerrit.cloudera.org/#/c/3360/">fixing</a> |
| most of the unit tests that were failing on OSX.</p> |
| </li> |
| <li> |
| <p>Sameer Abhyankar is nearly finished adding support for “in-list” predicates; |
| follow <a href="https://gerrit.cloudera.org/#/c/2986/">this link</a> to the gerrit |
| review. This will enable specifying predicates in the style of “column IN (list, of, values)”.</p> |
| </li> |
| <li> |
| <p>Mike Percy posted a few patches that remove LocalConsensus for single-node tablets, with the actual |
| removal happening in this <a href="https://gerrit.cloudera.org/#/c/3350/">patch</a>.</p> |
| </li> |
| </ul> |
| |
| <h2 id="slides-and-recordings">Slides and recordings</h2> |
| |
| <ul> |
| <li>Todd Lipcon presented Kudu at Berlin Buzzwords earlier this month. The recording is available |
| <a href="https://berlinbuzzwords.de/session/apache-kudu-incubating-fast-analytics-fast-data">here</a>.</li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the thirteenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) 0.9.0 released</title><link href="/2016/06/10/apache-kudu-0-9-0-released.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) 0.9.0 released" /><published>2016-06-10T00:00:00-07:00</published><updated>2016-06-10T00:00:00-07:00</updated><id>/2016/06/10/apache-kudu-0-9-0-released</id><content type="html" xml:base="/2016/06/10/apache-kudu-0-9-0-released.html"><p>The Apache Kudu (incubating) team is happy to announce the release of Kudu |
| 0.9.0!</p> |
| |
| <p>This latest version adds basic UPSERT functionality and an improved Apache Spark Data Source |
| that doesn’t rely on the MapReduce I/O formats. It also improves Tablet Server |
| restart time as well as write performance under high load. Finally, Kudu now enforces |
| the specification of a partitioning scheme for new tables.</p> |
| |
| <ul> |
| <li>Read the detailed <a href="http://getkudu.io/releases/0.9.0/docs/release_notes.html">Kudu 0.9.0 release notes</a></li> |
| <li>Download the <a href="http://getkudu.io/releases/0.9.0/">Kudu 0.9.0 source release</a></li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>The Apache Kudu (incubating) team is happy to announce the release of Kudu |
| 0.9.0! |
| |
| This latest version adds basic UPSERT functionality and an improved Apache Spark Data Source |
| that doesn&#8217;t rely on the MapReduce I/O formats. It also improves Tablet Server |
| restart time as well as write performance under high load. Finally, Kudu now enforces |
| the specification of a partitioning scheme for new tables. |
| |
| |
| Read the detailed Kudu 0.9.0 release notes |
| Download the Kudu 0.9.0 source release</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 6, 2016</title><link href="/2016/06/06/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 6, 2016" /><published>2016-06-06T00:00:00-07:00</published><updated>2016-06-06T00:00:00-07:00</updated><id>/2016/06/06/weekly-update</id><content type="html" xml:base="/2016/06/06/weekly-update.html"><p>Welcome to the twelfth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Jean-Daniel Cryans put up <a href="http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201606.mbox/%3CCAGpTDNduoQM0ktuZc1eW1XeXCcXhvPGftJ%3DLRB8Er5c2dZptvw%40mail.gmail.com%3E">0.9.0 RC1</a> |
| for a vote on the dev mailing list and it passed. The Incubator PMC (IPMC) will also need |
| to vote on it before it can officially be released.</p> |
| </li> |
| <li> |
| <p>Mike Percy is working on removing LocalConsensus, which is currently used for |
| single-node Kudu deployments. We will instead use the Raft consensus implementation |
| with a replication factor of 1. This simplifies development, since we currently have to maintain two |
| consensus implementations. It will also provide a way to migrate from single node to multi-node |
| deployments. See the discussion in this <a href="http://mail-archives.apache.org/mod_mbox/kudu-dev/201605.mbox/%3CCADXBggeE6RUYchv5fa=J2geHGE8Mw4SOeoi=LjXjdfmYYSqyhQ@mail.gmail.com%3E">dev thread</a>.</p> |
| </li> |
| <li> |
| <p>Zhen Zhang got a patch in for <a href="https://issues.apache.org/jira/browse/KUDU-1444">KUDU-1444</a> |
| that adds resource usage monitoring to scanners in the C++ client. In the future this could |
| be leveraged by systems like Impala to augment the query profiles.</p> |
| </li> |
| <li> |
| <p>Longer term efforts for 1.0 are making good progress. Dan Burkert <a href="https://gerrit.cloudera.org/#/c/3255/">added support</a> |
| in the C++ client for non-covering range partitioned tables, and David Alves has a few |
| patches in for the <a href="https://gerrit.cloudera.org/#/c/2642/">Replay Cache</a>.</p> |
| </li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the twelfth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Default Partitioning Changes Coming in Kudu 0.9</title><link href="/2016/06/02/no-default-partitioning.html" rel="alternate" type="text/html" title="Default Partitioning Changes Coming in Kudu 0.9" /><published>2016-06-02T00:00:00-07:00</published><updated>2016-06-02T00:00:00-07:00</updated><id>/2016/06/02/no-default-partitioning</id><content type="html" xml:base="/2016/06/02/no-default-partitioning.html"><p>The upcoming Apache Kudu (incubating) 0.9 release is changing the default |
| partitioning configuration for new tables. This post will introduce the change, |
| explain the motivations, and show examples of how code can be updated to work |
| with the new release.</p> |
| |
| <!--more--> |
| |
| <p>The most common source of frustration for new Kudu users is the default |
| partitioning behavior when creating new tables. If partitioning is not |
| specified, the Kudu client prior to 0.9 creates tables with a <em>single tablet</em>. |
| Single tablet tables are a Kudu anti-pattern, since they are unable to get the |
| scalability benefit of distributing data across the cluster, and instead keep |
| all data on a single machine.</p> |
| |
| <p>Unfortunately, automatically choosing a better default partitioning |
| configuration for new tables is not simple. In most cases, hash partitioning on |
| the primary key is a better default, but this approach can have its own |
| drawbacks. In particular, it is not clear how many buckets should be used for |
| the new table.</p> |
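<p>For readers unfamiliar with hash partitioning, the idea of mapping a row's key columns to one of a fixed number of buckets can be sketched as below. This is only an illustration: <code>HashBucketFor</code> is a hypothetical function, and it uses <code>std::hash</code> as a stand-in, which is not the hash function Kudu uses internally.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Maps the values of a row's hashed key columns to a bucket index in
// [0, num_buckets). Precondition: num_buckets > 0.
std::size_t HashBucketFor(const std::vector<std::string>& key_values,
                          std::size_t num_buckets) {
  std::size_t combined = 0;
  for (const std::string& value : key_values) {
    // Mix each column's hash into the running value so that column order
    // matters and similar keys spread across buckets.
    combined ^= std::hash<std::string>()(value) + 0x9e3779b9 +
                (combined << 6) + (combined >> 2);
  }
  return combined % num_buckets;
}
```

<p>Each bucket corresponds to a tablet, so the bucket count fixes how many tablets the rows are spread over. That is why the count is hard to choose automatically: too few buckets limits parallelism, and the choice cannot be revisited once the table exists.</p>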
| |
| <p>Since there is no bullet-proof default and changing the partitioning |
| configuration after table creation is impossible, <a href="https://lists.apache.org/thread.html/ca8972620839109334493424a1022fc08c77c315d9d623f5caaa815f@1463699013@%3Cuser.kudu.apache.org%3E">we |
| decided</a> |
| to remove the default altogether. Removing the default is a backwards |
| incompatible change, so it must be done before the 1.0 release. If we later find |
| a better way to create a default partitioning configuration, it should be |
| possible to adopt it in a backwards compatible way. The result of removing the |
| default is that new tables created with the 0.9 client must specify a |
| partitioning configuration, or table creation will fail. You can still create a |
| table with a single tablet, but it must be configured explicitly. These changes |
| only affect new table creation; existing tables, including tables created with |
| default partitioning before the 0.9 release, will continue to work.</p> |
| |
| <p>In most cases updating existing code to explicitly set a partitioning |
| configuration should be simple. The examples below add hash partitioning, but |
| you can also specify range partitioning or a combination of range and hash |
| partitioning. See the <a href="http://getkudu.io/docs/schema_design.html#data-distribution">schema design |
| guide</a> for more |
| advanced configurations.</p> |
| |
| <h1 id="c-client">C++ Client</h1> |
| |
| <p>With the C++ client, creating a new table with hash partitions is as simple as |
| calling <code>KuduTableCreator::add_hash_partitions</code> with the columns to hash and the |
| number of buckets to use:</p> |
| |
| <div class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">unique_ptr</span><span class="o">&lt;</span><span class="n">KuduTableCreator</span><span class="o">&gt;</span> <span class="n">table_creator</span><span class="p">(</span><span class="n">my_client</span><span class="o">-&gt;</span><span class="n">NewTableCreator</span><span class="p">());</span> |
| <span class="n">Status</span> <span class="n">create_status</span> <span class="o">=</span> <span class="n">table_creator</span><span class="o">-&gt;</span><span class="n">table_name</span><span class="p">(</span><span class="s">&quot;my-table&quot;</span><span class="p">)</span> |
| <span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="n">my_schema</span><span class="p">)</span> |
| <span class="p">.</span><span class="n">add_hash_partitions</span><span class="p">({</span> <span class="s">&quot;key_column_a&quot;</span><span class="p">,</span> <span class="s">&quot;key_column_b&quot;</span> <span class="p">},</span> <span class="mi">16</span><span class="p">)</span> |
| <span class="p">.</span><span class="n">Create</span><span class="p">();</span> |
| <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">create_status</span><span class="p">.</span><span class="n">ok</span><span class="p">())</span> <span class="p">{</span> <span class="cm">/* handle error */</span> <span class="p">}</span></code></pre></div> |
| |
| <h1 id="java-client">Java Client</h1> |
| |
| <p>And similarly, in Java:</p> |
| |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">hashColumns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;&gt;();</span> |
| <span class="n">hashColumns</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;key_column_a&quot;</span><span class="o">);</span> |
| <span class="n">hashColumns</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;key_column_b&quot;</span><span class="o">);</span> |
| <span class="n">CreateTableOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CreateTableOptions</span><span class="o">().</span><span class="na">addHashPartitions</span><span class="o">(</span><span class="n">hashColumns</span><span class="o">,</span> <span class="mi">16</span><span class="o">);</span> |
| <span class="n">myClient</span><span class="o">.</span><span class="na">createTable</span><span class="o">(</span><span class="s">&quot;my-table&quot;</span><span class="o">,</span> <span class="n">my_schema</span><span class="o">,</span> <span class="n">options</span><span class="o">);</span></code></pre></div> |
| |
| <p>In the examples above, if the hash partition configuration is omitted the create |
| table operation will fail with the error <code>Table partitioning must be specified |
| using setRangePartitionColumns or addHashPartitions</code>. In the Java client this |
| manifests as a thrown <code>IllegalArgumentException</code>, while in the C++ client it is |
| returned as a <code>Status::InvalidArgument</code>.</p> |
| |
| <h1 id="impala">Impala</h1> |
| |
| <p>When creating Kudu tables with Impala, the formerly optional <code>DISTRIBUTE BY</code> |
| clause is now required:</p> |
| |
| <div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">my_table</span> <span class="p">(</span><span class="n">key_column_a</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">key_column_b</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">other_column</span> <span class="n">STRING</span><span class="p">)</span> |
| <span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">key_column_a</span><span class="p">,</span> <span class="n">key_column_b</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span> |
| <span class="n">TBLPROPERTIES</span><span class="p">(</span> |
| <span class="s1">&#39;storage_handler&#39;</span> <span class="o">=</span> <span class="s1">&#39;com.cloudera.kudu.hive.KuduStorageHandler&#39;</span><span class="p">,</span> |
| <span class="s1">&#39;kudu.table_name&#39;</span> <span class="o">=</span> <span class="s1">&#39;my_table&#39;</span><span class="p">,</span> |
| <span class="s1">&#39;kudu.master_addresses&#39;</span> <span class="o">=</span> <span class="s1">&#39;kudu-master.example.com:7051&#39;</span><span class="p">,</span> |
| <span class="s1">&#39;kudu.key_columns&#39;</span> <span class="o">=</span> <span class="s1">&#39;key_column_a,key_column_b&#39;</span> |
| <span class="p">);</span></code></pre></div></content><author><name>Dan Burkert</name></author><summary>The upcoming Apache Kudu (incubating) 0.9 release is changing the default |
| partitioning configuration for new tables. This post will introduce the change, |
| explain the motivations, and show examples of how code can be updated to work |
| with the new release.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 1, 2016</title><link href="/2016/06/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 1, 2016" /><published>2016-06-01T00:00:00-07:00</published><updated>2016-06-01T00:00:00-07:00</updated><id>/2016/06/01/weekly-update</id><content type="html" xml:base="/2016/06/01/weekly-update.html"><p>Welcome to the eleventh edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Jean-Daniel Cryans, the release manager for 0.9.0, <a href="http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201605.mbox/%3CCAGpTDNe_gV5TTsJQSjx_Q-hSGjK9TesWkyP-k9rnhd0mBtYAYg%40mail.gmail.com%3E">indicated</a> |
| that the release is almost ready and the first release candidate will be put up for vote this |
| week.</p> |
| </li> |
| <li> |
| <p>Dan Burkert pushed <a href="http://gerrit.cloudera.org:8080/3131">a change</a> that disallows default |
| partitioning when creating a new table. The change was motivated by many reports from users who saw poor |
| performance because their tables were created with only one tablet. Kudu now requires users to |
| partition their tables.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon ran YCSB stress tests on a cluster and discovered that compactions were taking hours |
| instead of seconds. He pushed <a href="http://gerrit.cloudera.org:8080/#/c/3221/">a change</a> that solves |
| the issue as part of our <a href="https://issues.apache.org/jira/browse/KUDU-749">general effort</a> to |
| improve performance for zipfian update workloads.</p> |
| </li> |
| <li> |
| <p>Todd also <a href="http://gerrit.cloudera.org:8080/#/c/3186/">changed</a> some flush-related defaults to |
| encourage parallel IO and larger flushes. This is based on his previous work that he documented |
| in this <a href="http://getkudu.io/2016/04/26/ycsb.html">blog post</a>.</p> |
| </li> |
| <li> |
| <p>Will Berkeley made a few improvements last week, but <a href="http://gerrit.cloudera.org:8080/3199">one</a> |
| we’d like to call out is that he removed the Java kudu-mapreduce module’s dependency on Hadoop’s |
| hadoop-common test jar. This solved build issues while also removing a nasty dependency.</p> |
| </li> |
| </ul></content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the eleventh edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update May 23, 2016</title><link href="/2016/05/23/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update May 23, 2016" /><published>2016-05-23T00:00:00-07:00</published><updated>2016-05-23T00:00:00-07:00</updated><id>/2016/05/23/weekly-update</id><content type="html" xml:base="/2016/05/23/weekly-update.html"><p>Welcome to the tenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="kudu-related-podcast">Kudu-related podcast</h2> |
| |
| <ul> |
| <li>Two committers, Mike Percy and Dan Burkert, appeared on the |
| <a href="https://developer.ibm.com/tv/apachecon-apache-projects/">IBM New Builders podcast</a> |
| to talk about Apache Kudu, how they got involved, and what sort of |
| workloads it is best suited for.</li> |
| </ul> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Jean-Daniel Cryans is again acting as the release manager for the upcoming |
| 0.9.0 release. The git branch for 0.9 has now been cut, and only bug fixes |
| or small improvements will be committed to that branch between now and the |
| first release candidate.</p> |
| </li> |
| <li> |
| <p>Since Kudu’s initial release, one of the most commonly requested features |
| has been support for the <code>UPSERT</code> operation, known in some other |
| databases as <code>INSERT ... ON DUPLICATE KEY UPDATE</code>. This operation has the |
| semantics of an <code>INSERT</code> if no row already exists with the provided primary |
| key. Otherwise, it replaces the existing row with the new values.</p> |
| |
| <p>This week, several developers collaborated to add support for this operation. |
| Todd Lipcon implemented |
| <a href="http://gerrit.cloudera.org:8080/#/c/3101/">support on the server side</a>, |
| C++ client, and <a href="http://gerrit.cloudera.org:8080/#/c/3128/">Python client</a>. |
| Jean-Daniel Cryans added support in the |
| <a href="http://gerrit.cloudera.org:8080/#/c/3123/">Java client</a>. Ara Ebrahimi |
| and Will Berkeley have started working on |
| <a href="http://gerrit.cloudera.org:8080/#/c/3145/">integrating upsert support into the Flume sink</a>.</p> |
| </li> |
| <li> |
| <p>Mike Percy started working on support for <a href="http://gerrit.cloudera.org:8080/#/c/3135/">basic disk |
| space reservations</a> |
| in the tablet server. This feature will cause the tablet server to stop |
| writing to a disk before it’s full, preventing crashes due to running |
| out of space.</p> |
| </li> |
| <li> |
| <p>Chris George and Andy Grove collaborated on support for <a href="http://gerrit.cloudera.org:8080/#/c/2992/">insertions and |
| updates in the Spark DataSource</a>, |
| and the patch was committed towards the end of the week. Brent Gardner |
| has also been helping with the Spark integration, and fixed an important |
| <a href="https://issues.apache.org/jira/browse/KUDU-1453">connection leak bug</a> |
| in the initial implementation.</p> |
| </li> |
| <li> |
| <p>David Alves worked on reviving a 7-month-old patch by Jingkai Yuan which |
| implements an <a href="http://gerrit.cloudera.org:8080/#/c/1210/">integer delta encoding scheme</a> |
| that is meant to be efficient both in terms of CPU and disk space. This |
| encoding scheme is also designed to take advantage of modern CPU instruction sets |
| such as AVX and AVX2.</p> |
| </li> |
| </ul> |
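<p>The <code>UPSERT</code> semantics described above can be illustrated with a minimal Python sketch.
It models a table as an in-memory dictionary keyed by primary key. The <code>insert</code> and
<code>upsert</code> functions are hypothetical names chosen for illustration and are not part of any
Kudu client API.</p>

<div class="highlight"><pre><code class="language-python" data-lang="python"># Toy model of INSERT vs. UPSERT semantics; this is not the Kudu client API.
# A "table" here is just a dict mapping a primary key to a row (another dict).


def insert(table, key, row):
    """Add a new row; fail if a row with this primary key already exists."""
    if key in table:
        raise KeyError(f"row with primary key {key!r} already exists")
    table[key] = dict(row)


def upsert(table, key, row):
    """Insert the row if the key is absent, otherwise replace the existing row."""
    table[key] = dict(row)


table = {}
upsert(table, "user-1", {"name": "Ada", "visits": 1})  # key absent: acts like INSERT
upsert(table, "user-1", {"name": "Ada", "visits": 2})  # key present: replaces the row
</code></pre></div>

<p>After both calls the table holds a single row for <code>user-1</code> whose <code>visits</code>
value is 2, whereas calling <code>insert</code> again with the same key would raise an error.</p>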
| |
| <h2 id="upcoming-talks-and-meetups">Upcoming talks and meetups</h2> |
| |
| <ul> |
| <li>Ryan Bosshart will be presenting Kudu at the <a href="http://www.meetup.com/DFW-Cloudera-User-Group/events/230547045/">Dallas/Fort Worth |
| Cloudera User Group</a>.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>Welcome to the tenth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update May 16, 2016</title><link href="/2016/05/16/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update May 16, 2016" /><published>2016-05-16T00:00:00-07:00</published><updated>2016-05-16T00:00:00-07:00</updated><id>/2016/05/16/weekly-update</id><content type="html" xml:base="/2016/05/16/weekly-update.html"><p>Welcome to the ninth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</p> |
| |
| <!--more--> |
| |
| <p>If you find this post useful, please let us know by emailing the |
| <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#117;&#100;&#117;&#046;&#105;&#110;&#099;&#117;&#098;&#097;&#116;&#111;&#114;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">kudu-user mailing list</a> or |
| tweeting at <a href="https://twitter.com/ApacheKudu">@ApacheKudu</a>. Similarly, if you’re |
| aware of some Kudu news we missed, let us know so we can cover it in |
| a future post.</p> |
| |
| <h2 id="development-discussions-and-code-in-progress">Development discussions and code in progress</h2> |
| |
| <ul> |
| <li> |
| <p>Development and code reviews continued on Sameer Abhyankar’s patch which |
| adds support for pushing down <a href="http://gerrit.cloudera.org:8080/#/c/2986/">‘IN’ predicates</a> |
| to the Kudu tablet servers.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon and Binglin Chang have been continuing to work on improving throughput |
| for a high-throughput random-read use case. Initial profiling indicated that the |
| RPC system was a bottleneck, and patches which improve its throughput have |
| started to land:</p> |
| |
| <p>The largest bottleneck was in the queue which transfers RPC calls from the |
| libev “reactor” threads which perform network IO to the “worker” threads |
| which service the actual requests. Binglin borrowed some ideas from Facebook’s |
| <a href="https://github.com/facebook/folly">folly</a> library, and implemented an |
| <a href="http://gerrit.cloudera.org:8080/#/c/2938/">improved queue</a> |
| which reduces context switches and lock contention while also |
| improving CPU cache locality of the worker threads.</p> |
| |
| <p>Todd identified that the hash function used to map connections to reactor |
| threads was poor, resulting in uneven load distribution across cores. |
| A <a href="http://gerrit.cloudera.org:8080/#/c/2939/">simple patch to change the hashcode implementation</a> |
| improved the distribution substantially.</p> |
| |
| <p>With just these patches, an RPC stress benchmark was improved from about 202K RPCs/second |
| to 768K RPCs/second on a 24-core machine. Further improvements are in flight |
| and under review this week.</p> |
| </li> |
| <li> |
| <p>Zhen Zhang is continuing to focus on adding more visibility into |
| performance and resource usage by adding the ability to propagate various |
| per-operation metrics from the server side back to the client. His latest patch |
| under review <a href="http://gerrit.cloudera.org:8080/#/c/3013/">exposes scanner cache hit rate metrics</a> |
| to the client.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon and Sarah Jelinek continue to make progress on the |
| implementation of a persistent-memory backed block cache. |
| This week a <a href="http://gerrit.cloudera.org:8080/#/c/2957/">substantial refactor to the block cache interface</a> |
| was committed in preparation for the <a href="http://gerrit.cloudera.org:8080/#/c/2593/">NVM cache itself</a>.</p> |
| </li> |
| <li> |
| <p>Congratulations to Will Berkeley, a new contributor who has been |
| submitting small fixes and improvements such as |
| <a href="http://gerrit.cloudera.org:8080/#/c/3022/">exposing table partitioning information in the master web UI</a>. |
| Thanks, Will!</p> |
| </li> |
| <li> |
| <p>David Alves has been continuing to make progress towards his implementation of |
| the <a href="http://gerrit.cloudera.org:8080/#/c/2642/">Replay Cache</a>. |
| This week, he refactored and cleaned up much of the client code involving |
| error handling and retrying write operations, in preparation for inserting |
| unique identifiers for these and other operations.</p> |
| </li> |
| <li> |
| <p>Chris George has continued to work on the Spark DataSource implementation. |
| In particular, work is progressing on support for <a href="http://gerrit.cloudera.org:8080/#/c/2992/">inserting and updating |
| rows via Spark</a>.</p> |
| </li> |
| <li> |
| <p>Todd Lipcon and Mike Percy both committed improvements which will help speed up |
| startup. Measurements on a cluster where each node stores a few TB of data |
| showed a 3x improvement in startup time.</p> |
| </li> |
| </ul> |
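<p>The effect of a poor hash function on load distribution, as mentioned in the RPC item above, can be
shown with a toy example. The two hash functions, the connection identifiers, and the reactor count
below are all invented for illustration; they are not Kudu's actual implementation.</p>

<div class="highlight"><pre><code class="language-python" data-lang="python">import hashlib
from collections import Counter

NUM_REACTORS = 4  # assumed number of reactor threads for this sketch


def poor_hash(conn_id):
    # Hypothetical weak hash: looks only at the last octet of the client IP address.
    return int(conn_id.split(":")[0].rsplit(".", 1)[1])


def better_hash(conn_id):
    # Mixes every byte of the identifier (SHA-256 is used only as a convenient mixer).
    return int.from_bytes(hashlib.sha256(conn_id.encode()).digest()[:8], "big")


def distribution(hash_fn, conn_ids):
    """Count how many connections are assigned to each reactor thread."""
    return Counter(hash_fn(c) % NUM_REACTORS for c in conn_ids)


# 64 made-up connections from hosts whose last IP octet is always a multiple of 4.
conns = [f"10.0.0.{4 * (i % 8)}:{50000 + i}" for i in range(64)]
poor = distribution(poor_hash, conns)
better = distribution(better_hash, conns)
</code></pre></div>

<p>With the weak hash, every connection in this example is assigned to reactor 0, while the mixing
hash spreads the same connections over more than one reactor.</p>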
| |
| <h2 id="upcoming-talks-and-meetups">Upcoming talks and meetups</h2> |
| |
| <ul> |
| <li>Mladen Kovacevi will be presenting Kudu at the |
| <a href="http://www.meetup.com/Big-Data-Montreal/events/230879277/?eventId=230879277">Big Data Montreal</a> |
| meetup.</li> |
| </ul></content><author><name>Todd Lipcon</name></author><summary>Welcome to the ninth edition of the Kudu Weekly Update. This weekly blog post |
| covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry></feed> |