<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2016-07-01T10:10:00-07:00</updated><id>/</id><entry><title>Apache Kudu (incubating) Weekly Update June 27, 2016</title><link href="/2016/06/27/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 27, 2016" /><published>2016-06-27T00:00:00-07:00</published><updated>2016-06-27T00:00:00-07:00</updated><id>/2016/06/27/weekly-update</id><content type="html" xml:base="/2016/06/27/weekly-update.html">&lt;p&gt;Welcome to the fifteenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Todd Lipcon diagnosed and fixed a &lt;a href=&quot;https://gerrit.cloudera.org/3445&quot;&gt;tricky bug&lt;/a&gt;
which could cause Kudu servers to crash under load. It turned out that the bug
was in a synchronization profiling code path related to the tcmalloc allocator.
This allocator is used in release builds, but can’t be used in instrumented builds
such as
&lt;a href=&quot;http://clang.llvm.org/docs/AddressSanitizer.html&quot;&gt;AddressSanitizer&lt;/a&gt; or
&lt;a href=&quot;http://clang.llvm.org/docs/ThreadSanitizer.html&quot;&gt;ThreadSanitizer&lt;/a&gt;. This made it particularly difficult
to catch. The bug fix will be released in the upcoming 0.9.1 release.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Todd also finished and committed a fix for &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1469&quot;&gt;KUDU-1469&lt;/a&gt;,
a bug in which Kudu’s implementation of Raft consensus could get “stuck” not making
progress replicating operations for a tablet. See the
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/3228/7/src/kudu/integration-tests/raft_consensus-itest.cc&quot;&gt;new integration test case&lt;/a&gt;
for more details on this bug.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mike Percy finished implementing and committed a feature which allows
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/3135/&quot;&gt;reserving disk space for non-Kudu processes&lt;/a&gt;.
This feature causes Kudu to stop allocating new data blocks on a
disk if it is within a user-specified threshold of being full, preventing
possible crashes and allowing for safer collocation of Kudu with other processes
on a cluster.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Will Berkeley finished implementing &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1398&quot;&gt;KUDU-1398&lt;/a&gt;,
a new optimization which reduces the amount of disk space used by
indexing structures in Kudu’s internal storage format. This should
improve storage efficiency for workloads with large keys, and can
also improve write performance by increasing the number of index
entries which can fit in a given amount of cache memory.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;David Alves has completed posting a patch series that implements
exactly-once RPC semantics. The design, as mentioned in previous
blog posts, is described in a &lt;a href=&quot;https://gerrit.cloudera.org/#/c/2642/&quot;&gt;design document&lt;/a&gt;
and the patches can be found in a 10-patch series starting with
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/3190/&quot;&gt;gerrit #3190&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Dan Burkert is continuing to work on adding support for
&lt;a href=&quot;https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/non-covering-range-partitions.md&quot;&gt;tables with range partitions that don’t cover the entire key
space&lt;/a&gt;.
This past week, he focused on adding &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3388/&quot;&gt;support in the Java client&lt;/a&gt;,
which also necessitated some serious &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3477/&quot;&gt;refactoring&lt;/a&gt;. These patches
are now under review.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Congratulations to Andrew Wong, a new contributor who committed his
first patches this week. Andrew &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3424/&quot;&gt;improved the build docs for OSX&lt;/a&gt;
and also fixed a &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3486/&quot;&gt;crash if the user forgot to specify the master address
in some command line tools&lt;/a&gt;.
Thanks, Andrew!&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The Apache Kudu web site has finished migrating to Apache Software Foundation infrastructure.
The site can now be found at &lt;a href=&quot;http://kudu.incubator.apache.org/&quot;&gt;kudu.incubator.apache.org&lt;/a&gt;.
Existing links will automatically redirect.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;A Kudu 0.9.1 release candidate was posted and passed a
&lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201606.mbox/%3CCADY20s6%3D%2BnKNgvx%3DG_pKupQGiH%2B9ToS53LqExBwWM6vLp-ns9A%40mail.gmail.com%3E&quot;&gt;release vote&lt;/a&gt;
by the Kudu Podling PMC (PPMC).
The release candidate will now be voted upon by the Apache Incubator PMC. If all goes well, we
can expect a release late this week. The release fixes a few critical bugs discovered in 0.9.0.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Chris Mattmann, one of Kudu’s mentors from the Apache Incubator,
started a &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201606.mbox/%3CAD4A858D-403D-4E74-A4F4-DE2F08FB761E%40jpl.nasa.gov%3E&quot;&gt;discussion&lt;/a&gt;
about the project’s graduation to a top-level project (TLP).
Initial responses seem to be positive, so the next step will
be to work on a draft resolution and various stages of
voting.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;on-the-kudu-blog&quot;&gt;On the Kudu blog&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Adar Dembo published a post detailing his recent work on
&lt;a href=&quot;http://kudu.apache.org/2016/06/24/multi-master-1-0-0.html&quot;&gt;master fault tolerance in Kudu 1.0&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#105;&amp;#110;&amp;#099;&amp;#117;&amp;#098;&amp;#097;&amp;#116;&amp;#111;&amp;#114;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the fifteenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Master fault tolerance in Kudu 1.0</title><link href="/2016/06/24/multi-master-1-0-0.html" rel="alternate" type="text/html" title="Master fault tolerance in Kudu 1.0" /><published>2016-06-24T00:00:00-07:00</published><updated>2016-06-24T00:00:00-07:00</updated><id>/2016/06/24/multi-master-1-0-0</id><content type="html" xml:base="/2016/06/24/multi-master-1-0-0.html">&lt;p&gt;This blog post describes how the 1.0 release of Apache Kudu (incubating) will
support fault tolerance for the Kudu master, finally eliminating Kudu’s last
single point of failure.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;As those of you who follow this blog know by now, replication is a signature
feature in Kudu. Replication is used to provide fault tolerance for all loaded
data. By implementing the Raft consensus protocol, Kudu guarantees that a tablet
replicated &lt;strong&gt;2N+1&lt;/strong&gt; times can tolerate up to &lt;strong&gt;N&lt;/strong&gt; failures.&lt;/p&gt;
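
&lt;p&gt;As a quick check of that arithmetic, here is a minimal Python sketch. The helper names
&lt;code&gt;majority&lt;/code&gt; and &lt;code&gt;tolerated_failures&lt;/code&gt; are illustrative assumptions and are
not part of Kudu.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
# Illustrative helpers only; these names do not come from Kudu.
def majority(replicas):
    # Smallest number of replicas that forms a strict majority.
    return replicas // 2 + 1


def tolerated_failures(replicas):
    # Replicas that can fail while a majority is still reachable.
    return replicas - majority(replicas)


for replicas in (1, 3, 5):
    print(replicas, majority(replicas), tolerated_failures(replicas))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With 3 replicas (N = 1) one failure is tolerated, and with 5 replicas (N = 2) two are.&lt;/p&gt;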

&lt;p&gt;What you may not know is that Kudu replicates its metadata, too. That is, the
Kudu master stores all table and tablet metadata in a single “master” tablet.
As a regular Kudu tablet itself, this master tablet may be replicated with
Raft. As such, the Kudu master is a special kind of tablet server whose primary
job is to host a single replica of the master tablet.&lt;/p&gt;

&lt;p&gt;When we launched Kudu’s first beta, support for replicated masters had been
implemented but was too fragile to be anything but experimental. One of our
goals for Kudu’s 1.0 release is to improve replicated master support so that it
can be safely enabled in production clusters.&lt;/p&gt;

&lt;h1 id=&quot;how-master-replication-works&quot;&gt;How master replication works&lt;/h1&gt;

&lt;p&gt;To use replicated masters, a Kudu operator must deploy some number of Kudu
masters, providing the hostname and port number of each master in the group via
the &lt;code&gt;--master_addresses&lt;/code&gt; command line option. For example, each master in a
three-node deployment should be started with
&lt;code&gt;--master_addresses=&amp;lt;host1:port1&amp;gt;,&amp;lt;host2:port2&amp;gt;,&amp;lt;host3:port3&amp;gt;&lt;/code&gt;. In Raft parlance,
this group of masters is known as a &lt;em&gt;Raft configuration&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;At startup, a Raft configuration of masters will hold a leader election and
elect one master as the leader. The leader master is responsible for servicing
both tablet server heartbeats and client requests. The remaining masters
are followers: they participate in Raft consensus and replicate writes sent by
the leader, but are otherwise idle. Any client requests they receive are
rejected. Likewise, all tablet server heartbeats they receive are ignored. If
the leader master ever dies or steps down, the remaining replicas hold an
election to determine the new leader.&lt;/p&gt;

&lt;p&gt;All persistent master metadata is stored in the single replicated “master”
tablet. Every row in this tablet represents either a table or a tablet. Table
records include unique table identifiers, the table’s schema, and other bits of
information. Tablet records include a unique identifier, the tablet’s Raft
configuration, and other information.&lt;/p&gt;

&lt;p&gt;What master metadata is replicated?&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Table and tablet existence, via &lt;strong&gt;CreateTable()&lt;/strong&gt; and &lt;strong&gt;DeleteTable()&lt;/strong&gt;.
Every new tablet record also includes an initial Raft configuration.&lt;/li&gt;
  &lt;li&gt;Schema changes, via &lt;strong&gt;AlterTable()&lt;/strong&gt; and tablet server heartbeats.&lt;/li&gt;
  &lt;li&gt;Tablet Raft configuration changes, via tablet server heartbeats.
These include both the list of Raft peers (which may have changed due to
under-replication) and the current leader (which may have changed due to
an election).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Scanning the master tablet to service every heartbeat or client request would be
slow, so the leader master caches all master metadata in memory. The caches are
only updated after a metadata change is successfully replicated; in this way
they are always consistent with the on-disk tablet. When a new leader master is
elected, it scans the entire master tablet and uses the metadata to rebuild its
in-memory caches.&lt;/p&gt;
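
&lt;p&gt;The caching rule above can be sketched roughly as follows. This is a simplified
illustration, not Kudu’s implementation: &lt;code&gt;LeaderMasterSketch&lt;/code&gt;, the
&lt;code&gt;replicate&lt;/code&gt; callback, and the dictionary standing in for the master tablet are all
assumptions made for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
# Hypothetical sketch; names are illustrative, not Kudu's classes.
class LeaderMasterSketch:
    def __init__(self, replicate):
        # replicate(key, value) returns True once a majority has the write.
        self.replicate = replicate
        self.cache = {}

    def apply_change(self, key, value):
        # Update the cache only after the change is replicated, so the
        # cache never gets ahead of the on-disk master tablet.
        if not self.replicate(key, value):
            return False
        self.cache[key] = value
        return True

    def rebuild_cache(self, master_tablet_rows):
        # A newly elected leader scans the whole master tablet.
        self.cache = dict(master_tablet_rows)


# A plain dict stands in for the replicated master tablet.
tablet = {}


def replicate_ok(key, value):
    tablet[key] = value
    return True


leader = LeaderMasterSketch(replicate_ok)
leader.apply_change('table-1', 'schema-v1')

new_leader = LeaderMasterSketch(replicate_ok)
new_leader.rebuild_cache(tablet.items())
print(new_leader.cache == leader.cache)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the cache is written only after replication succeeds, a new leader that rebuilds
its cache from the tablet ends up with the same view as the old one.&lt;/p&gt;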

&lt;h1 id=&quot;communication-with-replicated-masters&quot;&gt;Communication with replicated masters&lt;/h1&gt;

&lt;p&gt;All tablet servers start up with location information for the entire master Raft
configuration and will periodically heartbeat to every master. Similarly,
clients are also configured with the locations of all masters. Unlike tablet
servers, clients communicate only with the leader master, because follower masters
reject client requests. To do this, clients must determine which master is the
leader before sending the first request as well as whenever any request fails
with a &lt;code&gt;NOT_THE_LEADER&lt;/code&gt; error.&lt;/p&gt;
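
&lt;p&gt;A rough sketch of that client-side behavior is shown below. It uses in-memory stand-ins
rather than Kudu’s client API: &lt;code&gt;FakeMaster&lt;/code&gt;, &lt;code&gt;FakeClient&lt;/code&gt;, the
&lt;code&gt;NotTheLeader&lt;/code&gt; exception, and the probing strategy are all hypothetical and exist only
to illustrate the discover-then-retry pattern.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
# Hypothetical in-memory stand-ins; this is not Kudu's client API.
class NotTheLeader(Exception):
    pass


class FakeMaster:
    def __init__(self, name, is_leader):
        self.name = name
        self.is_leader = is_leader

    def handle(self, request):
        # Follower masters reject client requests.
        if not self.is_leader:
            raise NotTheLeader(self.name)
        return (self.name, request)


class FakeClient:
    def __init__(self, masters):
        self.masters = masters
        self.leader = None

    def find_leader(self):
        # Probe every configured master until one accepts a request.
        for master in self.masters:
            try:
                master.handle('ping')
            except NotTheLeader:
                continue
            return master
        raise RuntimeError('no leader master found')

    def send(self, request):
        # Determine the leader before sending the first request.
        if self.leader is None:
            self.leader = self.find_leader()
        try:
            return self.leader.handle(request)
        except NotTheLeader:
            # The cached leader stepped down: rediscover and retry once.
            self.leader = self.find_leader()
            return self.leader.handle(request)


masters = [
    FakeMaster('m1', False),
    FakeMaster('m2', True),
    FakeMaster('m3', False),
]
client = FakeClient(masters)
print(client.send('ListTables'))  # served by m2, the current leader
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If leadership later moves to another master, the next request fails once with the
not-the-leader error, after which the sketch rediscovers the leader and retries.&lt;/p&gt;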

&lt;h1 id=&quot;remaining-work-for-kudu-10&quot;&gt;Remaining work for Kudu 1.0&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-422&quot;&gt;KUDU-422&lt;/a&gt; tracks the remaining
master replication work. The guts of this feature have been implemented as far
back as early 2015; the remaining work has been focused on fixing bugs that
manifest only under specific conditions. For example, we’ve observed failures in
DDL operations (e.g. &lt;strong&gt;CreateTable()&lt;/strong&gt;) that only materialize upon the
completion of a master leader election. These failures highlight some of the
gaps in our testing regimen: we need a robust stress test that repeatedly
performs such operations while holding master leader elections.&lt;/p&gt;

&lt;p&gt;That said, there is one remaining work item of larger scope: there’s no
mechanism with which to perform a Raft configuration change for replicated
masters. Such a mechanism would have multiple uses:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Migrating from a single-node master deployment to a fully replicated
three-node (or five-node) deployment.&lt;/li&gt;
  &lt;li&gt;Replacing a failed master with a new one.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is being tracked by
&lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1474&quot;&gt;KUDU-1474&lt;/a&gt;, and there’s been
&lt;a href=&quot;http://gerrit.cloudera.org:8080/3393&quot;&gt;some discussion&lt;/a&gt; around a design, but
nothing has been implemented yet. Stay tuned!&lt;/p&gt;</content><author><name>Adar Dembo</name></author><summary>This blog post describes how the 1.0 release of Apache Kudu (incubating) will
support fault tolerance for the Kudu master, finally eliminating Kudu’s last
single point of failure.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 21, 2016</title><link href="/2016/06/21/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 21, 2016" /><published>2016-06-21T00:00:00-07:00</published><updated>2016-06-21T00:00:00-07:00</updated><id>/2016/06/21/weekly-update</id><content type="html" xml:base="/2016/06/21/weekly-update.html">&lt;p&gt;Welcome to the fourteenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Dan Burkert posted a series of patches to &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3388/&quot;&gt;add support in the Java client&lt;/a&gt;
for non-covering range partitions. At the same time he improved how that client locates tables by
leveraging the tablets cache.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;In the context of making multi-master reliable in 1.0, Adar Dembo posted a &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3393/&quot;&gt;design document&lt;/a&gt;
on how to handle permanent master failures. Currently the master’s code is missing some features
such as &lt;code&gt;remote bootstrap&lt;/code&gt;, which makes it possible for a new replica to download a snapshot of the data
from the leader replica.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Tsuyoshi Ozawa refreshed &lt;a href=&quot;https://gerrit.cloudera.org/#/c/2162/&quot;&gt;a patch&lt;/a&gt; posted in February that
makes it easier to get started contributing to Kudu by providing a Dockerfile with the right
environment.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;on-the-blog&quot;&gt;On the blog&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Mike Percy &lt;a href=&quot;http://kudu.apache.org/2016/06/17/raft-consensus-single-node.html&quot;&gt;wrote&lt;/a&gt; about how Kudu
uses Raft consensus on a single node, and some changes we’re making as Kudu is getting more mature.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#105;&amp;#110;&amp;#099;&amp;#117;&amp;#098;&amp;#097;&amp;#116;&amp;#111;&amp;#114;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;</content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the fourteenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Using Raft Consensus on a Single Node</title><link href="/2016/06/17/raft-consensus-single-node.html" rel="alternate" type="text/html" title="Using Raft Consensus on a Single Node" /><published>2016-06-17T00:00:00-07:00</published><updated>2016-06-17T00:00:00-07:00</updated><id>/2016/06/17/raft-consensus-single-node</id><content type="html" xml:base="/2016/06/17/raft-consensus-single-node.html">&lt;p&gt;As Kudu marches toward its 1.0 release, which will include support for
multi-master operation, we are working on removing old code that is no longer
needed. One such piece of code is called LocalConsensus. Once LocalConsensus is
removed, we will be using Raft consensus even on Kudu tables that have a
replication factor of 1.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Using Raft consensus in single-node cases is important for multi-master
support because it will allow people to dynamically increase their Kudu
cluster’s existing master server replication factor from 1 to many (3 or 5 are
typical).&lt;/p&gt;

&lt;h1 id=&quot;the-consensus-interface&quot;&gt;The Consensus interface&lt;/h1&gt;

&lt;p&gt;In Kudu, the
&lt;a href=&quot;https://github.com/apache/incubator-kudu/blob/branch-0.9.x/src/kudu/consensus/consensus.h&quot;&gt;Consensus&lt;/a&gt;
interface was created as an abstraction to allow us to build the plumbing
around how a consensus implementation would interact with the underlying
tablet. We were able to build out this “scaffolding” long before our Raft
implementation was complete.&lt;/p&gt;

&lt;p&gt;The Consensus API has the following main responsibilities:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Support acting as a Raft &lt;code&gt;LEADER&lt;/code&gt; and replicate writes to a local
write-ahead log (WAL) as well as followers in the Raft configuration. For
each operation written to the leader, a Raft implementation must keep track
of how many nodes have written a copy of the operation being replicated, and
whether or not that constitutes a majority. Once a majority of the nodes
have written a copy of the data, it is considered committed.&lt;/li&gt;
  &lt;li&gt;Support acting as a Raft &lt;code&gt;FOLLOWER&lt;/code&gt; by accepting writes from the leader and
preparing them to be eventually committed.&lt;/li&gt;
  &lt;li&gt;Support voting in and initiating leader elections.&lt;/li&gt;
  &lt;li&gt;Support participating in and initiating configuration changes (such as going
from a replication factor of 3 to 4).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first implementation of the Consensus interface was called LocalConsensus.
LocalConsensus only supported acting as a leader of a single-node configuration
(hence the name “local”). It could not replicate to followers, participate in
elections, or change configurations. These limitations have led us to
&lt;a href=&quot;https://gerrit.cloudera.org/3350&quot;&gt;remove&lt;/a&gt; LocalConsensus from the code base
entirely.&lt;/p&gt;

&lt;p&gt;Because Kudu has a full-featured Raft implementation, Kudu’s RaftConsensus
supports all of the above functions of the Consensus interface.&lt;/p&gt;

&lt;h1 id=&quot;using-a-single-node-raft-configuration&quot;&gt;Using a Single-node Raft configuration&lt;/h1&gt;

&lt;p&gt;A common question on the Raft mailing lists is: “Is it even possible to use
Raft on a single node?” The answer is yes.&lt;/p&gt;

&lt;p&gt;Fundamentally, Raft works by first electing a leader that is responsible for
replicating write operations to the other members of the configuration. In
order to elect a leader, Raft requires a (strict) majority of the voters to
vote “yes” in an election. When there is only a single eligible node in the
configuration, there is no chance of losing the election. Raft specifies that
when starting an election, a node must first vote for itself and then contact
the rest of the voters to tally their votes. If there is only a single node, no
communication is required and an election succeeds instantaneously.&lt;/p&gt;
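
&lt;p&gt;The vote counting described above can be sketched in a few lines of Python. This is a
simplified illustration that ignores terms, logs, and timeouts; the function names and the
&lt;code&gt;ask_for_vote&lt;/code&gt; callback are assumptions made for the example, not Kudu’s code.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
# Simplified sketch: no terms, logs, or timeouts are modeled.
def votes_needed(voters):
    # Strict majority of the voters in the configuration.
    return voters // 2 + 1


def run_election(candidate, peers, ask_for_vote):
    # The candidate always votes for itself first.
    yes_votes = 1
    needed = votes_needed(len(peers) + 1)
    for peer in peers:
        if yes_votes == needed:
            # Majority reached; no need to contact anyone else.
            break
        if ask_for_vote(peer, candidate):
            yes_votes += 1
    return yes_votes == needed


contacted = []


def deny(peer, candidate):
    # Records which peers were contacted and always votes no.
    contacted.append(peer)
    return False


# Single-node configuration: the self-vote is already a majority.
print(run_election('a', [], deny), contacted)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With no peers, the candidate’s own vote is the required majority of one, so the election
succeeds without any peer being contacted.&lt;/p&gt;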

&lt;p&gt;So, when does it make sense to use Raft for a single node?&lt;/p&gt;

&lt;p&gt;It makes sense to do this when you want to allow growing the replication factor
in the future. This is something that Kudu needs to support. When deploying
Kudu, someone may wish to test it out with limited resources in a small
environment. Eventually, they may wish to transition that cluster to be a
staging or production environment, which would typically require the fault
tolerance achievable with multi-node Raft. Without a consensus implementation
that supports configuration changes, there would be no way to gracefully
support this. Because single-node Raft supports dynamically adding an
additional node to its configuration, it is possible to go from one replica to
two and then three replicas and end up with a fault-tolerant cluster without
incurring downtime.&lt;/p&gt;

&lt;h1 id=&quot;more-about-raft&quot;&gt;More about Raft&lt;/h1&gt;

&lt;p&gt;To learn more about how Kudu uses Raft consensus, you may find the relevant
&lt;a href=&quot;https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/README.md&quot;&gt;design docs&lt;/a&gt;
interesting. In the future, we may also post more articles on the Kudu blog
about how Kudu uses Raft to achieve fault tolerance.&lt;/p&gt;

&lt;p&gt;To learn more about the Raft protocol itself, please see the &lt;a href=&quot;https://raft.github.io/&quot;&gt;Raft consensus
home page&lt;/a&gt;. The design of Kudu’s Raft implementation
is based on the extended protocol described in Diego Ongaro’s Ph.D.
dissertation, which you can find linked from the above web site.&lt;/p&gt;</content><author><name>Mike Percy</name></author><summary>As Kudu marches toward its 1.0 release, which will include support for
multi-master operation, we are working on removing old code that is no longer
needed. One such piece of code is called LocalConsensus. Once LocalConsensus is
removed, we will be using Raft consensus even on Kudu tables that have a
replication factor of 1.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 13, 2016</title><link href="/2016/06/13/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 13, 2016" /><published>2016-06-13T00:00:00-07:00</published><updated>2016-06-13T00:00:00-07:00</updated><id>/2016/06/13/weekly-update</id><content type="html" xml:base="/2016/06/13/weekly-update.html">&lt;p&gt;Welcome to the thirteenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you find this post useful, please let us know by emailing the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#105;&amp;#110;&amp;#099;&amp;#117;&amp;#098;&amp;#097;&amp;#116;&amp;#111;&amp;#114;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweeting at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;

&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The IPMC vote for 0.9.0 RC1 passed and Kudu 0.9.0 is now
&lt;a href=&quot;http://kudu.apache.org/2016/06/10/apache-kudu-0-9-0-released.html&quot;&gt;officially released&lt;/a&gt;. Per the
lazily agreed-upon &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201602.mbox/%3CCAGpTDNcMBWwX8p+yGKzHfL2xcmKTScU-rhLcQFSns1UVSbrXhw@mail.gmail.com%3E&quot;&gt;plan&lt;/a&gt;,
the next release will be 1.0.0 in about two months.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Adar Dembo has been cleaning up and improving the Master process’s code. Last week he
&lt;a href=&quot;https://gerrit.cloudera.org/#/c/2887/&quot;&gt;finished&lt;/a&gt; removing the per-tablet replica locations cache.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Alexey Serbin contributed his first patch last week by &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3360/&quot;&gt;fixing&lt;/a&gt;
most of the unit tests that were failing on OSX.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Sameer Abhyankar is nearly finished adding support for “in-list” predicates;
see the &lt;a href=&quot;https://gerrit.cloudera.org/#/c/2986/&quot;&gt;gerrit review&lt;/a&gt; for details.
This will enable specifying predicates in the style of “column IN (list, of, values)”.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mike Percy posted a few patches that remove LocalConsensus for single-node tablets, with the actual
removal happening in this &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3350/&quot;&gt;patch&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;slides-and-recordings&quot;&gt;Slides and recordings&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Todd Lipcon presented Kudu at Berlin Buzzwords earlier this month. The recording is available
&lt;a href=&quot;https://berlinbuzzwords.de/session/apache-kudu-incubating-fast-analytics-fast-data&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the thirteenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) 0.9.0 released</title><link href="/2016/06/10/apache-kudu-0-9-0-released.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) 0.9.0 released" /><published>2016-06-10T00:00:00-07:00</published><updated>2016-06-10T00:00:00-07:00</updated><id>/2016/06/10/apache-kudu-0-9-0-released</id><content type="html" xml:base="/2016/06/10/apache-kudu-0-9-0-released.html">&lt;p&gt;The Apache Kudu (incubating) team is happy to announce the release of Kudu
0.9.0!&lt;/p&gt;

&lt;p&gt;This latest version adds basic UPSERT functionality and an improved Apache Spark Data Source
that doesn’t rely on the MapReduce I/O formats. It also improves Tablet Server
restart time as well as write performance under high load. Finally, Kudu now enforces
the specification of a partitioning scheme for new tables.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Read the detailed &lt;a href=&quot;http://kudu.apache.org/releases/0.9.0/docs/release_notes.html&quot;&gt;Kudu 0.9.0 release notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Download the &lt;a href=&quot;http://kudu.apache.org/releases/0.9.0/&quot;&gt;Kudu 0.9.0 source release&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Jean-Daniel Cryans</name></author><summary>The Apache Kudu (incubating) team is happy to announce the release of Kudu
0.9.0!

This latest version adds basic UPSERT functionality and an improved Apache Spark Data Source
that doesn’t rely on the MapReduce I/O formats. It also improves Tablet Server
restart time as well as write performance under high load. Finally, Kudu now enforces
the specification of a partitioning scheme for new tables.


  Read the detailed Kudu 0.9.0 release notes
  Download the Kudu 0.9.0 source release</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 6, 2016</title><link href="/2016/06/06/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 6, 2016" /><published>2016-06-06T00:00:00-07:00</published><updated>2016-06-06T00:00:00-07:00</updated><id>/2016/06/06/weekly-update</id><content type="html" xml:base="/2016/06/06/weekly-update.html">&lt;p&gt;Welcome to the twelfth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you find this post useful, please let us know by emailing the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#105;&amp;#110;&amp;#099;&amp;#117;&amp;#098;&amp;#097;&amp;#116;&amp;#111;&amp;#114;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweeting at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;

&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Jean-Daniel Cryans put up &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201606.mbox/%3CCAGpTDNduoQM0ktuZc1eW1XeXCcXhvPGftJ%3DLRB8Er5c2dZptvw%40mail.gmail.com%3E&quot;&gt;0.9.0 RC1&lt;/a&gt;
for vote on the dev mailing list and it passed. The Incubator PMC (IPMC) will also need
to vote on it before it can officially be released.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mike Percy is working on removing LocalConsensus which is currently used for
single node Kudu deployments. We will instead use the Raft consensus implementation
with a replication factor of 1. This will simplify development, since we currently have to maintain two
consensus implementations. It will also provide a way to migrate from single node to multi-node
deployments. See the discussion in this &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201605.mbox/%3CCADXBggeE6RUYchv5fa=J2geHGE8Mw4SOeoi=LjXjdfmYYSqyhQ@mail.gmail.com%3E&quot;&gt;dev thread&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Zhen Zhang got a patch in for &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1444&quot;&gt;KUDU-1444&lt;/a&gt;
that adds resource usage monitoring to scanners in the C++ client. In the future this could
be leveraged by systems like Impala to augment the query profiles.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Longer term efforts for 1.0 are making good progress. Dan Burkert &lt;a href=&quot;https://gerrit.cloudera.org/#/c/3255/&quot;&gt;added support&lt;/a&gt;
in the C++ client for non-covering range partitioned tables, and David Alves has a few
patches in for the &lt;a href=&quot;https://gerrit.cloudera.org/#/c/2642/&quot;&gt;Replay Cache&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;</content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the twelfth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Default Partitioning Changes Coming in Kudu 0.9</title><link href="/2016/06/02/no-default-partitioning.html" rel="alternate" type="text/html" title="Default Partitioning Changes Coming in Kudu 0.9" /><published>2016-06-02T00:00:00-07:00</published><updated>2016-06-02T00:00:00-07:00</updated><id>/2016/06/02/no-default-partitioning</id><content type="html" xml:base="/2016/06/02/no-default-partitioning.html">&lt;p&gt;The upcoming Apache Kudu (incubating) 0.9 release is changing the default
partitioning configuration for new tables. This post will introduce the change,
explain the motivations, and show examples of how code can be updated to work
with the new release.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The most common source of frustration for new Kudu users is the default
partitioning behavior when creating new tables. If partitioning is not
specified, the Kudu client prior to 0.9 creates tables with a &lt;em&gt;single tablet&lt;/em&gt;.
Single tablet tables are a Kudu anti-pattern, since they are unable to get the
scalability benefit of distributing data across the cluster, and instead keep
all data on a single machine.&lt;/p&gt;

&lt;p&gt;Unfortunately, automatically choosing a better default partitioning
configuration for new tables is not simple. In most cases, hash partitioning on
the primary key is a better default, but this approach can have its own
drawbacks. In particular, it is not clear how many buckets should be used for
the new table.&lt;/p&gt;

&lt;p&gt;Since there is no bullet-proof default and changing the partitioning
configuration after table creation is impossible, &lt;a href=&quot;https://lists.apache.org/thread.html/ca8972620839109334493424a1022fc08c77c315d9d623f5caaa815f@1463699013@%3Cuser.kudu.apache.org%3E&quot;&gt;we
decided&lt;/a&gt;
to remove the default altogether. Removing the default is a backwards
incompatible change, so it must be done before the 1.0 release. If we later find
a better way to create a default partitioning configuration, it should be
possible to adopt it in a backwards compatible way. The result of removing the
default is that new tables created with the 0.9 client must specify a
partitioning configuration, or table creation will fail. You can still create a
table with a single tablet, but it must be configured explicitly. These changes
only affect new table creation; existing tables, including tables created with
default partitioning before the 0.9 release, will continue to work.&lt;/p&gt;

&lt;p&gt;In most cases updating existing code to explicitly set a partitioning
configuration should be simple. The examples below add hash partitioning, but
you can also specify range partitioning or a combination of range and hash
partitioning. See the &lt;a href=&quot;http://kudu.apache.org/docs/schema_design.html#data-distribution&quot;&gt;schema design
guide&lt;/a&gt; for more
advanced configurations.&lt;/p&gt;

&lt;h1 id=&quot;c-client&quot;&gt;C++ Client&lt;/h1&gt;

&lt;p&gt;With the C++ client, creating a new table with hash partitions is as simple as
calling &lt;code&gt;KuduTableCreator::add_hash_partitions&lt;/code&gt; with the columns to hash and the
number of buckets to use:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-cpp&quot; data-lang=&quot;cpp&quot;&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KuduTableCreator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table_creator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NewTableCreator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Status&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_status&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table_creator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;table_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;my-table&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                                     &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                                     &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_hash_partitions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;key_column_a&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;key_column_b&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                                     &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_status&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* handle error */&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id=&quot;java-client&quot;&gt;Java Client&lt;/h1&gt;

&lt;p&gt;And similarly, in Java:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hashColumns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hashColumns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;key_column_a&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hashColumns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;key_column_b&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;CreateTableOptions&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;CreateTableOptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addHashPartitions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hashColumns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;myClient&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;my-table&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the examples above, if the hash partition configuration is omitted, the create
table operation will fail with the error &lt;code&gt;Table partitioning must be specified
using setRangePartitionColumns or addHashPartitions&lt;/code&gt;. In the Java client this
manifests as a thrown &lt;code&gt;IllegalArgumentException&lt;/code&gt;, while in the C++ client it is
returned as a &lt;code&gt;Status::InvalidArgument&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id=&quot;impala&quot;&gt;Impala&lt;/h1&gt;

&lt;p&gt;When creating Kudu tables with Impala, the formerly optional &lt;code&gt;DISTRIBUTE BY&lt;/code&gt;
clause is now required:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key_column_a&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key_column_b&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other_column&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DISTRIBUTE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HASH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key_column_a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key_column_b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BUCKETS&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TBLPROPERTIES&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s1&quot;&gt;&amp;#39;storage_handler&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;com.cloudera.kudu.hive.KuduStorageHandler&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;s1&quot;&gt;&amp;#39;kudu.table_name&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;my_table&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;s1&quot;&gt;&amp;#39;kudu.master_addresses&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;kudu-master.example.com:7051&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;s1&quot;&gt;&amp;#39;kudu.key_columns&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;key_column_a,key_column_b&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content><author><name>Dan Burkert</name></author><summary>The upcoming Apache Kudu (incubating) 0.9 release is changing the default
partitioning configuration for new tables. This post will introduce the change,
explain the motivations, and show examples of how code can be updated to work
with the new release.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update June 1, 2016</title><link href="/2016/06/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update June 1, 2016" /><published>2016-06-01T00:00:00-07:00</published><updated>2016-06-01T00:00:00-07:00</updated><id>/2016/06/01/weekly-update</id><content type="html" xml:base="/2016/06/01/weekly-update.html">&lt;p&gt;Welcome to the eleventh edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you find this post useful, please let us know by emailing the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#105;&amp;#110;&amp;#099;&amp;#117;&amp;#098;&amp;#097;&amp;#116;&amp;#111;&amp;#114;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweeting at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;

&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Jean-Daniel Cryans, the release manager for 0.9.0, &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201605.mbox/%3CCAGpTDNe_gV5TTsJQSjx_Q-hSGjK9TesWkyP-k9rnhd0mBtYAYg%40mail.gmail.com%3E&quot;&gt;indicated&lt;/a&gt;
that the release is almost ready and the first release candidate will be put up for vote this
week.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Dan Burkert pushed &lt;a href=&quot;http://gerrit.cloudera.org:8080/3131&quot;&gt;a change&lt;/a&gt; that disallows default
partitioning when creating a new table. This is due to many reports from users experiencing bad
performance because their table was created with only one tablet. Kudu will now force users to
partition their tables.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Todd Lipcon ran YCSB stress tests on a cluster and discovered that compactions were taking hours
instead of seconds. He pushed &lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3221/&quot;&gt;a change&lt;/a&gt; that solves
the issue as part of our &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-749&quot;&gt;general effort&lt;/a&gt; to
improve performance for zipfian update workloads.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Todd also &lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3186/&quot;&gt;changed&lt;/a&gt; some flush-related defaults to
encourage parallel IO and larger flushes. This is based on his previous work that he documented
in this &lt;a href=&quot;http://kudu.apache.org/2016/04/26/ycsb.html&quot;&gt;blog post&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Will Berkeley made a few improvements last week, but &lt;a href=&quot;http://gerrit.cloudera.org:8080/3199&quot;&gt;one&lt;/a&gt;
we’d like to call out is that he removed the Java kudu-mapreduce module’s dependency on Hadoop’s
hadoop-common test jar. This solved build issues while also removing a nasty dependency.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;</content><author><name>Jean-Daniel Cryans</name></author><summary>Welcome to the eleventh edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry><entry><title>Apache Kudu (incubating) Weekly Update May 23, 2016</title><link href="/2016/05/23/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu (incubating) Weekly Update May 23, 2016" /><published>2016-05-23T00:00:00-07:00</published><updated>2016-05-23T00:00:00-07:00</updated><id>/2016/05/23/weekly-update</id><content type="html" xml:base="/2016/05/23/weekly-update.html">&lt;p&gt;Welcome to the tenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you find this post useful, please let us know by emailing the
&lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#105;&amp;#110;&amp;#099;&amp;#117;&amp;#098;&amp;#097;&amp;#116;&amp;#111;&amp;#114;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
tweeting at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
aware of some Kudu news we missed, let us know so we can cover it in
a future post.&lt;/p&gt;

&lt;h2 id=&quot;kudu-related-podcast&quot;&gt;Kudu related podcast&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Two committers, Mike Percy and Dan Burkert, appeared on the
&lt;a href=&quot;https://developer.ibm.com/tv/apachecon-apache-projects/&quot;&gt;IBM New Builders podcast&lt;/a&gt;
to talk about Apache Kudu, how they got involved, and what sort of
workloads it is best suited for.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Jean-Daniel Cryans is again acting as the release manager for the upcoming
0.9.0 release. The git branch for 0.9 has now been cut, and only bug fixes
or small improvements will be committed to that branch between now and the
first release candidate.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Since Kudu’s initial release, one of the most commonly requested features
has been support for the &lt;code&gt;UPSERT&lt;/code&gt; operation. &lt;code&gt;UPSERT&lt;/code&gt; is known in some other
databases as &lt;code&gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/code&gt;. This operation has the
semantics of an &lt;code&gt;INSERT&lt;/code&gt; if no row already exists with the provided primary
key. Otherwise, it replaces the existing row with the new values.&lt;/p&gt;

    &lt;p&gt;This week, several developers collaborated to add support for this operation.
Todd Lipcon implemented
&lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3101/&quot;&gt;support on the server side&lt;/a&gt;,
C++ client, and &lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3128/&quot;&gt;Python client&lt;/a&gt;.
Jean-Daniel Cryans added support in the
&lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3123/&quot;&gt;Java client&lt;/a&gt;. Ara Ebrahimi
and Will Berkeley have started working on
&lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3145/&quot;&gt;integrating upsert support into the Flume sink&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mike Percy started working on support for &lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/3135/&quot;&gt;basic disk
space reservations&lt;/a&gt;
in the tablet server. This feature will cause the tablet server to stop
writing to a disk before it’s full, preventing crashes due to running
out of space.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Chris George and Andy Grove collaborated on support for &lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/2992/&quot;&gt;insertions and
updates in the Spark DataSource&lt;/a&gt;,
and the patch was committed towards the end of the week. Brent Gardner
has also been helping with the Spark integration, and fixed an important
&lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-1453&quot;&gt;connection leak bug&lt;/a&gt;
in the initial implementation.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;David Alves worked on reviving a 7-month-old patch by Jingkai Yuan which
implements an &lt;a href=&quot;http://gerrit.cloudera.org:8080/#/c/1210/&quot;&gt;integer delta encoding scheme&lt;/a&gt;
that is meant to be efficient both in terms of CPU and disk space. This
encoding scheme is also designed to take advantage of modern CPU instruction sets
such as AVX and AVX2.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;upcoming-talks-and-meetups&quot;&gt;Upcoming talks and meetups&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ryan Bosshart will be presenting Kudu at the &lt;a href=&quot;http://www.meetup.com/DFW-Cloudera-User-Group/events/230547045/&quot;&gt;Dallas/Fort Worth
Cloudera User Group&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the tenth edition of the Kudu Weekly Update. This weekly blog post
covers ongoing development and news in the Apache Kudu (incubating) project.</summary></entry></feed>
