
 <?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="2.5.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-03-22T14:30:08-07:00</updated><id>/</id><entry><title>Apache Kudu 1.6.0 released</title><link href="/2017/12/08/apache-kudu-1-6-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.6.0 released" /><published>2017-12-08T00:00:00-08:00</published><updated>2017-12-08T00:00:00-08:00</updated><id>/2017/12/08/apache-kudu-1-6-0-released</id><content type="html" xml:base="/2017/12/08/apache-kudu-1-6-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.6.0!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.6.0 is a minor release that offers new features, performance
 optimizations, incremental improvements, and bug fixes.&lt;/p&gt;

 &lt;p&gt;Release highlights:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ol&gt;
   &lt;li&gt;Kudu servers can now tolerate short interruptions in NTP clock
 synchronization. NTP synchronization is still required when any Kudu daemon
 starts up.&lt;/li&gt;
   &lt;li&gt;Tablet servers will no longer crash when a disk containing data blocks
 fails, unless that disk also stores WAL segments or tablet metadata. Instead
 of crashing, the tablet server will shut down any tablets that may have lost
 data locally and Kudu will re-replicate the affected tablets to another
 tablet server. More information can be found in the documentation under
 &lt;a href=&quot;/releases/1.6.0/docs/administration.html#disk_failure_recovery&quot;&gt;Recovering from Disk Failure&lt;/a&gt;.&lt;/li&gt;
   &lt;li&gt;Tablet server startup time has been improved significantly on servers
 containing large numbers of blocks.&lt;/li&gt;
   &lt;li&gt;The Spark DataSource integration now can take advantage of scan locality for
 better scan performance. The scan will take place at the closest replica
 instead of going to the leader.&lt;/li&gt;
   &lt;li&gt;Support for Spark 1 has been removed in Kudu 1.6.0 and now only Spark 2 is
 supported. Spark 1 support was deprecated in Kudu 1.5.0.&lt;/li&gt;
   &lt;li&gt;HybridTime timestamp propagation now works in the Java client when using
 scan tokens.&lt;/li&gt;
   &lt;li&gt;Tablet servers now consider the health of all replicas of a tablet before
 deciding to evict one. This can improve the stability of the Kudu cluster
 when multiple servers temporarily go down at the same time.&lt;/li&gt;
   &lt;li&gt;A bug in the C++ client was fixed that could cause tablets to be erroneously
 pruned, or skipped, during certain scans, resulting in fewer results than
 expected being returned from queries. The bug only affected tables whose
 range partition columns are a proper prefix of the primary key.
 See &lt;a href=&quot;https://issues.apache.org/jira/browse/KUDU-2173&quot;&gt;KUDU-2173&lt;/a&gt; for more
 information.&lt;/li&gt;
 &lt;/ol&gt;

 &lt;p&gt;For more details, and the complete list of changes in Kudu 1.6.0, please see
 the &lt;a href=&quot;/releases/1.6.0/docs/release_notes.html&quot;&gt;Kudu 1.6.0 release notes&lt;/a&gt;.&lt;/p&gt;

 &lt;p&gt;The Apache Kudu project only publishes source code releases. To build Kudu
 1.6.0, follow these steps:&lt;/p&gt;

 &lt;ol&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.6.0/&quot;&gt;Kudu 1.6.0 source release&lt;/a&gt;.&lt;/li&gt;
   &lt;li&gt;Follow the instructions in the documentation to
 &lt;a href=&quot;/releases/1.6.0/docs/installation.html#build_from_source&quot;&gt;build Kudu 1.6.0 from source&lt;/a&gt;.&lt;/li&gt;
 &lt;/ol&gt;

 &lt;p&gt;For your convenience, binary JAR files for the Kudu Java client library, Spark
 DataSource, Flume sink, and other Java integrations are published to the ASF
 Maven repository and are
 &lt;a href=&quot;https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.kudu%22%20AND%20v%3A%221.6.0%22&quot;&gt;now available&lt;/a&gt;.&lt;/p&gt;</content><author><name>Mike Percy</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.6.0!

 Apache Kudu 1.6.0 is a minor release that offers new features, performance
 optimizations, incremental improvements, and bug fixes.

 Release highlights:</summary></entry><entry><title>Slides: A brave new world in mutable big data: Relational storage</title><link href="/2017/10/23/nosql-kudu-spanner-slides.html" rel="alternate" type="text/html" title="Slides: A brave new world in mutable big data: Relational storage" /><published>2017-10-23T00:00:00-07:00</published><updated>2017-10-23T00:00:00-07:00</updated><id>/2017/10/23/nosql-kudu-spanner-slides</id><content type="html" xml:base="/2017/10/23/nosql-kudu-spanner-slides.html">&lt;p&gt;Since the Apache Kudu project made its debut in 2015, there have been
 a few common questions that kept coming up at every presentation:&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Is Kudu an open source version of Google’s Spanner system?&lt;/li&gt;
   &lt;li&gt;Is Kudu NoSQL or SQL?&lt;/li&gt;
   &lt;li&gt;Why does Kudu have a relational data model? Isn’t SQL dead?&lt;/li&gt;
 &lt;/ul&gt;

 &lt;!--more--&gt;

 &lt;p&gt;A few of these questions are addressed in the
 &lt;a href=&quot;https://kudu.apache.org/faq.html&quot;&gt;Kudu FAQ&lt;/a&gt;, but I thought they were
 interesting enough that I decided to give a talk on these subjects
 at &lt;a href=&quot;https://conferences.oreilly.com/strata/strata-ny&quot;&gt;Strata Data Conference NYC 2017&lt;/a&gt;.&lt;/p&gt;

 &lt;p&gt;Preparing this talk was particularly interesting, since Google recently released
 Spanner to the public in SaaS form as &lt;a href=&quot;https://cloud.google.com/spanner/&quot;&gt;Google Cloud Spanner&lt;/a&gt;.
 This meant that I was able to compare Kudu vs Spanner not just qualitatively
 based on some academic papers, but quantitatively as well.&lt;/p&gt;

 &lt;p&gt;To summarize the key points of the presentation:&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;Despite the growing popularity of “NoSQL” from 2009 through 2013, SQL has
 once again become the access mechanism of choice for the majority of
 analytic applications. NoSQL has become “Not Only SQL”.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Spanner and Kudu share a lot of common features. However:&lt;/p&gt;

     &lt;ul&gt;
       &lt;li&gt;
         &lt;p&gt;Spanner offers a superior feature set and performance for Online
  Transactional Processing (OLTP) workloads, including ACID transactions and
  secondary indexing.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Kudu offers a superior feature set and performance for Online
  Analytical Processing (OLAP) and Hybrid Transactional/Analytic Processing
  (HTAP) workloads, including more complete SQL support and orders of
  magnitude better performance on large queries.&lt;/p&gt;
       &lt;/li&gt;
     &lt;/ul&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;For more details and for the full benchmark numbers, check out the slide deck
 below:&lt;/p&gt;

 &lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/loQpO2vzlwGGgz&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
 &lt;div style=&quot;margin-bottom:15px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/ToddLipcon/a-brave-new-world-in-mutable-big-data-relational-storage-strata-nyc-2017&quot; title=&quot;A brave new world in mutable big data relational storage (Strata NYC 2017)&quot; target=&quot;_blank&quot;&gt;A brave new world in mutable big data relational storage (Strata NYC 2017)&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/ToddLipcon&quot; target=&quot;_blank&quot;&gt;Todd Lipcon&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

 &lt;p&gt;Questions or comments? Join the &lt;a href=&quot;/community.html&quot;&gt;Apache Kudu Community&lt;/a&gt; to discuss.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Since the Apache Kudu project made its debut in 2015, there have been
 a few common questions that kept coming up at every presentation:


   Is Kudu an open source version of Google’s Spanner system?
   Is Kudu NoSQL or SQL?
   Why does Kudu have a relational data model? Isn’t SQL dead?</summary></entry><entry><title>Consistency in Apache Kudu, Part 1</title><link href="/2017/09/18/kudu-consistency-pt1.html" rel="alternate" type="text/html" title="Consistency in Apache Kudu, Part 1" /><published>2017-09-18T00:00:00-07:00</published><updated>2017-09-18T00:00:00-07:00</updated><id>/2017/09/18/kudu-consistency-pt1</id><content type="html" xml:base="/2017/09/18/kudu-consistency-pt1.html">&lt;p&gt;In this series of short blog posts we will introduce Kudu’s consistency model,
 its design and ultimate goals, current features, and next steps.
 On the way, we’ll shed some light on the more relevant components and how they
 fit together.&lt;/p&gt;

 &lt;p&gt;In Part 1 of the series (this one), we’ll cover motivation and design trade-offs, the end goals and
 the current status.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;what-is-consistency-and-why-is-it-relevant&quot;&gt;What is “consistency” and why is it relevant?&lt;/h2&gt;

 &lt;p&gt;In order to cope with ever increasing data volumes, modern storage systems like Kudu have to support
 many concurrent users while coordinating requests across many machines, each with many threads executing
 work at the same time. However, application developers shouldn’t have to understand the internal
 details of how these systems implement this parallel, distributed execution in order to write
 correct applications. &lt;em&gt;Consistency in the context of parallel, distributed systems roughly
 refers to how the system behaves in comparison to a single-machine, single-thread system&lt;/em&gt;. In a
 single-threaded, single-machine storage system, operations happen one at a time, in a clearly
 defined order, making correct applications easy to code and reason about. A developer writing an
 application against such a system doesn’t have to care about how simultaneous operations interact
 or about ordering anomalies, so the code is simpler, but more importantly, cognitive load is greatly
 reduced, freeing focus for the application logic itself.&lt;/p&gt;

 &lt;p&gt;While such a simple system is definitely possible to build, it wouldn’t be able to cope with very
 large amounts of data. In order to deal with big data volumes and high write throughput, modern storage
 systems like Kudu are designed to be distributed, storing and processing data across many machines
 and cores. This means that many things happen simultaneously on the same and on different machines,
 and that there are more moving parts and thus more opportunity for mis-orderings and for components
 to fail. How far systems like Kudu go (or don’t go) in emulating the simple single-threaded, single-machine
 system in a distributed, parallel setting where failures are common is roughly what is referred to
 as how &lt;em&gt;“consistent”&lt;/em&gt; the system is.&lt;/p&gt;

 &lt;p&gt;&lt;em&gt;Consistency&lt;/em&gt; as a term is somewhat overloaded in the distributed systems and database communities:
 there are many different models and properties, different names for the same concept, and often
 different concepts under the same name. This post is not meant to introduce these concepts,
 as there are excellent references already available elsewhere (we recommend Kyle Kingsbury’s
 series of blog posts on the matter, like &lt;a href=&quot;https://aphyr.com/posts/313-strong-consistency-models&quot;&gt;this one&lt;/a&gt;).
 Throughout this and follow-up posts we’ll refer to consistency loosely as the &lt;strong&gt;C&lt;/strong&gt; in &lt;strong&gt;CAP&lt;/strong&gt;[1]
 in some cases and as the &lt;strong&gt;I&lt;/strong&gt; in &lt;strong&gt;ACID&lt;/strong&gt;[2] in others; we’ll try to be specific when relevant.&lt;/p&gt;

 &lt;h2 id=&quot;design-decisions-trade-offs-and-motivation&quot;&gt;Design decisions, trade-offs and motivation&lt;/h2&gt;

 &lt;p&gt;Consistency is essentially about ordering and ordering usually has a cost. Distributed storage
 system design must choose to prioritize some properties over others according to the target use
 cases. That is, trade-offs must be made or, borrowing a term from economics, there is
 “no free lunch”. Different systems choose different trade-off points; for instance, systems inspired by &lt;em&gt;Dynamo&lt;/em&gt;[3] usually favor availability in the consistency/availability
 trade-off: by allowing a write to a data item to succeed even when a majority (or even all) of the
 replicas serving that data item are unreachable, Dynamo’s design minimizes insertion errors and
 insert latency (related to availability) at the cost of having to perform extra work for value
 reconciliation on reads and possibly returning stale or disordered values (related to consistency).
 On the other end of the spectrum, traditional DBMS design is often driven by the need to support
 transactions of arbitrary complexity while providing the users stronger, predictable semantics,
 favoring consistency at the cost of scalability and availability.&lt;/p&gt;

 &lt;p&gt;Kudu’s overarching goal is to enable &lt;em&gt;fast analytic workloads over large amounts of mutable&lt;/em&gt; data,
  meaning it was designed to perform fast scans over large volumes of data stored in many servers.
 In practical terms this means that, when given a choice, more often than not we opted for the
 design that would enable Kudu to have faster scan performance, favoring reads even if it meant pushing
 a bit more work onto the write path, i.e. the path that mutates data. This does not mean that the write path
 was not a concern altogether. In fact, modern storage systems like &lt;em&gt;Google’s Spanner&lt;/em&gt;[4]
 global-scale database demonstrate that, with the right set of trade-offs, it is possible to have strong
 consistency semantics with write latencies and overall availability that are adequate for most use
 cases (e.g. Spanner achieves 5 9’s of availability). For the write path, we often made similar choices in Kudu.&lt;/p&gt;

 &lt;p&gt;Another important aspect that directed our design decisions is the type of &lt;em&gt;write workload&lt;/em&gt; we targeted.
 Traditionally, analytical storage systems target periodic bulk write workloads and a continuous
 stream of analytical scans. This design is often problematic in that it forces users to have to
 build complex pipelines where data is accumulated in one place for later loading into the storage
  system. Moreover, beyond the architectural complexity, this kind of design usually also
 means that the data that is available for analytics is not the most recent. In Kudu we aimed for
 enabling continuous ingest, i.e. having a continuous stream of small writes, obviating the need to
 assemble a pipeline for data accumulation/loading and allowing analytical scans to have access to
 the most recent data. Another important aspect of the write workloads that we targeted in Kudu is
 that they are append-mostly, i.e. most writes insert new values into the table, with a smaller percentage
 updating existing values. Both the average write size and the data distribution influence
 the design of the write path, as we’ll see in the following sections.&lt;/p&gt;

 &lt;p&gt;One last concern we had in mind is that different users have different needs when it comes to
 consistency semantics, particularly as it applies to an analytical storage system like Kudu. For
 some users consistency isn’t a primary concern: they just want fast scans and the ability to
 update/insert/delete values without needing to build a complex pipeline. For example, many machine
 learning models are mostly insensitive to data recency or ordering so, when using Kudu to store data that
 will be used to train such a model, consistency is often less of a concern than read/write performance.
  In other cases consistency is a much higher priority. For example, when using Kudu to
 store transaction data for fraud analysis it might be important to capture if events are causally
 related. Fraudulent transactions might be characterized by a specific sequence of events and when
 retrieving that data it might be important for the scan result to reflect that sequence. Kudu’s
 design allows users to make a trade-off between consistency and performance at scan time. That is,
 users can choose to have stronger consistency semantics for scans at the penalty of latency and
 throughput or they can choose to weaken the consistency semantics for an extra performance boost.&lt;/p&gt;

 &lt;h3 id=&quot;note&quot;&gt;Note&lt;/h3&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Kudu currently lacks support for atomic multi-row mutation operations (i.e. mutation
 operations to more than one row in the same or different tablets, planned as a future feature).
 So, when discussing writes, we’ll be talking about the consistency semantics of single row mutations.
 In this context we’ll discuss Kudu’s properties more from a key/value store standpoint. On the
 other hand Kudu is an analytical storage engine so, for the read path, we’ll also discuss the
 semantics of large (multi-row) scans. This moves the discussion more into the field of traditional
 DBMSs. These ingredients make for a non-traditional discussion that is not exactly apples-to-apples
 with what the reader might be familiar with, but our hope is that it still provides valuable, or
 at least interesting, insight.&lt;/p&gt;
 &lt;/blockquote&gt;

 &lt;h2 id=&quot;consistency-options-in-kudu&quot;&gt;Consistency options in Kudu&lt;/h2&gt;

 &lt;p&gt;Consistency, as well as other properties, is underpinned in Kudu by the concept of a &lt;em&gt;timestamp&lt;/em&gt;.
 In follow-up posts we’ll look in detail at how these are assigned and how they are assembled. For now
 it’s sufficient to know that a timestamp is a single, usually large, number that has some mapping
 to wall time. Each mutation of a Kudu row is tagged with one such timestamp. Globally, these timestamps
 form a partial order over all the rows with the particularity that causally related mutations (e.g.
 a write mutation that is the result of the value obtained from a previous read) may be required to
 have increasing timestamps, depending on the user’s choices.&lt;/p&gt;

 &lt;p&gt;Row mutations performed by a single client &lt;em&gt;instance&lt;/em&gt; are guaranteed to have increasing timestamps
 thus reflecting their potential causal relationship. This property is always enforced. However,
 there are two major &lt;em&gt;“knobs”&lt;/em&gt; that are available to the user to make performance trade-offs: the
 &lt;code class=&quot;highlighter-rouge&quot;&gt;Read&lt;/code&gt; mode, and the &lt;code class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; mode (see &lt;a href=&quot;https://kudu.apache.org/docs/transaction_semantics.html&quot;&gt;here&lt;/a&gt;
 for more information on how to use the relevant APIs).&lt;/p&gt;

 &lt;p&gt;The first and most important knob, the &lt;code class=&quot;highlighter-rouge&quot;&gt;Read&lt;/code&gt; mode, determines the guaranteed recency of
 data returned by scans. Since Kudu uses replication for availability and fault-tolerance, there
 are always multiple replicas of any data item.
 Not all replicas are necessarily up to date, so if the user cares about recency, e.g. if the user requires
 that any data read includes all previously written data &lt;em&gt;from a single client instance&lt;/em&gt;, then they must
 choose the &lt;code class=&quot;highlighter-rouge&quot;&gt;READ_AT_SNAPSHOT&lt;/code&gt; read mode. With this mode enabled the client is guaranteed to observe
  &lt;strong&gt;“READ YOUR OWN WRITES”&lt;/strong&gt; semantics, i.e. scans from a client will always include all previous mutations
 performed by that client. Note that this property is local to a single client instance, not a global
 property.&lt;/p&gt;
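
 &lt;p&gt;To make the difference concrete, here is a minimal toy model in Python. It is a sketch under
 stated assumptions, not the Kudu client API: the &lt;code class=&quot;highlighter-rouge&quot;&gt;Tablet&lt;/code&gt; class and the
 &lt;code class=&quot;highlighter-rouge&quot;&gt;scan_latest&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;scan_at_snapshot&lt;/code&gt; functions are hypothetical
 names invented for illustration, timestamps are plain integers, and the catch-up step only stands in for
 a replica waiting until it has applied every mutation up to the requested timestamp.&lt;/p&gt;

```python
# Toy model only: every name below is hypothetical, not part of any Kudu API.


class Tablet:
    """One leader and one possibly lagging follower replica of a tablet."""

    def __init__(self):
        self.clock = 0
        self.leader = []    # list of (timestamp, key, value), always current
        self.follower = []  # same shape, may lag behind the leader

    def write(self, key, value):
        # The leader tags each mutation with an increasing timestamp.
        self.clock += 1
        self.leader.append((self.clock, key, value))
        return self.clock

    def catch_up(self):
        # Stand-in for replication bringing the follower up to date.
        self.follower = list(self.leader)

    def follower_time(self):
        # Highest timestamp the follower has applied (0 if none).
        return max((ts for ts, _, _ in self.follower), default=0)


def at_or_before(ts, limit):
    """True when ts is not later than limit."""
    return max(ts, limit) == limit


def scan_latest(tablet):
    # No recency guarantee: return whatever the follower has right now.
    return {key: value for _, key, value in tablet.follower}


def scan_at_snapshot(tablet, snapshot_ts):
    # The follower may only answer once it has applied everything up to
    # snapshot_ts; the result contains only mutations at or before it.
    if not at_or_before(snapshot_ts, tablet.follower_time()):
        tablet.catch_up()
    return {key: value for ts, key, value in tablet.follower
            if at_or_before(ts, snapshot_ts)}


tablet = Tablet()
last_ts = tablet.write("row-1", "a")        # the client remembers this timestamp
stale = scan_latest(tablet)                 # follower has not caught up yet
fresh = scan_at_snapshot(tablet, last_ts)   # includes the client's own write
print(stale, fresh)
```

 &lt;p&gt;In this sketch the unconstrained scan misses the client’s own write because the follower lags,
 while the snapshot scan at the write’s timestamp includes it.&lt;/p&gt;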

 &lt;p&gt;The second “knob”, the &lt;code class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; mode, defines the semantics of how reads and writes
 are performed across multiple client instances. By default, &lt;code class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; is set to
  &lt;code class=&quot;highlighter-rouge&quot;&gt;CLIENT_PROPAGATED&lt;/code&gt;, meaning it’s up to the user to coordinate a set of &lt;em&gt;timestamp tokens&lt;/em&gt; with clients (even
 across different machines) if they are performing writes/reads that are somehow causally linked.
 If done correctly this enables &lt;strong&gt;STRICT SERIALIZABILITY&lt;/strong&gt;[5], i.e. &lt;strong&gt;LINEARIZABILITY&lt;/strong&gt;[6] and
 &lt;strong&gt;SERIALIZABILITY&lt;/strong&gt;[7] at the same time, at the cost of having the user coordinate the timestamp
 tokens across clients (a survey of the meaning of these and other definitions can be found
 &lt;a href=&quot;http://www.ics.forth.gr/tech-reports/2013/2013.TR439_Survey_on_Consistency_Conditions.pdf&quot;&gt;here&lt;/a&gt;).
 The alternative setting for &lt;code class=&quot;highlighter-rouge&quot;&gt;External Consistency&lt;/code&gt; is to have it set to
 &lt;code class=&quot;highlighter-rouge&quot;&gt;COMMIT_WAIT&lt;/code&gt; (experimental), which guarantees the same properties through different means, by
 implementing Google Spanner’s &lt;em&gt;TrueTime&lt;/em&gt;. This comes at the cost of higher latency (depending on how
 tightly synchronized the system clocks of the various tablet servers are), but doesn’t require users
 to propagate timestamps programmatically.&lt;/p&gt;
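
 &lt;p&gt;The idea behind propagating timestamp tokens between clients can be sketched with a Lamport-style
 clock update. This is a simplified illustration, assuming integer timestamps and a hypothetical
 &lt;code class=&quot;highlighter-rouge&quot;&gt;Server&lt;/code&gt; class; it is not Kudu’s timestamp implementation or its client API.&lt;/p&gt;

```python
# Toy model only: the Server class and its method are hypothetical.


class Server:
    """Assigns timestamps from a local clock that may lag other servers."""

    def __init__(self, local_clock):
        self.local_clock = local_clock

    def write(self, propagated_ts=0):
        # Never assign a timestamp at or below one the client has already
        # observed: move the local clock past the propagated token first.
        self.local_clock = max(self.local_clock, propagated_ts) + 1
        return self.local_clock


fast = Server(local_clock=100)
slow = Server(local_clock=10)

# Client A writes to the fast server and hands the resulting token to client B.
token = fast.write()

# Without the token, B's causally later write gets an earlier timestamp.
unordered = slow.write()

# With the token, B's write is ordered after A's.
ordered = slow.write(propagated_ts=token)

print(token, unordered, ordered)
```

 &lt;p&gt;The write that carries the token is timestamped after the write it depends on, even though the
 second server’s clock was behind; the write without the token is not.&lt;/p&gt;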

 &lt;h2 id=&quot;next-up&quot;&gt;Next up&lt;/h2&gt;

 &lt;p&gt;In following posts we’ll look into the several components of Kudu’s architecture that come together
 to enable the consistency semantics introduced in the previous section, including:&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Transactions and the Transaction Driver&lt;/li&gt;
   &lt;li&gt;Concurrent execution with Multi-Version Concurrency Control&lt;/li&gt;
   &lt;li&gt;Exactly-Once semantics with Replay Cache&lt;/li&gt;
   &lt;li&gt;Replication, Crash Recovery with Consensus and the Write-Ahead-Log&lt;/li&gt;
   &lt;li&gt;Time keeping and timestamp assignment&lt;/li&gt;
 &lt;/ul&gt;

 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

 &lt;p&gt;&lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3690&amp;amp;rep=rep1&amp;amp;type=pdf&quot;&gt;[1]&lt;/a&gt;: Armando Fox and Eric A. Brewer. 1999. Harvest, Yield, and Scalable Tolerant Systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems (HOTOS ‘99). IEEE Computer Society, Washington, DC, USA.&lt;/p&gt;

 &lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/ACID&quot;&gt;[2]&lt;/a&gt;: ACID - Wikipedia entry&lt;/p&gt;

 &lt;p&gt;&lt;a href=&quot;https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf&quot;&gt;[3]&lt;/a&gt;: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon’s highly available key-value store. In Proceedings of the twenty-first ACM SIGOPS symposium on Operating systems principles (SOSP ‘07). ACM, New York, NY, USA.&lt;/p&gt;

 &lt;p&gt;&lt;a href=&quot;https://research.google.com/archive/spanner-osdi2012.pdf&quot;&gt;[4]&lt;/a&gt;: James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. 2012. Spanner: Google’s globally-distributed database. In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA.&lt;/p&gt;

 &lt;p&gt;&lt;a href=&quot;https://pdfs.semanticscholar.org/fafa/ebf830bc900bccc5e4fd508fd592f5581cbe.pdf&quot;&gt;[5]&lt;/a&gt;: Gifford, David K. Information storage in a decentralized computer system. Diss. Stanford University, 1981.&lt;/p&gt;

 &lt;p&gt;&lt;a href=&quot;http://www.doc.ic.ac.uk/~gbd10/aw590/Linearizability%20-%20A%20Correctness%20Condition%20for%20Concurrent%20Objects.pdf&quot;&gt;[6]&lt;/a&gt;: Herlihy, Maurice P., and Jeannette M. Wing. “Linearizability: A correctness condition for concurrent objects.” ACM Transactions on Programming Languages and Systems (TOPLAS) 12.3 (1990): 463-492.&lt;/p&gt;

 &lt;p&gt;&lt;a href=&quot;http://www.dtic.mil/get-tr-doc/pdf?AD=ADA078414&quot;&gt;[7]&lt;/a&gt;: Papadimitriou, Christos H. “The serializability of concurrent database updates.” Journal of the ACM (JACM) 26.4 (1979): 631-653.&lt;/p&gt;</content><author><name>David Alves</name></author><summary>In this series of short blog posts we will introduce Kudu’s consistency model,
 its design and ultimate goals, current features, and next steps.
 On the way, we’ll shed some light on the more relevant components and how they
 fit together.

 In Part 1 of the series (this one), we’ll cover motivation and design trade-offs, the end goals and
 the current status.</summary></entry><entry><title>Apache Kudu 1.5.0 released</title><link href="/2017/09/08/apache-kudu-1-5-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.5.0 released" /><published>2017-09-08T00:00:00-07:00</published><updated>2017-09-08T00:00:00-07:00</updated><id>/2017/09/08/apache-kudu-1-5-0-released</id><content type="html" xml:base="/2017/09/08/apache-kudu-1-5-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.5.0!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.5.0 is a minor release which offers several new features,
 improvements, optimizations, and bug fixes.&lt;/p&gt;

 &lt;p&gt;Highlights include:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;optimizations to improve write throughput and failover recovery times&lt;/li&gt;
   &lt;li&gt;the Raft consensus implementation has been made more resilient and flexible
 through “tombstoned voting”, which allows Kudu to self-heal in more edge-case
 scenarios&lt;/li&gt;
   &lt;li&gt;the number of threads used by Kudu servers has been further reduced, with
 additional reductions planned for the future&lt;/li&gt;
   &lt;li&gt;a new configuration dashboard on the web UI which provides a high-level
 summary of important configuration values&lt;/li&gt;
   &lt;li&gt;a new &lt;code class=&quot;highlighter-rouge&quot;&gt;kudu tablet move&lt;/code&gt; command which moves a tablet replica from one tablet
 server to another&lt;/li&gt;
   &lt;li&gt;a new &lt;code class=&quot;highlighter-rouge&quot;&gt;kudu local_replica data_size&lt;/code&gt; command which summarizes the space usage
 of a local tablet&lt;/li&gt;
   &lt;li&gt;all on-disk data is now checksummed by default, which provides error detection
 for improved confidence when running Kudu on unreliable hardware&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
 &lt;a href=&quot;/releases/1.5.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
 for an expanded list of important improvements, bug fixes, and
 incompatible changes before upgrading.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.5.0/&quot;&gt;Kudu 1.5.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Dan Burkert</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.5.0!

 Apache Kudu 1.5.0 is a minor release which offers several new features,
 improvements, optimizations, and bug fixes.

 Highlights include:</summary></entry><entry><title>Apache Kudu 1.4.0 released</title><link href="/2017/06/13/apache-kudu-1-4-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.4.0 released" /><published>2017-06-13T00:00:00-07:00</published><updated>2017-06-13T00:00:00-07:00</updated><id>/2017/06/13/apache-kudu-1-4-0-released</id><content type="html" xml:base="/2017/06/13/apache-kudu-1-4-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.4.0!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.4.0 is a minor release which offers several new features,
 improvements, optimizations, and bug fixes.&lt;/p&gt;

 &lt;p&gt;Highlights include:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;ability to alter storage attributes and default values for existing columns&lt;/li&gt;
   &lt;li&gt;a new C++ client API to efficiently map primary keys to their associated partitions
 and hosts&lt;/li&gt;
   &lt;li&gt;support for long-running fault-tolerant scans in the Java client&lt;/li&gt;
   &lt;li&gt;a new &lt;code class=&quot;highlighter-rouge&quot;&gt;kudu fs check&lt;/code&gt; command which can perform offline consistency checks
 and repairs on the local on-disk storage of a Tablet Server or Master.&lt;/li&gt;
   &lt;li&gt;many optimizations to reduce disk space usage, improve write throughput,
 and improve throughput of background maintenance operations.&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
 &lt;a href=&quot;/releases/1.4.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
 for an expanded list of important improvements, bug fixes, and
 incompatible changes before upgrading.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.4.0/&quot;&gt;Kudu 1.4.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.4.0!

 Apache Kudu 1.4.0 is a minor release which offers several new features,
 improvements, optimizations, and bug fixes.

 Highlights include:</summary></entry><entry><title>Apache Kudu 1.3.1 released</title><link href="/2017/04/19/apache-kudu-1-3-1-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.1 released" /><published>2017-04-19T00:00:00-07:00</published><updated>2017-04-19T00:00:00-07:00</updated><id>/2017/04/19/apache-kudu-1-3-1-released</id><content type="html" xml:base="/2017/04/19/apache-kudu-1-3-1-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.3.1!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
 in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
 incorrectly deleted after certain sequences of node failures. Several other
 bugs are also fixed. See the release notes for details.&lt;/p&gt;

 &lt;p&gt;Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.1/&quot;&gt;Kudu 1.3.1 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.1!

 Apache Kudu 1.3.1 is a bug fix release which fixes critical issues discovered
 in Apache Kudu 1.3.0. In particular, this fixes a bug in which data could be
 incorrectly deleted after certain sequences of node failures. Several other
 bugs are also fixed. See the release notes for details.

 Users of Kudu 1.3.0 are encouraged to upgrade to 1.3.1 immediately.


   Download the Kudu 1.3.1 source release
   Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.</summary></entry><entry><title>Apache Kudu 1.3.0 released</title><link href="/2017/03/20/apache-kudu-1-3-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.3.0 released" /><published>2017-03-20T00:00:00-07:00</published><updated>2017-03-20T00:00:00-07:00</updated><id>/2017/03/20/apache-kudu-1-3-0-released</id><content type="html" xml:base="/2017/03/20/apache-kudu-1-3-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.3.0!&lt;/p&gt;

 &lt;p&gt;Apache Kudu 1.3 is a minor release which adds various new features,
 improvements, bug fixes, and optimizations on top of Kudu
 1.2. Highlights include:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;significantly improved support for security, including Kerberos
 authentication, TLS encryption, and coarse-grained (cluster-level)
 authorization&lt;/li&gt;
   &lt;li&gt;automatic garbage collection of historical versions of data&lt;/li&gt;
   &lt;li&gt;lower space consumption and better performance in default
 configurations.&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
 &lt;a href=&quot;/releases/1.3.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
 for an expanded list of important improvements, bug fixes, and
 incompatible changes before upgrading.&lt;/p&gt;

 &lt;p&gt;Thanks to the 25 developers who contributed code or documentation to
 this release!&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.3.0/&quot;&gt;Kudu 1.3.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.3.0!

 Apache Kudu 1.3 is a minor release which adds various new features,
 improvements, bug fixes, and optimizations on top of Kudu
 1.2. Highlights include:</summary></entry><entry><title>Apache Kudu 1.2.0 released</title><link href="/2017/01/20/apache-kudu-1-2-0-released.html" rel="alternate" type="text/html" title="Apache Kudu 1.2.0 released" /><published>2017-01-20T00:00:00-08:00</published><updated>2017-01-20T00:00:00-08:00</updated><id>/2017/01/20/apache-kudu-1-2-0-released</id><content type="html" xml:base="/2017/01/20/apache-kudu-1-2-0-released.html">&lt;p&gt;The Apache Kudu team is happy to announce the release of Kudu 1.2.0!&lt;/p&gt;

 &lt;p&gt;The new release adds several new features and improvements, including:&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;ul&gt;
   &lt;li&gt;User data such as row contents is now redacted from logging statements.&lt;/li&gt;
   &lt;li&gt;Kudu’s ability to provide strong consistency guarantees has been substantially improved.&lt;/li&gt;
   &lt;li&gt;Various performance improvements in metadata management as well as optimizations for BITSHUFFLE encoding on AVX2-capable hosts.&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;Additionally, 1.2.0 fixes a number of important bugs, including:&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Kudu now automatically limits its usage of file descriptors, preventing crashes due to ulimit exhaustion.&lt;/li&gt;
   &lt;li&gt;Fixed a long-standing issue which could cause ext4 file system corruption on RHEL 6.&lt;/li&gt;
   &lt;li&gt;Fixed a disk space leak.&lt;/li&gt;
   &lt;li&gt;Several fixes for correctness in various edge cases.&lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;The above list of changes is non-exhaustive. Please refer to the
 &lt;a href=&quot;/releases/1.2.0/docs/release_notes.html&quot;&gt;release notes&lt;/a&gt;
 for an expanded list of important improvements, bug fixes, and
 incompatible changes before upgrading.&lt;/p&gt;

 &lt;ul&gt;
   &lt;li&gt;Download the &lt;a href=&quot;/releases/1.2.0/&quot;&gt;Kudu 1.2.0 source release&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;Convenience binary artifacts for the Java client and various Java
 integrations (e.g. Spark, Flume) are also now available via the ASF Maven
 repository.&lt;/li&gt;
 &lt;/ul&gt;</content><author><name>Todd Lipcon</name></author><summary>The Apache Kudu team is happy to announce the release of Kudu 1.2.0!

 The new release adds several new features and improvements, including:</summary></entry><entry><title>Apache Kudu Weekly Update November 15th, 2016</title><link href="/2016/11/15/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 15th, 2016" /><published>2016-11-15T00:00:00-08:00</published><updated>2016-11-15T00:00:00-08:00</updated><id>/2016/11/15/weekly-update</id><content type="html" xml:base="/2016/11/15/weekly-update.html">&lt;p&gt;Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;project-news&quot;&gt;Project news&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;The first release candidate for Kudu 1.1.0 is &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCADY20s7ZKZkPmUEcTexW%3D%2B_%2BLnDY2hABZg0-UZD3jvWAs9-pog%40mail.gmail.com%3E&quot;&gt;now available&lt;/a&gt;.&lt;/p&gt;

     &lt;dl&gt;
       &lt;dt&gt;Noteworthy new features/improvements:&lt;/dt&gt;
       &lt;dd&gt;
         &lt;ul&gt;
           &lt;li&gt;The Python client has been brought to feature parity with the C++ and Java clients.&lt;/li&gt;
           &lt;li&gt;IN LIST predicates.&lt;/li&gt;
           &lt;li&gt;Java client now features client-side tracing.&lt;/li&gt;
           &lt;li&gt;Kudu now publishes jar files for Spark 2.0 compiled with Scala 2.11.&lt;/li&gt;
           &lt;li&gt;Kudu’s Raft implementation now features pre-elections. In our tests this has greatly improved stability.&lt;/li&gt;
         &lt;/ul&gt;
       &lt;/dd&gt;
     &lt;/dl&gt;

     &lt;p&gt;Community developers and users are encouraged to download the source
 tarball and vote on the release.&lt;/p&gt;

     &lt;p&gt;For more information on what’s new, check out the
 &lt;a href=&quot;https://github.com/apache/kudu/blob/branch-1.1.x/docs/release_notes.adoc&quot;&gt;release notes&lt;/a&gt;.
 &lt;em&gt;Note:&lt;/em&gt; some links from these in-progress release notes will not be live until the
 release itself is published.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;On November 7th, the Kudu PMC announced that Jordan Birdsell, from State Farm, had been voted
 in as a new committer and PMC member.&lt;/p&gt;

     &lt;p&gt;Jordan’s contributions include extensive work on the Python client, giving it some much-needed
 love and bringing it to feature parity with the other clients.&lt;/p&gt;

     &lt;p&gt;Besides his extensive code contributions, Jordan has also been active in reviewing other
 developers’ patches and helping the community in general, on Slack and other channels.&lt;/p&gt;

     &lt;p&gt;Jordan has been doing great work and the Kudu PMC was pleased to recognize his contributions
 with committership.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Mike Percy will be presenting Kudu on Wednesday, November 16th, at &lt;a href=&quot;https://apachebigdataeu2016.sched.org/&quot;&gt;Apache Big Data Europe, in Seville&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Congratulations to Haijie Hong for his &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4822/&quot;&gt;first contribution to Kudu!&lt;/a&gt;
 Haijie fixed some edge cases in BitWriter that were blocking RLE usage for 64-bit types.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Congratulations to Maxim Smyatkin for his &lt;a href=&quot;https://gerrit.cloudera.org/#/q/Maxim&quot;&gt;first contributions to Kudu!&lt;/a&gt;
 Maxim has contributed several patches helping with debugging and cleanup.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;A lot of progress has been made toward the goals that were set in the scope docs introduced in
 the last couple of posts. Specifically:&lt;/p&gt;

     &lt;ul&gt;
       &lt;li&gt;
         &lt;p&gt;Dan Burkert, Todd Lipcon and Alexey Serbin have doubled down on the security effort. They have
 been working on enabling Kerberos authentication and RPC encryption. The &lt;a href=&quot;https://docs.google.com/document/d/1cPNDTpVkIUo676RlszpTF1gHZ8l0TdbB7zFBAuOuYUw/edit#heading=h.gsibhnd5dyem&quot;&gt;security scope doc&lt;/a&gt;
 has been updated with the latest plans for security and many patches have been merged already.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;David Alves has continued the work on &lt;a href=&quot;https://s.apache.org/7VCo&quot;&gt;consistency&lt;/a&gt;. Up for review
 and partially pushed is a patch series to address row history loss if a row is deleted and then
 re-inserted. Also in progress is work to make sure that scans at a snapshot from followers
 always return the same data as if they were executed on the leader. This helps with Read-Your-Writes
 when reading from lagging replicas.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Adar Dembo has been making good progress &lt;a href=&quot;https://s.apache.org/uOOt&quot;&gt;addressing issues seen with the LogBlockManager&lt;/a&gt;.
 A series of patches has been merged with various fixes to block managers in general and to the
 log block manager in particular.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Dinesh Bhat has been working on improving the manual recovery tools for Kudu. Namely, he has
 added a tool to force a remote replica copy to a destination server, and a tool to delete a
 local replica of a tablet. The latter is useful when a tablet cannot come up due to bad state.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Jean-Daniel Cryans has implemented RPC tracing for the Java client, greatly improving
 debuggability. JD has also added ReplicaSelection to the Java client, allowing scans to be
 performed on replicas other than the leader, which should be of great help for load-balancing.&lt;/p&gt;
       &lt;/li&gt;
       &lt;li&gt;
         &lt;p&gt;Besides the feature parity contributions, Jordan Birdsell has laid out a
 &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/kudu-dev/201611.mbox/%3CCAGaaj_VKfB4mhu6eExHCWo0%3D6Qd0HFWy7bg9e39JgOaFPGJ1nQ%40mail.gmail.com%3E&quot;&gt;roadmap for Python client work&lt;/a&gt;
 for the 1.2 release. Feedback from other Python client users is certainly appreciated.&lt;/p&gt;
       &lt;/li&gt;
     &lt;/ul&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
 &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
 tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
 aware of some Kudu news we missed, let us know so we can cover it in
 a future post.&lt;/p&gt;</content><author><name>David Alves</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.</summary></entry><entry><title>Apache Kudu Weekly Update November 1st, 2016</title><link href="/2016/11/01/weekly-update.html" rel="alternate" type="text/html" title="Apache Kudu Weekly Update November 1st, 2016" /><published>2016-11-01T00:00:00-07:00</published><updated>2016-11-01T00:00:00-07:00</updated><id>/2016/11/01/weekly-update</id><content type="html" xml:base="/2016/11/01/weekly-update.html">&lt;p&gt;Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.&lt;/p&gt;

 &lt;!--more--&gt;

 &lt;h2 id=&quot;development-discussions-and-code-in-progress&quot;&gt;Development discussions and code in progress&lt;/h2&gt;

 &lt;ul&gt;
   &lt;li&gt;
     &lt;p&gt;Dan Burkert committed a piece of test infrastructure
 called “MiniKDC” for both Java and C++. The MiniKDC sets up a short-lived
 Kerberos environment in the context of a single test case, making it
 easy to build tests of security features without requiring any special
 infrastructure on the part of the developer.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Todd Lipcon added Kerberos (GSSAPI) support to Kudu’s
 RPC system, allowing servers to authenticate the user principal of
 any inbound RPC connection. He also integrated Kudu’s C++ “MiniCluster”
 test infrastructure to allow starting a Kerberized cluster in the
 context of a test.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Dan, Todd, and Alexey Serbin have been iterating on a more detailed
 &lt;a href=&quot;https://docs.google.com/document/d/1Yu4iuIhaERwug1vS95yWDd_WzrNRIKvvVGUb31y-_mY/edit#&quot;&gt;design doc&lt;/a&gt;
 for authentication in Kudu. This doc outlines the various non-Kerberos
 methods that Kudu will use for authentication as well as how TLS will
 be used to encrypt and authenticate some types of connections.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Part of the above design document involves Kudu servers generating and
 signing X509 certificates on the fly to use for authenticated TLS.
 Alexey has been working on a large &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4799/&quot;&gt;patch&lt;/a&gt;
 which uses OpenSSL to provide key generation and signing functionality.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Sailesh Mukil has been working on adding support for
 &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4789/&quot;&gt;TLS in Kudu’s RPC system&lt;/a&gt;. The TLS
 support is a critical part of the overall design for security. This patch
 has gone through several rounds of review and is nearing completion.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;JD Cryans has been continuing to improve the Java client, including adding
 the ability to specify that the client would like to read the “closest”
 replica (e.g. reading from a local copy if possible). Additionally,
 JD has been working on some basic &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4781/&quot;&gt;tracing support&lt;/a&gt;
 within the Java client. This tracing aims to make timeouts easier to understand
 and diagnose.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Jordan Birdsell committed 9 more patches to the Python client, bringing it
 very close to feature parity with C++. Jordan has a few more patches in flight
 which should complete this long-running effort.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Congrats to new contributor Haijie Hong who committed his first patch this week.
 Haijie added support for &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4822/&quot;&gt;run-length encoding 64-bit integers&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Will Berkeley picked back up work on &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4310/&quot;&gt;improving the capability of ALTER
 TABLE&lt;/a&gt;. His in-flight patch adds support
 for changing the default value of a column as well as changing storage attributes
 such as desired block size, encoding, and compression.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;Adar Dembo has been working on a series of patches for the Block Manager, the
 component of Kudu which is responsible for laying out blocks on the local
 file system. His patch series consists of a number of refactors to clean up
 and improve the code structure, followed by an &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4848/&quot;&gt;improvement to reduce file system
 fragmentation&lt;/a&gt;.&lt;/p&gt;
   &lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;David Alves has been working on a &lt;a href=&quot;https://gerrit.cloudera.org/#/c/4819/&quot;&gt;patch series&lt;/a&gt;
 which adds support for storing ‘REINSERT’ deltas on disk. These records are
 generated if a user inserts a row, deletes it, and inserts a new row with the
 same primary key. Current versions of Kudu lose track of the history of the
 prior version of the row in this scenario, which prevents correct snapshot reads.
 David’s patch series fixes this.&lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;

 &lt;p&gt;Want to learn more about a specific topic from this blog post? Shoot an email to the
 &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#117;&amp;#100;&amp;#117;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;kudu-user mailing list&lt;/a&gt; or
 tweet at &lt;a href=&quot;https://twitter.com/ApacheKudu&quot;&gt;@ApacheKudu&lt;/a&gt;. Similarly, if you’re
 aware of some Kudu news we missed, let us know so we can cover it in
 a future post.&lt;/p&gt;</content><author><name>Todd Lipcon</name></author><summary>Welcome to the twenty-third edition of the Kudu Weekly Update. This weekly blog post
 covers ongoing development and news in the Apache Kudu project.</summary></entry></feed>