| // Copyright 2015 Cloudera, Inc. |
| // |
| // Licensed under the Apache License, Version 2.0 (the "License"); |
| // you may not use this file except in compliance with the License. |
| // You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| |
| [[release_notes]] |
| = Kudu Release Notes |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| == Introducing Kudu |
| |
| Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares |
| the common technical properties of Hadoop ecosystem applications: it runs on |
| commodity hardware, is horizontally scalable, and supports highly available operation. |
| |
| Kudu’s design sets it apart. Some of Kudu’s benefits include: |
| |
| * Fast processing of OLAP workloads. |
| * Integration with MapReduce, Spark and other Hadoop ecosystem components. |
| * Tight integration with Cloudera Impala, making it a good, mutable alternative to |
| using HDFS with Parquet. See link:kudu_impala_integration.html[Kudu Impala Integration]. |
| * Strong but flexible consistency model. |
| * Strong performance for running sequential and random workloads simultaneously. |
| * Easy to administer and manage with Cloudera Manager. |
| * Efficient utilization of hardware resources. |
| * High availability. Tablet Servers and Masters use the Raft Consensus Algorithm. |
| Given a replication factor of `2f+1`, if `f` tablet servers serving a given tablet |
| fail, the tablet is still available. |
| + |
| NOTE: High availability for masters is not supported during the public beta. |
| |
| By combining all of these properties, Kudu targets support for families of |
| applications that are difficult or impossible to implement on current generation |
| Hadoop storage technologies. |
| |
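The availability bullet above reduces to simple majority arithmetic. The following is a minimal sketch of that arithmetic only, not Kudu code; the function names `tolerated_failures` and `is_available` are hypothetical and chosen for illustration.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Return the largest f such that replication_factor >= 2f + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (replication_factor - 1) // 2


def is_available(replication_factor: int, failed_replicas: int) -> bool:
    """A tablet stays available while a strict majority of replicas is alive."""
    alive = replication_factor - failed_replicas
    return alive > replication_factor // 2
```

With a replication factor of 3, one failed replica leaves the tablet available and two do not. An even replication factor tolerates no more failures than the odd number below it.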
| === Kudu-Impala Integration Features |
| `CREATE TABLE`:: |
| Impala supports creating and dropping tables using Kudu as the persistence layer. |
| The tables follow the same internal / external approach as other tables in Impala, |
| allowing for flexible data ingestion and querying. |
| `INSERT`:: |
| Data can be inserted into Kudu tables from Impala using the same mechanisms as |
| any other table with HDFS or HBase persistence. |
| `UPDATE` / `DELETE`:: |
| Impala supports the `UPDATE` and `DELETE` SQL commands to modify existing data in |
| a Kudu table row-by-row or as a batch. The syntax of the SQL commands is chosen |
| to be as compatible as possible with existing solutions. In addition to simple `DELETE` |
| or `UPDATE` commands, you can specify complex joins in the `FROM` clause of the query |
| using the same syntax as a regular `SELECT` statement. |
| Flexible Partitioning:: |
| Similar to partitioning of tables in Hive, Kudu allows you to dynamically |
| pre-split tables by hash or range into a predefined number of tablets, in order |
| to distribute writes and queries evenly across your cluster. You can partition by |
| any number of primary key columns, by any number of hashes and an optional list of |
| split rows. |
| Parallel Scan:: |
| To achieve the highest possible performance on modern hardware, the Kudu client |
| within Impala parallelizes scans to multiple tablets. |
| High-efficiency queries:: |
| Where possible, Impala pushes down predicate evaluation to Kudu, so that predicates |
| are evaluated as close as possible to the data. Query performance is comparable |
| to Parquet in many workloads. |
| |
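When planning the pre-split described under Flexible Partitioning, it can help to estimate how many tablets a given layout produces. The sketch below assumes that each hash level multiplies the tablet count by its number of buckets and that `N` split rows divide the range into `N + 1` partitions. The function name `tablet_count` and its parameters are hypothetical and not part of any Kudu or Impala API.

```python
from math import prod


def tablet_count(hash_bucket_counts, split_rows):
    """Estimate the number of tablets for a pre-split table.

    hash_bucket_counts: bucket count for each hash level (may be empty).
    split_rows: split points for the range component (may be empty).
    Assumes hash levels multiply and N split rows yield N + 1 ranges.
    """
    range_partitions = len(split_rows) + 1
    return prod(hash_bucket_counts) * range_partitions
```

Under these assumptions, two hash levels with 4 and 2 buckets combined with two split rows would give 4 * 2 * 3 = 24 tablets.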
| == About the Kudu Public Beta |
| |
| This release of Kudu is a public beta. Do not run this beta release on production clusters. During the |
| public beta period, Kudu will be supported via a |
| link:https://issues.cloudera.org/projects/KUDU[public JIRA] and a |
| link:mailto:kudu-user@cloudera.org[public mailing list], which will be monitored |
| by the Kudu development team and community members. |
| Commercial support is not available at this time. |
| |
| * You can submit any issues or feedback related to your Kudu experience via either |
| the JIRA system or the mailing list. The Kudu development team and community members |
| will respond and assist as quickly as possible. |
| * The Kudu team will work with early adopters to fix bugs and release new binary drops |
| when fixes or features are ready. However, we cannot commit to issue resolution or |
| bug fix delivery times during the public beta period, and it is possible that some |
| fixes or enhancements will not be selected for a release. |
| * We can't guarantee timeframes or contents for future beta code drops. However, |
| they will be announced to the user group when they occur. |
| * No guarantees are made regarding upgrades from this release to follow-on releases, |
| including the additional drops of beta code that are planned. |
| |
| == Resources |
| |
| - link:http://getkudu.io[Kudu Website] |
| - link:http://github.com/cloudera/kudu[Kudu GitHub Repository] |
| - link:index.html[Kudu Documentation] |
| |
| == Installation Options |
| * A Quickstart VM is provided to get you up and running quickly. |
| * You can install parcels or packages in clusters managed by Cloudera Manager, or |
| packages in standalone CDH clusters. |
| * You can build Kudu from source. |
| |
| For full installation details, see link:installation.html[Kudu Installation]. |
| |
| == Limitations of the Public Beta |
| |
| === Operating System Limitations |
| * RHEL 6.4 or newer, CentOS 6.4 or newer, and Ubuntu Trusty are the only |
| operating systems supported for installation in the public beta. |
| Others may work but have not been tested. |
| |
| === Storage Limitations |
| * Kudu has been tested with up to 4 TB of data per tablet server. More testing |
| is needed for denser storage configurations. |
| |
| === Schema Limitations |
| * Testing with more than 20 columns has been limited. |
| * Multi-kilobyte rows have not been thoroughly tested. |
| * The columns which make up the primary key must be listed first in the schema. |
| * Key columns cannot be altered. You must drop and recreate a table to change its keys. |
| * Key columns must not be null. |
| * Columns with `DOUBLE`, `FLOAT`, or `BOOL` types are not allowed as part of a |
| primary key definition. |
| * Type and nullability of existing columns cannot be changed by altering the table. |
| * A table’s primary key cannot be changed. |
| * Dropping a column does not immediately reclaim space. Compaction must run first. |
| There is no way to run compaction manually, but dropping the table will reclaim the |
| space immediately. |
| |
| === Ingest Limitations |
| * Ingest via Sqoop or Flume is not supported in the public beta. The recommended |
| approach for bulk ingest is to use Impala’s `CREATE TABLE AS SELECT` functionality |
| or to use Kudu's Java or C++ API. |
| * Tables must be manually pre-split into tablets using simple or compound primary |
| keys. Automatic splitting is not yet possible. Instead, add split rows at table creation. |
| * Tablets cannot currently be merged. Instead, create a new table with the contents |
| of the old tables to be merged. |
| |
| === Cloudera Manager Limitations |
| * Some metrics, such as latency histograms, are not yet available in Cloudera Manager. |
| * Some service and role chart pages are still under development. More charts and |
| metrics will be visible in future releases. |
| |
| === Replication and Backup Limitations |
| * Replication and failover of Kudu masters is considered experimental. It is |
| recommended to run a single master and periodically perform a manual backup of its data directories. |
| |
| === Impala Limitations |
| * To use Kudu with Impala, you must install a special release of Impala. Obtaining |
| and installing a compatible Impala release is detailed in Kudu's |
| link:kudu_impala_integration.html[Impala Integration] documentation. |
| * To use Impala_Kudu alongside an existing Impala instance, you must install using parcels. |
| * Updates, inserts, and deletes via Impala are non-transactional. If a query |
| fails part-way, its partial effects will not be rolled back. |
| * All queries will be distributed across all Impala nodes which host a replica |
| of the target table(s), even if a predicate on a primary key could correctly |
| restrict the query to a single tablet. This limits the maximum concurrency of |
| short queries made via Impala. |
| * `ALTER TABLE` on Kudu tables is not yet supported via Impala. |
| * The `TIMESTAMP` and `DECIMAL` types are not supported. |
| * The maximum parallelism of a single query is limited to the number of tablets |
| in a table. For good analytic performance, aim for 10 or more tablets per host |
| for large tables. |
| * Impala is only able to push down predicates involving `=`, `<=`, `>=`, |
| or `BETWEEN` comparisons between a column and a literal value. Impala pushes down |
| the predicates `<` and `>` for integer columns only. For example, for a table with |
| an integer key `ts` and a float column `income`, the predicate `WHERE ts >= 12345` |
| will convert into an efficient range scan, whereas `WHERE income > 1000000.0` |
| will currently fetch all data from the table and evaluate the predicate within Impala. |
| |
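The pushdown rules in the last bullet above can be restated as a small decision function. This is a sketch of the rule as written in these notes, not of Impala's planner. The name `is_pushed_down` and the type labels in `INTEGER_TYPES` are hypothetical, illustrative choices.

```python
# Illustrative type labels; assumed here, not taken from Impala or Kudu.
INTEGER_TYPES = {"int8", "int16", "int32", "int64"}

# Comparisons pushed down for any column compared with a literal.
ALWAYS_PUSHED = {"=", "<=", ">=", "BETWEEN"}

# Comparisons pushed down only when the column is an integer type.
INTEGER_ONLY = {"<", ">"}


def is_pushed_down(operator: str, column_type: str) -> bool:
    """Return True if a column-versus-literal predicate is pushed to Kudu."""
    op = operator.strip().upper()
    if op in ALWAYS_PUSHED:
        return True
    if op in INTEGER_ONLY:
        return column_type.lower() in INTEGER_TYPES
    # Any other operator is evaluated inside Impala.
    return False
```

For the example in the text, `is_pushed_down(">=", "int64")` corresponds to `WHERE ts >= 12345`, and `is_pushed_down(">", "float")` corresponds to `WHERE income > 1000000.0`.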
| === Security Limitations |
| * Authentication and authorization are not included in the public beta. |
| * Data encryption is not included in the public beta. |
| |
| === Client and API Limitations |
| * Potentially incompatible C++ and Java API changes may be required during the |
| public beta. |
| * `ALTER TABLE` is not yet fully supported via the client APIs. More `ALTER TABLE` |
| operations will become available in future betas. |
| * The Python API is not supported. |
| |
| === Application Integration Limitations |
| * The Spark DataFrame implementation is not yet complete. |
| |
| === Other Known Issues |
| The following are known bugs and issues with the current beta release. They will |
| be addressed in later beta releases. |
| |
| * Building Kudu from source using `gcc` 4.6 causes runtime and test failures. Be sure |
| you are using a different version of `gcc` if you build Kudu from source. |
| * If the Kudu master is configured with the `-log_fsync_all` option, tablet servers |
| and clients will experience frequent timeouts, and the cluster may become unusable. |
| * If a tablet server has a very large number of tablets, it may take several minutes |
| to start up. It is recommended to limit the number of tablets per server to 100 or fewer. |
| Consider this limitation when pre-splitting your tables. If you notice slow start-up times, |
| you can monitor the number of tablets per server in the web UI. |
| |
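The start-up guidance above (100 or fewer tablets per server) can be checked before pre-splitting a table. The sketch below assumes that tablets are spread evenly across tablet servers and that the count passed in already includes every replica a server would host. Both are simplifying assumptions, and the names `tablets_per_server` and `within_startup_guidance` are hypothetical.

```python
MAX_TABLETS_PER_SERVER = 100  # recommendation quoted in these release notes


def tablets_per_server(total_tablet_replicas: int, tablet_servers: int) -> int:
    """Worst-case tablets on one server, assuming an even spread."""
    if tablet_servers < 1:
        raise ValueError("at least one tablet server is required")
    # Ceiling division: an uneven remainder lands on some server.
    return -(-total_tablet_replicas // tablet_servers)


def within_startup_guidance(total_tablet_replicas: int, tablet_servers: int) -> bool:
    """True if the planned layout stays at or below the recommended limit."""
    per_server = tablets_per_server(total_tablet_replicas, tablet_servers)
    return per_server <= MAX_TABLETS_PER_SERVER
```

For instance, 300 tablet replicas on 4 servers come to 75 per server, which stays within the guidance, while 1000 replicas on 5 servers come to 200 per server and do not.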
| == Next Steps |
| - link:quickstart.html[Kudu Quickstart] |
| - link:installation.html[Installing Kudu] |
| - link:configuration.html[Configuring Kudu] |
| |