| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[release_notes]] |
| = Apache Kudu 1.0 Release Notes |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| [[rn_1.0.1]] |
| |
| == Overview |
| |
| Apache Kudu 1.0.1 is a bug fix release, with no new features or |
| backwards-incompatible changes. |
| |
| [[rn_1.0.1_fixed_issues]] |
| === Fixed Issues |
| |
| - link:https://issues.apache.org/jira/browse/KUDU-1681[KUDU-1681]: Fixed a bug in |
| the tablet server which could cause a crash when the DNS lookup during master |
| heartbeat failed. |
| |
| - link:https://issues.apache.org/jira/browse/KUDU-1660[KUDU-1660]: Fixed a bug |
| which would cause the Kudu master and tablet server to fail to start on single |
| CPU systems. |
| |
| - link:https://issues.apache.org/jira/browse/KUDU-1652[KUDU-1652]: Fixed a bug |
| that would cause the C++ client, tablet server, and Java client to crash or |
| throw an exception when attempting to scan a table with a predicate which |
| simplifies to `IS NOT NULL` on a non-nullable column. For instance, setting a |
| `<= 127` predicate on an `INT8` column could trigger this bug, since the |
| predicate only filters null values. |
| |
| - link:https://issues.apache.org/jira/browse/KUDU-1651[KUDU-1651]: Fixed a bug |
| that would cause the tablet server to crash when evaluating a scan with |
| predicates over a dictionary encoded column containing an entire block of null |
| values. |
| |
| - link:https://issues.apache.org/jira/browse/KUDU-1623[KUDU-1623]: Fixed a bug |
| that would cause the tablet server to crash when handling UPSERT operations |
| that only set values for the primary key columns. |
| |
| - link:http://gerrit.cloudera.org:8080/4488[Gerrit #4488]: Fixed a bug in the |
| Java client's KuduException class which could cause an unexpected |
| NullPointerException to be thrown when the exception did not have an |
| associated message. |
| |
| - link:https://issues.apache.org/jira/browse/KUDU-1090[KUDU-1090]: Fixed a bug in |
| the memory tracker which could cause a rare crash during tablet server |
| startup. |
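
The KUDU-1652 entry above turns on how a range predicate can collapse into `IS NOT NULL`. The following is a minimal sketch of that reasoning only, not Kudu's actual predicate-simplification code; the function names and the string results are hypothetical and chosen for illustration.

```python
# Illustrative sketch; names and return values are assumptions, not Kudu APIs.
INT8_MIN, INT8_MAX = -128, 127


def simplify_int8_upper_bound(upper_inclusive):
    """Simplify the predicate `col <= upper_inclusive` on an INT8 column."""
    if upper_inclusive >= INT8_MAX:
        # Every non-null INT8 value satisfies the bound, so the predicate
        # only filters out nulls.
        return "IS NOT NULL"
    if upper_inclusive < INT8_MIN:
        # No INT8 value can satisfy the bound.
        return "NONE"
    return f"<= {upper_inclusive}"


def filters_nothing(simplified, column_nullable):
    """An IS NOT NULL predicate on a non-nullable column matches every row."""
    return simplified == "IS NOT NULL" and not column_nullable
```

Under these assumptions, `simplify_int8_upper_bound(127)` yields `IS NOT NULL`, which is the case the release note describes for a non-nullable column.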
| |
| [[rn_1.0.0]] |
| |
| == Overview |
| |
| After approximately a year of beta releases, Apache Kudu has reached version 1.0. |
| This version number signifies that the development team feels that Kudu is stable |
| enough for usage in production environments. |
| |
| If you are new to Kudu, check out its list of link:index.html[features and benefits]. |
| |
| [[rn_1.0.0_new_features]] |
| == New features |
| |
| Kudu 1.0.0 delivers a number of new features, bug fixes, and optimizations. |
| |
| - Removal of multiversion concurrency control (MVCC) history is now supported. |
| This feature, known as tablet history GC, allows Kudu to reclaim disk space. |
| Previously, Kudu kept a full history of all changes made to a given table |
| since the beginning of time, and the only way to reclaim disk space was to |
| drop the table. |
| + |
| Kudu will still keep historical data, and the amount of history retained is |
| controlled by setting the configuration flag `--tablet_history_max_age_sec`, |
| which defaults to 15 minutes (expressed in seconds). The timestamp |
| represented by the current time minus `tablet_history_max_age_sec` is known |
| as the ancient history mark (AHM). When a compaction or flush occurs, Kudu |
| will remove the history of changes made prior to the ancient history mark. |
| This only affects historical data; currently-visible data will not be |
| removed. A specialized maintenance manager background task to remove existing |
| "cold" historical data that is not in a row affected by the normal compaction |
| process will be added in a future release. |
| |
| - Most of Kudu's command line tools have been consolidated under a new |
| top-level `kudu` tool. This reduces the number of large binaries distributed |
| with Kudu and also includes much-improved help output. |
| |
| - The Kudu Flume Sink now supports processing events containing Avro-encoded |
| records, using the new `AvroKuduOperationsProducer`. |
| |
| - Administrative tools including `kudu cluster ksck` now support running |
| against multi-master Kudu clusters. |
| |
| - The output of the `ksck` tool is now colorized and much easier to read. |
| |
| - The {cpp} client API now supports writing data in `AUTO_FLUSH_BACKGROUND` mode. |
| This can provide higher throughput for ingest workloads. |
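
The ancient history mark described in the tablet history GC item above is simple arithmetic, sketched below. This is an illustration rather than Kudu's implementation: it assumes plain timestamps in seconds, and the helper names are hypothetical. Only the flag name and its 15-minute default come from the text.

```python
# Illustrative sketch; assumes plain second-resolution timestamps.
DEFAULT_TABLET_HISTORY_MAX_AGE_SEC = 15 * 60  # 15 minutes, expressed in seconds


def ancient_history_mark(now_sec, max_age_sec=DEFAULT_TABLET_HISTORY_MAX_AGE_SEC):
    """Return the AHM: the current time minus tablet_history_max_age_sec."""
    return now_sec - max_age_sec


def history_is_removable(change_ts_sec, now_sec,
                         max_age_sec=DEFAULT_TABLET_HISTORY_MAX_AGE_SEC):
    """History of changes made prior to the AHM may be removed on flush or compaction."""
    return change_ts_sec < ancient_history_mark(now_sec, max_age_sec)
```

With the default setting and a current time of 1000 seconds, the mark sits at 100 seconds, so the history of a change made at 99 seconds is eligible for removal while one made at 100 seconds is retained.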
| |
| == Optimizations and improvements |
| |
| - The performance of comparison predicates on dictionary-encoded columns has |
| been substantially optimized. Users are encouraged to use dictionary encoding |
| on any string or binary columns with low cardinality, especially if these |
| columns will be filtered with predicates. |
| |
| - The Java client is now able to prune partitions from scanners based on the |
| provided predicates. For example, an equality predicate on a hash-partitioned |
| column will now only access those tablets that could possibly contain matching |
| data. This is expected to improve performance for the Spark integration as well |
| as applications using the Java client API. |
| |
| - The performance of compaction selection in the tablet server has been |
| substantially improved. This can increase the efficiency of the background |
| maintenance threads and improve overall throughput of heavy write workloads. |
| |
| - The policy by which the tablet server retains write-ahead log (WAL) files has |
| been improved so that it takes into account other replicas of the tablet. |
| This should help mitigate the spurious eviction of tablet replicas on machines |
| that temporarily lag behind the other replicas. |
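
The partition pruning item above can be pictured with a small sketch. It is not the Java client's code: the hash function (`zlib.crc32`), the one-tablet-per-bucket layout, and all names are assumptions made purely for illustration.

```python
# Illustrative sketch; the hash function and names are stand-ins, not Kudu's.
import zlib


def bucket_for(value, num_buckets):
    """Map a string key to a hash bucket with a deterministic stand-in hash."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def tablets_to_scan(tablets_by_bucket, equality_value=None):
    """Return the tablets a scan must visit.

    Without an equality predicate on the hash-partitioned column, every
    tablet is scanned. With one, only the tablet for the matching bucket
    can possibly contain matching rows.
    """
    if equality_value is None:
        return list(tablets_by_bucket.values())
    bucket = bucket_for(equality_value, len(tablets_by_bucket))
    return [tablets_by_bucket[bucket]]
```

For a table with four hash buckets, a scan with an equality predicate visits one tablet instead of four.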
| |
| == Wire protocol compatibility |
| |
| Kudu 1.0.0 maintains client-server wire-compatibility with previous releases. |
| Applications using the Kudu client libraries may be upgraded before, |
| at the same time as, or after the Kudu servers. |
| |
| Kudu 1.0.0 does _not_ maintain server-server wire compatibility with previous |
| releases. Therefore, rolling upgrades between earlier versions of Kudu and |
| Kudu 1.0.0 are not supported. |
| |
| [[rn_1.0.0_incompatible_changes]] |
| == Incompatible changes in Kudu 1.0.0 |
| |
| === Command line tools |
| |
| - The `kudu-pbc-dump` tool has been removed. The same functionality is now |
| implemented as `kudu pbc dump`. |
| |
| - The `kudu-ksck` tool has been removed. The same functionality is now |
| implemented as `kudu cluster ksck`. |
| |
| - The `cfile-dump` tool has been removed. The same functionality is now |
| implemented as `kudu fs cfile dump`. |
| |
| - The `log-dump` tool has been removed. The same functionality is now |
| implemented as `kudu wal dump` and `kudu local_replica dump wals`. |
| |
| - The `kudu-admin` tool has been removed. The same functionality is now |
| implemented within `kudu table` and `kudu tablet`. |
| |
| - The `kudu-fs_dump` tool has been removed. The same functionality is now |
| implemented as `kudu fs dump`. |
| |
| - The `kudu-ts-cli` tool has been removed. The same functionality is now |
| implemented within `kudu master`, `kudu remote_replica`, and `kudu tserver`. |
| |
| - The `kudu-fs_list` tool has been removed and some similar useful |
| functionality has been moved under `kudu local_replica`. |
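
For scripts that still reference the removed tools, the list above can be captured as a lookup table. The mapping below only restates the replacements named in this section; the dictionary and the `replacements` helper are hypothetical conveniences, not part of Kudu.

```python
# Lookup table restating the list above; the helper itself is hypothetical.
TOOL_REPLACEMENTS = {
    "kudu-pbc-dump": ["kudu pbc dump"],
    "kudu-ksck": ["kudu cluster ksck"],
    "cfile-dump": ["kudu fs cfile dump"],
    "log-dump": ["kudu wal dump", "kudu local_replica dump wals"],
    "kudu-admin": ["kudu table", "kudu tablet"],
    "kudu-fs_dump": ["kudu fs dump"],
    "kudu-ts-cli": ["kudu master", "kudu remote_replica", "kudu tserver"],
    # Only "some similar" functionality moved here; not a one-to-one replacement.
    "kudu-fs_list": ["kudu local_replica"],
}


def replacements(old_tool):
    """Return the `kudu` subcommands that replace a removed tool."""
    if old_tool not in TOOL_REPLACEMENTS:
        raise KeyError(f"{old_tool} is not one of the tools removed in 1.0.0")
    return TOOL_REPLACEMENTS[old_tool]
```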
| |
| === Configuration flags |
| |
| - Some configuration flags are now marked as 'unsafe' and 'experimental'. Such flags |
| are disallowed by default. Users may access these flags by enabling the additional |
| flags `--unlock_unsafe_flags` and `--unlock_experimental_flags`. Usage of such flags |
| is not recommended, as the flags may be removed or modified with no deprecation period |
| and without notice in future Kudu releases. |
| |
| === Client APIs ({cpp}/Java/Python) |
| |
| - The `TIMESTAMP` column type has been renamed to `UNIXTIME_MICROS` in order to |
| reduce confusion between Kudu's timestamp support and the timestamps supported |
| by other systems such as Apache Hive and Apache Impala (incubating). Existing |
| tables will automatically be updated to use the new name for the type. |
| + |
| Clients upgrading to the new client libraries must move to the new name for |
| the type. Clients using old client libraries will continue to operate using |
| the old type name, even when connected to clusters that have been |
| upgraded. Similarly, if clients are upgraded before servers, existing |
| timestamp columns will be available using the new type name. |
| |
| |
| - `KuduSession` methods in the {cpp} library are no longer advertised as thread-safe, |
| so that the {cpp} and Java Kudu client libraries share one set of semantics. |
| |
| - The `KuduScanToken::TabletServers` method in the {cpp} library has been removed. |
| The same information can now be found in the `KuduScanToken::tablet` method. |
| |
| === Apache Flume Integration |
| |
| - The `KuduEventProducer` interface used to process Flume events into Kudu operations |
| for the Kudu Flume Sink has changed, and has been renamed `KuduOperationsProducer`. |
| The existing `KuduEventProducer`s have been updated for the new interface, and have |
| been renamed similarly. |
| |
| |
| [[known_issues_and_limitations]] |
| == Known Issues and Limitations |
| |
| === Schema and Usage Limitations |
| * Kudu is primarily designed for analytic use cases. You are likely to encounter issues if |
| a single row contains multiple kilobytes of data. |
| |
| * The columns which make up the primary key must be listed first in the schema. |
| |
| * Key columns cannot be altered. You must drop and recreate a table to change its keys. |
| |
| * Key columns must not be null. |
| |
| * Columns with `DOUBLE`, `FLOAT`, or `BOOL` types are not allowed as part of a |
| primary key definition. |
| |
| * Type and nullability of existing columns cannot be changed by altering the table. |
| |
| * A table’s primary key cannot be changed. |
| |
| * Dropping a column does not immediately reclaim space. Compaction must run first. |
| There is no way to run compaction manually, but dropping the table will reclaim the |
| space immediately. |
| |
| === Partitioning Limitations |
| * Tables must be manually pre-split into tablets using simple or compound primary |
| keys. Automatic splitting is not yet possible. Range partitions may be added |
| or dropped after a table has been created. See |
| link:schema_design.html[Schema Design] for more information. |
| |
| * Data in existing tables cannot currently be automatically repartitioned. As a workaround, |
| create a new table with the new partitioning and insert the contents of the old |
| table. |
| |
| === Replication and Backup Limitations |
| * Kudu does not currently include any built-in features for backup and restore. |
| Users are encouraged to use tools such as Spark or Impala to export or import |
| tables as necessary. |
| |
| === Impala Limitations |
| |
| * To use Kudu with Impala, you must install a special release of Impala called |
| Impala_Kudu. Obtaining and installing a compatible Impala release is detailed in Kudu's |
| link:kudu_impala_integration.html[Impala Integration] documentation. |
| |
| * To use Impala_Kudu alongside an existing Impala instance, you must install using parcels. |
| |
| * Updates, inserts, and deletes via Impala are non-transactional. If a query |
| fails part of the way through, its partial effects will not be rolled back. |
| |
| * All queries will be distributed across all Impala hosts which host a replica |
| of the target table(s), even if a predicate on a primary key could correctly |
| restrict the query to a single tablet. This limits the maximum concurrency of |
| short queries made via Impala. |
| |
| * Timestamp and decimal types are not supported. |
| |
| * The maximum parallelism of a single query is limited to the number of tablets |
| in a table. For good analytic performance, aim for 10 or more tablets per host |
| or use large tables. |
| |
| * Impala is only able to push down predicates involving `=`, `<=`, `>=`, |
| or `BETWEEN` comparisons between any column and a literal value, and `<` and `>` |
| for integer columns only. For example, for a table with an integer key `ts`, and |
| a string key `name`, the predicate `WHERE ts >= 12345` will convert into an |
| efficient range scan, whereas `WHERE name > 'lipcon'` will currently fetch all |
| data from the table and evaluate the predicate within Impala. |
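
The push-down rule in the item above can be restated as a small check. This sketch encodes only the rule as written here and is not taken from Impala; the function name and the boolean `column_is_integer` parameter are assumptions.

```python
# Illustrative sketch of the rule stated above; not Impala's planner logic.
PUSHABLE_FOR_ANY_COLUMN = {"=", "<=", ">=", "BETWEEN"}
PUSHABLE_FOR_INTEGER_ONLY = {"<", ">"}


def is_pushed_down(operator, column_is_integer):
    """Return True if a column-versus-literal comparison is pushed down to Kudu."""
    op = operator.upper()
    if op in PUSHABLE_FOR_ANY_COLUMN:
        return True
    return op in PUSHABLE_FOR_INTEGER_ONLY and column_is_integer
```

Applied to the examples in the text, `ts >= 12345` on the integer key is pushed down, while `name > 'lipcon'` on the string key is evaluated within Impala.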
| |
| === Security Limitations |
| |
| * Authentication and authorization features are not implemented. |
| * Data encryption is not built in. Kudu has been reported to run correctly |
| on systems using local block device encryption (e.g. `dmcrypt`). |
| |
| === Client and API Limitations |
| |
| * `ALTER TABLE` is not yet fully supported via the client APIs. More `ALTER TABLE` |
| operations will become available in future releases. |
| |
| === Other Known Issues |
| |
| The following are known bugs and issues with the current release of Kudu. They will |
| be addressed in later releases. Note that this list is not exhaustive, and is meant |
| to communicate only the most important known issues. |
| |
| * If the Kudu master is configured with the `-log_fsync_all` option, tablet servers |
| and clients will experience frequent timeouts, and the cluster may become unusable. |
| |
| * If a tablet server has a very large number of tablets, it may take several minutes |
| to start up. It is recommended to limit the number of tablets per server to 100 or fewer. |
| Consider this limitation when pre-splitting your tables. If you notice slow start-up times, |
| you can monitor the number of tablets per server in the web UI. |
| |
| * Due to a known bug in Linux kernels prior to 3.8, running Kudu on `ext4` mount points |
| may cause a subsequent `fsck` to fail with errors such as `Logical start <N> does |
| not match logical start <M> at next level`. These errors are repairable using `fsck -y`, |
| but may impact server restart time. |
| + |
| This affects RHEL/CentOS 6.8 and below. A fix is planned for RHEL/CentOS 6.9. |
| RHEL 7.0 and higher are not affected. Ubuntu 14.04 and later are not affected. |
| SLES 12 and later are not affected. |
| |
| == Resources |
| |
| - link:http://kudu.apache.org[Kudu Website] |
| - link:http://github.com/apache/kudu[Kudu GitHub Repository] |
| - link:index.html[Kudu Documentation] |
| - link:prior_release_notes.html[Release notes for older releases] |
| |
| == Installation Options |
| |
| For full installation details, see link:installation.html[Kudu Installation]. |
| |
| == Next Steps |
| - link:quickstart.html[Kudu Quickstart] |
| - link:installation.html[Installing Kudu] |
| - link:configuration.html[Configuring Kudu] |
| |