blob: 1eb647d9f6618bab23849a7d782b583dc078a1b1 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[release_notes]]
= Apache Kudu 1.0 Release Notes
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc: left
:toclevels: 3
:doctype: book
:backend: html5
:sectlinks:
:experimental:
[[rn_1.0.1]]
== Overview
Apache Kudu 1.0.1 is a bug fix release, with no new features or backwards
incompatible changes.
[[rn_1.0.1_fixed_issues]]
=== Fixed Issues
- link:https://issues.apache.org/jira/browse/KUDU-1681[KUDU-1681] Fixed a bug in
the tablet server which could cause a crash when the DNS lookup during master
heartbeat failed.
- link:https://issues.apache.org/jira/browse/KUDU-1660[KUDU-1660]: Fixed a bug
which would cause the Kudu master and tablet server to fail to start on single
CPU systems.
- link:https://issues.apache.org/jira/browse/KUDU-1651[KUDU-1652]: Fixed a bug
that would cause the C++ client, tablet server, and Java client to crash or
throw an exception when attempting to scan a table with a predicate which
simplifies to `IS NOT NULL` on a non-nullable column. For instance, setting a
`<= 127` predicate on an `INT8` column could trigger this bug, since the
predicate only filters null values.
- link:https://issues.apache.org/jira/browse/KUDU-1651[KUDU-1651]: Fixed a bug
that would cause the tablet server to crash when evaluating a scan with
predicates over a dictionary encoded column containing an entire block of null
values.
- link:https://issues.apache.org/jira/browse/KUDU-1623[KUDU-1623]: Fixed a bug
that would cause the tablet server to crash when handling UPSERT operations
that only set values for the primary key columns.
- link:http://gerrit.cloudera.org:8080/4488[Gerrit #4488] Fixed a bug in the
Java client's KuduException class which could cause an unexpected
NullPointerException to be thrown when the exception did not have an
associated message.
- link:https://issues.apache.org/jira/browse/KUDU-1090[KUDU-1090] Fixed a bug in
the memory tracker which could cause a rare crash during tablet server
startup.
[[rn_1.0.0]]
== Overview
After approximately a year of beta releases, Apache Kudu has reached version 1.0.
This version number signifies that the development team feels that Kudu is stable
enough for usage in production environments.
If you are new to Kudu, check out its list of link:index.html[features and benefits].
[[rn_1.0.0_new_features]]
== New features
Kudu 1.0.0 delivers a number of new features, bug fixes, and optimizations.
- Removal of multiversion concurrency control (MVCC) history is now supported.
This is known as tablet history GC. This allows Kudu to reclaim disk space,
where previously Kudu would keep a full history of all changes made to a
given table since the beginning of time. Previously, the only way to reclaim
disk space was to drop a table.
+
Kudu will still keep historical data, and the amount of history retained is
controlled by setting the configuration flag `--tablet_history_max_age_sec`,
which defaults to 15 minutes (expressed in seconds). The timestamp
represented by the current time minus `tablet_history_max_age_sec` is known
as the ancient history mark (AHM). When a compaction or flush occurs, Kudu
will remove the history of changes made prior to the ancient history mark.
This only affects historical data; currently-visible data will not be
removed. A specialized maintenance manager background task to remove existing
"cold" historical data that is not in a row affected by the normal compaction
process will be added in a future release.
- Most of Kudu's command line tools have been consolidated under a new
top-level `kudu` tool. This reduces the number of large binaries distributed
with Kudu and also includes much-improved help output.
- The Kudu Flume Sink now supports processing events containing Avro-encoded
records, using the new `AvroKuduOperationsProducer`.
- Administrative tools including `kudu cluster ksck` now support running
against multi-master Kudu clusters.
- The output of the `ksck` tool is now colorized and much easier to read.
- The {cpp} client API now supports writing data in `AUTO_FLUSH_BACKGROUND` mode.
This can provide higher throughput for ingest workloads.
== Optimizations and improvements
- The performance of comparison predicates on dictionary-encoded columns has
been substantially optimized. Users are encouraged to use dictionary encoding
on any string or binary columns with low cardinality, especially if these
columns will be filtered with predicates.
- The Java client is now able to prune partitions from scanners based on the
provided predicates. For example, an equality predicate on a hash-partitioned
column will now only access those tablets that could possibly contain matching
data. This is expected to improve performance for the Spark integration as well
as applications using the Java client API.
- The performance of compaction selection in the tablet server has been
substantially improved. This can increase the efficiency of the background
maintenance threads and improve overall throughput of heavy write workloads.
- The policy by which the tablet server retains write-ahead log (WAL) files has
been improved so that it takes into account other replicas of the tablet.
This should help mitigate the spurious eviction of tablet replicas on machines
that temporarily lag behind the other replicas.
== Wire protocol compatibility
Kudu 1.0.0 maintains client-server wire-compatibility with previous releases.
Applications using the Kudu client libraries may be upgraded either
before, at the same time, or after the Kudu servers.
Kudu 1.0.0 does _not_ maintain server-server wire compatibility with previous
releases. Therefore, rolling upgrades between earlier versions of Kudu and
Kudu 1.0.0 are not supported.
[[rn_1.0.0_incompatible_changes]]
== Incompatible changes in Kudu 1.0.0
=== Command line tools
- The `kudu-pbc-dump` tool has been removed. The same functionality is now
implemented as `kudu pbc dump`.
- The `kudu-ksck` tool has been removed. The same functionality is now
implemented as `kudu cluster ksck`.
- The `cfile-dump` tool has been removed. The same functionality is now
implemented as `kudu fs cfile dump`.
- The `log-dump` tool has been removed. The same functionality is now
implemented as `kudu wal dump` and `kudu local_replica dump wals`.
- The `kudu-admin` tool has been removed. The same functionality is now
implemented within `kudu table` and `kudu tablet`.
- The `kudu-fs_dump` tool has been removed. The same functionality is now
implemented as `kudu fs dump`.
- The `kudu-ts-cli` tool has been removed. The same functionality is now
implemented within `kudu master`, `kudu remote_replica`, and `kudu tserver`.
- The `kudu-fs_list` tool has been removed and some similar useful
functionality has been moved under 'kudu local_replica'.
=== Configuration flags
- Some configuration flags are now marked as 'unsafe' and 'experimental'. Such flags
are disallowed by default. Users may access these flags by enabling the additional
flags `--unlock_unsafe_flags` and `--unlock_experimental_flags`. Usage of such flags
is not recommended, as the flags may be removed or modified with no deprecation period
and without notice in future Kudu releases.
=== Client APIs ({cpp}/Java/Python)
- The `TIMESTAMP` column type has been renamed to `UNIXTIME_MICROS` in order to
reduce confusion between Kudu's timestamp support and the timestamps supported
by other systems such as Apache Hive and Apache Impala (incubating). Existing
tables will automatically be updated to use the new name for the type.
+
Clients upgrading to the new client libraries must move to the new name for
the type. Clients using old client libraries will continue to operate using
the old type name, even when connected to clusters that have been
upgraded. Similarly, if clients are upgraded before servers, existing
timestamp columns will be available using the new type name.
- `KuduSession` methods in the {cpp} library are no longer advertised as thread-safe
to have one set of semantics for both {cpp} and Java Kudu client libraries.
- The `KuduScanToken::TabletServers` method in the {cpp} library has been removed.
The same information can now be found in the KuduScanToken::tablet method.
=== Apache Flume Integration
- The `KuduEventProducer` interface used to process Flume events into Kudu operations
for the Kudu Flume Sink has changed, and has been renamed `KuduOperationsProducer`.
The existing `KuduEventProducer`s have been updated for the new interface, and have
been renamed similarly.
[[known_issues_and_limitations]]
== Known Issues and Limitations
=== Schema and Usage Limitations
* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
a single row contains multiple kilobytes of data.
* The columns which make up the primary key must be listed first in the schema.
* Key columns cannot be altered. You must drop and recreate a table to change its keys.
* Key columns must not be null.
* Columns with `DOUBLE`, `FLOAT`, or `BOOL` types are not allowed as part of a
primary key definition.
* Type and nullability of existing columns cannot be changed by altering the table.
* A table’s primary key cannot be changed.
* Dropping a column does not immediately reclaim space. Compaction must run first.
There is no way to run compaction manually, but dropping the table will reclaim the
space immediately.
=== Partitioning Limitations
* Tables must be manually pre-split into tablets using simple or compound primary
keys. Automatic splitting is not yet possible. Range partitions may be added
or dropped after a table has been created. See
link:schema_design.html[Schema Design] for more information.
* Data in existing tables cannot currently be automatically repartitioned. As a workaround,
create a new table with the new partitioning and insert the contents of the old
table.
=== Replication and Backup Limitations
* Kudu does not currently include any built-in features for backup and restore.
Users are encouraged to use tools such as Spark or Impala to export or import
tables as necessary.
=== Impala Limitations
* To use Kudu with Impala, you must install a special release of Impala called
Impala_Kudu. Obtaining and installing a compatible Impala release is detailed in Kudu's
link:kudu_impala_integration.html[Impala Integration] documentation.
* To use Impala_Kudu alongside an existing Impala instance, you must install using parcels.
* Updates, inserts, and deletes via Impala are non-transactional. If a query
fails part of the way through, its partial effects will not be rolled back.
* All queries will be distributed across all Impala hosts which host a replica
of the target table(s), even if a predicate on a primary key could correctly
restrict the query to a single tablet. This limits the maximum concurrency of
short queries made via Impala.
* No timestamp and decimal type support.
* The maximum parallelism of a single query is limited to the number of tablets
in a table. For good analytic performance, aim for 10 or more tablets per host
or use large tables.
* Impala is only able to push down predicates involving `=`, `<=`, `>=`,
or `BETWEEN` comparisons between any column and a literal value, and `<` and `>`
for integer columns only. For example, for a table with an integer key `ts`, and
a string key `name`, the predicate `WHERE ts >= 12345` will convert into an
efficient range scan, whereas `where name > 'lipcon'` will currently fetch all
data from the table and evaluate the predicate within Impala.
=== Security Limitations
* Authentication and authorization features are not implemented.
* Data encryption is not built in. Kudu has been reported to run correctly
on systems using local block device encryption (e.g. `dmcrypt`).
=== Client and API Limitations
* `ALTER TABLE` is not yet fully supported via the client APIs. More `ALTER TABLE`
operations will become available in future releases.
=== Other Known Issues
The following are known bugs and issues with the current release of Kudu. They will
be addressed in later releases. Note that this list is not exhaustive, and is meant
to communicate only the most important known issues.
* If the Kudu master is configured with the `-log_fsync_all` option, tablet servers
and clients will experience frequent timeouts, and the cluster may become unusable.
* If a tablet server has a very large number of tablets, it may take several minutes
to start up. It is recommended to limit the number of tablets per server to 100 or fewer.
Consider this limitation when pre-splitting your tables. If you notice slow start-up times,
you can monitor the number of tablets per server in the web UI.
* Due to a known bug in Linux kernels prior to 3.8, running Kudu on `ext4` mount points
may cause a subsequent `fsck` to fail with errors such as `Logical start <N> does
not match logical start <M> at next level`. These errors are repairable using `fsck -y`,
but may impact server restart time.
+
This affects RHEL/CentOS 6.8 and below. A fix is planned for RHEL/CentOS 6.9.
RHEL 7.0 and higher are not affected. Ubuntu 14.04 and later are not affected.
SLES 12 and later are not affected.
== Resources
- link:http://kudu.apache.org[Kudu Website]
- link:http://github.com/apache/kudu[Kudu GitHub Repository]
- link:index.html[Kudu Documentation]
- link:prior_release_notes.html[Release notes for older releases]
== Installation Options
For full installation details, see link:installation.html[Kudu Installation].
== Next Steps
- link:quickstart.html[Kudu Quickstart]
- link:installation.html[Installing Kudu]
- link:configuration.html[Configuring Kudu]