
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 [[release_notes]]
 = Apache Kudu 1.1 Release Notes

 :author: Kudu Team
 :imagesdir: ./images
 :icons: font
 :toc: left
 :toclevels: 3
 :doctype: book
 :backend: html5
 :sectlinks:
 :experimental:

 [[rn_1.1.0]]

 [[rn_1.1.0_new_features]]
 == New features

 * The Python client has been brought up to feature parity with the Java and {cpp} clients,
   and its package version is therefore bumped from 0.3 to 1.1 with this release.
   Highlights include:
     ** Improved Partial Row semantics
     ** Range partition support
     ** Scan Token API
     ** Enhanced predicate support
     ** Support for all Kudu data types (including a mapping of Python's `datetime.datetime` to
     `UNIXTIME_MICROS`)
     ** Alter table support
     ** Enabled Read at Snapshot for Scanners
     ** Enabled Scanner Replica Selection
     ** A few bug fixes for Python 3 in addition to various other improvements.

 * IN LIST predicate pushdown support was added to allow optimized execution of filters which
   match on a set of column values. Support for Spark, MapReduce, and Impala queries utilizing
   IN LIST pushdown is not yet complete.

 * The Java client now features client-side request tracing to help troubleshoot timeouts.
   Error messages are now augmented with traces that show which servers were contacted before the
   timeout occurred, instead of just the last error. The traces also contain RPCs that were
   required to fulfill the client's request, such as contacting the master to discover a tablet's
   location. Note that the traces are not available for successful requests and are not
   programmatically queryable.
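
The mapping of Python's `datetime.datetime` to `UNIXTIME_MICROS` listed above amounts to counting microseconds since the Unix epoch. The following is a minimal standalone sketch of that arithmetic, not the client's actual implementation. The helper names `to_unixtime_micros` and `from_unixtime_micros` are hypothetical, and treating naive datetimes as UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

# Reference point for UNIXTIME_MICROS-style values: 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unixtime_micros(value: datetime) -> int:
    """Return the number of microseconds between the Unix epoch and `value`.

    Assumption of this sketch: a naive datetime is interpreted as UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    # Floor-dividing two timedeltas yields an int and avoids float rounding.
    return (value - EPOCH) // timedelta(microseconds=1)


def from_unixtime_micros(micros: int) -> datetime:
    """Inverse of to_unixtime_micros; returns a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

Integer arithmetic on `timedelta` objects is used instead of `datetime.timestamp()` so that the microsecond component is preserved exactly.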

 == Optimizations and improvements

 * Kudu now publishes JAR files for Spark 2.0 compiled with Scala 2.11 along with the
   existing Spark 1.6 JAR compiled with Scala 2.10.

 * The Java client now allows configuring scanners to read from the closest replica instead of
   the known leader replica. The default remains the latter. Use the relevant `ReplicaSelection`
   enum with the scanner's builder to change this behavior.

 * Tablet servers use a new policy for retaining write-ahead log (WAL) segments.
   Previously, servers used the `log_min_segments_to_retain` flag to prioritize
   any flushes which were retaining log segments past the configured value (default 2).
   This policy caused servers to flush in-memory data more frequently than necessary,
   limiting write performance.
 +
 The new policy introduces a flag, `log_target_replay_size_mb`, which
   determines the threshold at which write-ahead log retention will prioritize flushes.
   The new flag is considered experimental and users should not need to modify
   its value.
 +
 The new policy has been observed to improve write performance in some use cases
   by a factor of 2 relative to the old policy.

 * Kudu's implementation of the Raft consensus algorithm has been improved to include
   a "pre-election" phase. This can improve the stability of tablet leader election
   in high-load scenarios, especially if each server hosts a high number of tablets.

 * Tablet server start-up time has been substantially improved in the case that
   the server contains a high number of tombstoned tablet replicas.

 === Command line tools

 * The tool `kudu tablet leader_step_down` has been added to manually force a leader to step down.
 * The tool `kudu remote_replica copy` has been added to manually copy a replica from
   one running tablet server to another.
 * The tool `kudu local_replica delete` has been added to delete a replica of a tablet.
 * The `kudu test loadgen` tool has been added to replace the obsoleted
   `insert-generated-rows` standalone binary. The new tool is enriched with
   additional functionality and can be used to run load generation tests against
   a Kudu cluster.

 == Wire protocol compatibility

 Kudu 1.1.0 is wire-compatible with previous versions of Kudu:

 * Kudu 1.1 clients may connect to servers running Kudu 1.0. If the client uses the new
   'IN LIST' predicate type, an error will be returned.
 * Kudu 1.0 clients may connect to servers running Kudu 1.1 without limitations.
 * Rolling upgrade between Kudu 1.0 and Kudu 1.1 servers is believed to be possible
   though has not been sufficiently tested. Users are encouraged to shut down all nodes
   in the cluster, upgrade the software, and then restart the daemons on the new version.

 [[rn_1.1.0_incompatible_changes]]
 == Incompatible changes in Kudu 1.1.0

 === Client APIs ({cpp}/Java/Python)

 * The {cpp} client no longer requires the
   link:https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html[old gcc5 ABI].
   Which ABI is actually used depends on the compiler configuration. Some new distros
   (e.g. Ubuntu 16.04) will use the new ABI. Your application must use the same ABI as is
   used by the client library; an easy way to guarantee this is to use the same compiler
   to build both.

 * The {cpp} client's `KuduSession::CountBufferedOperations()` method is
   deprecated. Its behavior is inconsistent unless the session runs in
   `MANUAL_FLUSH` mode. Instead, to get the number of buffered operations, count
   invocations of the `KuduSession::Apply()` method since the last
   `KuduSession::Flush()` call or, if using asynchronous flushing, since the last
   invocation of the callback passed into `KuduSession::FlushAsync()`.

 * The Java client's `OperationResponse.getWriteTimestamp` method was renamed to `getWriteTimestampRaw`
   to emphasize that it doesn't return milliseconds, unlike what its Javadoc indicated. The renamed
   method was also hidden from the public APIs and should not be used.

 * The Java client's sync API (`KuduClient`, `KuduSession`, `KuduScanner`) used to throw either
   a `NonRecoverableException` or a `TimeoutException` for a timeout. It now throws only the
   former.

 * The Java client's handling of errors in `KuduSession` was modified so that subclasses of
   `KuduException` are converted into RowErrors instead of being thrown.
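
The counting approach suggested above as a replacement for `KuduSession::CountBufferedOperations()` can be sketched as a thin wrapper around a session. The sketch below is written in Python against a duck-typed session object and covers only the synchronous flush case. `CountingSession` and `FakeSession` are hypothetical names that are not part of any Kudu client; the only assumption is that the wrapped object exposes `apply(op)` and `flush()`.

```python
class CountingSession:
    """Counts operations applied since the last successful flush.

    Hypothetical helper: `session` is any object with apply(op) and flush().
    """

    def __init__(self, session):
        self._session = session
        self._buffered = 0

    def apply(self, op):
        self._session.apply(op)
        # Increment only after apply() returned without raising.
        self._buffered += 1

    def flush(self):
        self._session.flush()
        # Everything applied so far has been sent, so restart the count.
        self._buffered = 0

    @property
    def buffered_operations(self):
        return self._buffered


class FakeSession:
    """Stand-in session so the sketch runs without a Kudu cluster."""

    def __init__(self):
        self.pending = []
        self.flushed = []

    def apply(self, op):
        self.pending.append(op)

    def flush(self):
        self.flushed.extend(self.pending)
        self.pending.clear()
```

With asynchronous flushing, the reset would instead belong in the callback passed to the flush call, as described above.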

 [[known_issues_and_limitations]]
 == Known Issues and Limitations

 === Schema and Usage Limitations
 * Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
   a single row contains multiple kilobytes of data.

 * The columns which make up the primary key must be listed first in the schema.

 * Key columns cannot be altered. You must drop and recreate a table to change its keys.

 * Key columns must not be null.

 * Columns with `DOUBLE`, `FLOAT`, or `BOOL` types are not allowed as part of a
   primary key definition.

 * Type and nullability of existing columns cannot be changed by altering the table.

 * A table’s primary key cannot be changed.

 * Dropping a column does not immediately reclaim space. Compaction must run first.
 There is no way to run compaction manually, but dropping the table will reclaim the
 space immediately.

 === Partitioning Limitations
 * Tables must be manually pre-split into tablets using simple or compound primary
   keys. Automatic splitting is not yet possible. Range partitions may be added
   or dropped after a table has been created. See
   link:schema_design.html[Schema Design] for more information.

 * Data in existing tables cannot currently be automatically repartitioned. As a workaround,
   create a new table with the new partitioning and insert the contents of the old
   table.

 === Replication and Backup Limitations
 * Kudu does not currently include any built-in features for backup and restore.
   Users are encouraged to use tools such as Spark or Impala to export or import
   tables as necessary.

 === Impala Limitations

 * To use Kudu with Impala, you must install a special release of Impala called
   Impala_Kudu. Obtaining and installing a compatible Impala release is detailed in Kudu's
   link:kudu_impala_integration.html[Impala Integration] documentation.

 * To use Impala_Kudu alongside an existing Impala instance, you must install using parcels.

 * Updates, inserts, and deletes via Impala are non-transactional. If a query
   fails part of the way through, its partial effects will not be rolled back.

 * The timestamp and decimal types are not supported.

 * The maximum parallelism of a single query is limited to the number of tablets
   in a table. For good analytic performance, aim for 10 or more tablets per host
   or use large tables.

 === Security Limitations

 * Authentication and authorization features are not implemented.
 * Data encryption is not built in. Kudu has been reported to run correctly
   on systems using local block device encryption (e.g. `dm-crypt`).

 === Client and API Limitations

 * `ALTER TABLE` is not yet fully supported via the client APIs. More `ALTER TABLE`
   operations will become available in future releases.

 === Other Known Issues

 The following are known bugs and issues with the current release of Kudu. They will
 be addressed in later releases. Note that this list is not exhaustive, and is meant
 to communicate only the most important known issues.

 * If the Kudu master is configured with the `-log_fsync_all` option, tablet servers
   and clients will experience frequent timeouts, and the cluster may become unusable.

 * If a tablet server has a very large number of tablets, it may take several minutes
   to start up. It is recommended to limit the number of tablets per server to 100 or fewer.
   Consider this limitation when pre-splitting your tables. If you notice slow start-up times,
   you can monitor the number of tablets per server in the web UI.

 * Due to a known bug in Linux kernels prior to 3.8, running Kudu on `ext4` mount points
   may cause a subsequent `fsck` to fail with errors such as `Logical start <N> does
   not match logical start <M> at next level`. These errors are repairable using `fsck -y`,
   but may impact server restart time.
 +
 This affects RHEL/CentOS 6.8 and earlier. A fix is planned for RHEL/CentOS 6.9.
   RHEL 7.0 and higher are not affected. Ubuntu 14.04 and later are not affected.
   SLES 12 and later are not affected.
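
As a rough aid for the `ext4` issue above, the kernel release string can be compared against 3.8. The sketch below is a hypothetical helper (`kernel_predates_3_8` is not part of Kudu) and checks only the upstream version number. Because distributions may backport fixes into older kernels, as planned for RHEL/CentOS 6.9, a `True` result is a hint to investigate rather than proof that a host is affected.

```python
import re


def kernel_predates_3_8(release: str) -> bool:
    """Return True if a kernel release string has a version lower than 3.8.

    `release` is a string such as the one returned by platform.release(),
    for example "2.6.32-642.el6.x86_64". Only major.minor is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major version first, then minor.
    return (major, minor) < (3, 8)
```

On a running Linux host, the helper could be called as `kernel_predates_3_8(platform.release())` after importing the standard `platform` module.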

 == Resources

 - link:http://kudu.apache.org[Kudu Website]
 - link:http://github.com/apache/kudu[Kudu GitHub Repository]
 - link:index.html[Kudu Documentation]
 - link:prior_release_notes.html[Release notes for older releases]

 == Installation Options

 For full installation details, see link:installation.html[Kudu Installation].

 == Next Steps
 - link:quickstart.html[Kudu Quickstart]
 - link:installation.html[Installing Kudu]
 - link:configuration.html[Configuring Kudu]