website/0.8.0/src/site/apt/releasenotes/release-0.8.0.apt - helix - Git at Google

  -----
  Release Notes for Apache Helix 0.8.0
  -----

 ~~ Licensed to the Apache Software Foundation (ASF) under one
 ~~ or more contributor license agreements.  See the NOTICE file
 ~~ distributed with this work for additional information
 ~~ regarding copyright ownership.  The ASF licenses this file
 ~~ to you under the Apache License, Version 2.0 (the
 ~~ "License"); you may not use this file except in compliance
 ~~ with the License.  You may obtain a copy of the License at
 ~~
 ~~   http://www.apache.org/licenses/LICENSE-2.0
 ~~
 ~~ Unless required by applicable law or agreed to in writing,
 ~~ software distributed under the License is distributed on an
 ~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ~~ KIND, either express or implied.  See the License for the
 ~~ specific language governing permissions and limitations
 ~~ under the License.

 ~~ NOTE: For help with the syntax of this file, see:
 ~~ http://maven.apache.org/guides/mini/guide-apt-format.html

 Release Notes for Apache Helix 0.8.0

   The Apache Helix team would like to announce the release of Apache Helix 0.8.0.

   This is the twelfth release under the Apache umbrella, and the eighth as a top-level project.

   Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

   * Automatic assignment of resource/partition to nodes

   * Node failure detection and recovery

   * Dynamic addition of Resources

   * Dynamic addition of nodes to the cluster

   * Pluggable distributed state machine to manage the state of a resource via state transitions

   * Automatic load balancing and throttling of transitions

 []

 * Detailed Changes

 ** New Features

     * Helix REST 2.0

         * All admin operations are exposed via restful API

         * Support all task framework API

         * Helix Rest 2.0 uses standard HTTP methods if possible, instead of customized command as in HelixAdminWeb

         * Access log can be enabled for both read and write access

     * New Helix UI (helix-front) which allows you to :

         * View detailed cluster information

         * View resources / instances in a Helix cluster

         * View partition placement and health status in a resource

         * Create new Helix clusters

         * Enable / Disable a cluster / resource / instance

         * Add an instance into a Helix cluster

     * New Full-auto rebalancer featured with :

         * CRUSH-based rack-aware partition placement algorithm

         * Delayed rebalance which minimizes re-shuffle of the resident partitions on an instance during short-period outage, while still maintaining minimal active replicas for application's availability need

         * Throttled partition movements that allows maximum number of pending state transitions to be set at cluster, resource or instance level

         * New P2P (Participant-to-participant) state-transition message to allow much faster mastership handoff

     * Helix Cluster Maintenance Mode

         * Application can put a cluster into maintenance mode. When a cluster in maintenance mode, no new partitions will be bootstrapped, however, existing partitions will still be maintained

 ** Bug Fixes

     * Fixed NPE if rebalance strategy is not specified in IS.

     * Fixed a minor issue when updating a workflowConfig with empty workflowId.

     * Fixed MultiRound CRUSH that cannot select any node from second round

     * Fixed a bug in caching bestpossible states in ClusterDataCache.

     * Fixed MissingTopStatePartitionGauge reports negative number issue.

     * Fixed TaskStateModel thread leaking issue, and name all thread-pool created in Helix.

     * Fixed no master replica when all of replica in new instances turn to ERROR state when migrating existing replicas to all new instances in DelayedAutoRebalancer.

     * Fixed NPE in clusterstatusmonitor.

     * Fixed build the workflowconfig by fromHelixProperty

     * Fixed comparison method violating the contract

     * Fixed NPEs for HelixTask Executors

     * Fixed WorkflowConfig.Builder.fromMap() to copy WorkflowID

     * Fixed a bug where context could not be read due to missing NAME field

     * Fixed deadlock in GenericHelixController

     * Fixed active MBean domain names for different instance types.

     * Fixed duplication of HelixCallBackMonitors

     * Fixed a bug when deleting a job from queue without context

     * Fixed a NPE in DelayedAutoRebalancer, adding more debug logs.

     * Fixed unexpected idealstate overwrite when persist assignment is on.

     * Fixed disconnected zkConnection issue.

     * Fixed Resource json format issue.

     * Fixed ClusterConfig record output

     * Fixed resource config set path

     * Fixed rest JSON methods for restful service.

     * Fixed NPE for get disabled partitions

     * Fixed enable/disable partition in instances for specified resources

     * Fixed deleteJob from a recurrent job queue

     * Fixed a bug in AutoRebalanceStrategy, that assigned orphan replicas to its preferred nodes instead of random nodes.

     * Fixed a bug in BestPossibleExternalViewVerifier

     * Fixed Validation logic in JobConfig

     * Fixed Target Resource conflicts with number of tasks setting

     * Fixed JobQueue ignoring FailedThreshold

     * Fixed NPE when first time call WorkflowRebalancer

     * Fixed java 6 compatibility issue in AutoRebalancer.

     * Fixes for using 1.8 feature in 1.6 environment

     * Fixed BestPossibleExternalViewVerifier toString NPE

     * Fixed Task State Model INIT priority number

     * Fixed task assignment in instance group tag check

     * Do not set MaxPartitionPerNode in IdealState if it is not greater than 0.

     * Fixed missing workflowtype assignment in builder

     * Fixed NPE in ClusterStateVerifier

     * Fixed thread leaking problems in TaskStateModel. Using shared thread pool for all tasks and timeout tasks among all taskStateModels.

     * Fixed jobConfig to expose number_of_concurrent_task_per_instance to znode.

     * Fixed the bug when job expiry time is shorter than job schedule interval in recurring job queue. Add more debug log in TaskRebalancer.


 ** Improvements

     * Performance Improvement

         * Task framework rebalance pipeline was separated from resource management pipeline in the controller

         * Reads and writes to ZK are batched during controller's rebalance pipeline to decrease the latency

         * Optimized the rebalance pipeline with cached data to avoid redundant calculation

         * New P2P (Participant-to-participant) state-transition message to allow much faster mastership handoff

         * New target external view to allow spectators to  have a speculative view of ongoing rebalancing

     * Monitor Improvement

         * A set of new MBeans to monitor traffic and latency between controller/participant and Zookeeper

         * More MBeans to monitor performance of controller's rebalance pipeline

         * More MBeans to monitor running and queued workflows and jobs

     * Job scheduling should fail if the target resource does not exist anymore at the time of scheduling

     * Improved Integration test run speed

     * Added periodic rebalance to Helix controller

     * Upgraded Zookeeper dependency and fixed various zk connection and data update issues

     * Refactored the monitoring framework to simplify interfaces

     * Added monitor to ZkClient to monitor the pending callbacks

     * Improve the data load in Helix Spectator (RoutingTableProvider), which includes: 1) Put event callback handler in a spearate thread so other ZK event callbacks won't be blocked. 2) Deduplicate the callbacks from same event type, always keep just one latest copy of event callback in the event queue. 3) Add methods to return all instances and liveInstances in the cluster

     * Add cluster config to tolerate ERROR Partition when trying to schedule load balance transition

     * Optimize ClusterDataCache's data refresh strategy by: 1) cacheing CurrentStates locally and update only these that have been changed from zk. 2) Controller listens on ResourceConfig changes 3) Cache resource configs locally and update them all if there is any changes to resource configs

     * Avoid cascading failure by automatically disabling the cluster when too many partitions are crammed into an instance

     * Allow user to define a preference list for a partition in FULL-AUTO rebalance mode

     * Allow each individual change listener to selectively enable/disable PreFetch and BatchMode during the callback handling.

     * Added Cluster level state transition timeout

     * Persist participant's offline timestamp in ParticipantHistory

     * Added Serialization for HelixProperty

     * Added partition level priority support

     * Optimize partition movement when autorebalancing using the default strategy

     * Add support for flexible hirerachy representation of a cluster topology

     * Add StrictMatchExternalViewVerifier that verifies whether the ExternalViews of given resources (or all resources in the cluster) match exactly as its ideal mapping (in idealstate)

     * Improved Task Retry support

     * Generate Idealstate for a job resource only when it starts to run and remove it once the job is completed.


 []

 Cheers,
 --
 The Apache Helix Team
	-----
	Release Notes for Apache Helix 0.8.0
	-----

	~~ Licensed to the Apache Software Foundation (ASF) under one
	~~ or more contributor license agreements. See the NOTICE file
	~~ distributed with this work for additional information
	~~ regarding copyright ownership. The ASF licenses this file
	~~ to you under the Apache License, Version 2.0 (the
	~~ "License"); you may not use this file except in compliance
	~~ with the License. You may obtain a copy of the License at
	~~
	~~ http://www.apache.org/licenses/LICENSE-2.0
	~~
	~~ Unless required by applicable law or agreed to in writing,
	~~ software distributed under the License is distributed on an
	~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	~~ KIND, either express or implied. See the License for the
	~~ specific language governing permissions and limitations
	~~ under the License.

	~~ NOTE: For help with the syntax of this file, see:
	~~ http://maven.apache.org/guides/mini/guide-apt-format.html

	Release Notes for Apache Helix 0.8.0

	The Apache Helix team would like to announce the release of Apache Helix 0.8.0.

	This is the twelfth release under the Apache umbrella, and the eighth as a top-level project.

	Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

	* Automatic assignment of resource/partition to nodes

	* Node failure detection and recovery

	* Dynamic addition of Resources

	* Dynamic addition of nodes to the cluster

	* Pluggable distributed state machine to manage the state of a resource via state transitions

	* Automatic load balancing and throttling of transitions

	[]

	* Detailed Changes

	** New Features

	* Helix REST 2.0

	* All admin operations are exposed via restful API

	* Support all task framework API

	* Helix Rest 2.0 uses standard HTTP methods if possible, instead of customized command as in HelixAdminWeb

	* Access log can be enabled for both read and write access

	* New Helix UI (helix-front) which allows you to :

	* View detailed cluster information

	* View resources / instances in a Helix cluster

	* View partition placement and health status in a resource

	* Create new Helix clusters

	* Enable / Disable a cluster / resource / instance

	* Add an instance into a Helix cluster

	* New Full-auto rebalancer featured with :

	* CRUSH-based rack-aware partition placement algorithm

	* Delayed rebalance which minimizes re-shuffle of the resident partitions on an instance during short-period outage, while still maintaining minimal active replicas for application's availability need

	* Throttled partition movements that allows maximum number of pending state transitions to be set at cluster, resource or instance level

	* New P2P (Participant-to-participant) state-transition message to allow much faster mastership handoff

	* Helix Cluster Maintenance Mode

	* Application can put a cluster into maintenance mode. When a cluster in maintenance mode, no new partitions will be bootstrapped, however, existing partitions will still be maintained

	** Bug Fixes

	* Fixed NPE if rebalance strategy is not specified in IS.

	* Fixed a minor issue when updating a workflowConfig with empty workflowId.

	* Fixed MultiRound CRUSH that cannot select any node from second round

	* Fixed a bug in caching bestpossible states in ClusterDataCache.

	* Fixed MissingTopStatePartitionGauge reports negative number issue.

	* Fixed TaskStateModel thread leaking issue, and name all thread-pool created in Helix.

	* Fixed no master replica when all of replica in new instances turn to ERROR state when migrating existing replicas to all new instances in DelayedAutoRebalancer.

	* Fixed NPE in clusterstatusmonitor.

	* Fixed build the workflowconfig by fromHelixProperty

	* Fixed comparison method violating the contract

	* Fixed NPEs for HelixTask Executors

	* Fixed WorkflowConfig.Builder.fromMap() to copy WorkflowID

	* Fixed a bug where context could not be read due to missing NAME field

	* Fixed deadlock in GenericHelixController

	* Fixed active MBean domain names for different instance types.

	* Fixed duplication of HelixCallBackMonitors

	* Fixed a bug when deleting a job from queue without context

	* Fixed a NPE in DelayedAutoRebalancer, adding more debug logs.

	* Fixed unexpected idealstate overwrite when persist assignment is on.

	* Fixed disconnected zkConnection issue.

	* Fixed Resource json format issue.

	* Fixed ClusterConfig record output

	* Fixed resource config set path

	* Fixed rest JSON methods for restful service.

	* Fixed NPE for get disabled partitions

	* Fixed enable/disable partition in instances for specified resources

	* Fixed deleteJob from a recurrent job queue

	* Fixed a bug in AutoRebalanceStrategy, that assigned orphan replicas to its preferred nodes instead of random nodes.

	* Fixed a bug in BestPossibleExternalViewVerifier

	* Fixed Validation logic in JobConfig

	* Fixed Target Resource conflicts with number of tasks setting

	* Fixed JobQueue ignoring FailedThreshold

	* Fixed NPE when first time call WorkflowRebalancer

	* Fixed java 6 compatibility issue in AutoRebalancer.

	* Fixes for using 1.8 feature in 1.6 environment

	* Fixed BestPossibleExternalViewVerifier toString NPE

	* Fixed Task State Model INIT priority number

	* Fixed task assignment in instance group tag check

	* Do not set MaxPartitionPerNode in IdealState if it is not greater than 0.

	* Fixed missing workflowtype assignment in builder

	* Fixed NPE in ClusterStateVerifier

	* Fixed thread leaking problems in TaskStateModel. Using shared thread pool for all tasks and timeout tasks among all taskStateModels.

	* Fixed jobConfig to expose number_of_concurrent_task_per_instance to znode.

	* Fixed the bug when job expiry time is shorter than job schedule interval in recurring job queue. Add more debug log in TaskRebalancer.


	** Improvements

	* Performance Improvement

	* Task framework rebalance pipeline was separated from resource management pipeline in the controller

	* Reads and writes to ZK are batched during controller's rebalance pipeline to decrease the latency

	* Optimized the rebalance pipeline with cached data to avoid redundant calculation

	* New P2P (Participant-to-participant) state-transition message to allow much faster mastership handoff

	* New target external view to allow spectators to have a speculative view of ongoing rebalancing

	* Monitor Improvement

	* A set of new MBeans to monitor traffic and latency between controller/participant and Zookeeper

	* More MBeans to monitor performance of controller's rebalance pipeline

	* More MBeans to monitor running and queued workflows and jobs

	* Job scheduling should fail if the target resource does not exist anymore at the time of scheduling

	* Improved Integration test run speed

	* Added periodic rebalance to Helix controller

	* Upgraded Zookeeper dependency and fixed various zk connection and data update issues

	* Refactored the monitoring framework to simplify interfaces

	* Added monitor to ZkClient to monitor the pending callbacks

	* Improve the data load in Helix Spectator (RoutingTableProvider), which includes: 1) Put event callback handler in a spearate thread so other ZK event callbacks won't be blocked. 2) Deduplicate the callbacks from same event type, always keep just one latest copy of event callback in the event queue. 3) Add methods to return all instances and liveInstances in the cluster

	* Add cluster config to tolerate ERROR Partition when trying to schedule load balance transition

	* Optimize ClusterDataCache's data refresh strategy by: 1) cacheing CurrentStates locally and update only these that have been changed from zk. 2) Controller listens on ResourceConfig changes 3) Cache resource configs locally and update them all if there is any changes to resource configs

	* Avoid cascading failure by automatically disabling the cluster when too many partitions are crammed into an instance

	* Allow user to define a preference list for a partition in FULL-AUTO rebalance mode

	* Allow each individual change listener to selectively enable/disable PreFetch and BatchMode during the callback handling.

	* Added Cluster level state transition timeout

	* Persist participant's offline timestamp in ParticipantHistory

	* Added Serialization for HelixProperty

	* Added partition level priority support

	* Optimize partition movement when autorebalancing using the default strategy

	* Add support for flexible hirerachy representation of a cluster topology

	* Add StrictMatchExternalViewVerifier that verifies whether the ExternalViews of given resources (or all resources in the cluster) match exactly as its ideal mapping (in idealstate)

	* Improved Task Retry support

	* Generate Idealstate for a job resource only when it starts to run and remove it once the job is completed.


	[]

	Cheers,
	--
	The Apache Helix Team