| -*-org-*- |
| # Licensed to the Apache Software Foundation (ASF) under one |
| # or more contributor license agreements. See the NOTICE file |
| # distributed with this work for additional information |
| # regarding copyright ownership. The ASF licenses this file |
| # to you under the Apache License, Version 2.0 (the |
| # "License"); you may not use this file except in compliance |
| # with the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, |
| # software distributed under the License is distributed on an |
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| # KIND, either express or implied. See the License for the |
| # specific language governing permissions and limitations |
| # under the License. |
| |
| * Issues with the old design. |
| |
| The cluster is based on virtual synchrony: each broker multicasts |
| events and the events from all brokers are serialized and delivered in |
| the same order to each broker. |
| |
In the current design, raw byte buffers from client connections are
multicast, serialized and delivered in the same order to each broker.
| |
Each broker has a replica of all queues, exchanges and bindings, and
also all connections & sessions from every broker. The cluster code
treats the broker as a "black box": it "plays" the client data into
the connection objects and assumes that, given the same input, each
broker will reach the same state.
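The "black box" replay model above is essentially state-machine
replication. A minimal sketch of the idea, with a toy broker and an
invented event format (none of these names are Qpid APIs):

```python
class Broker:
    """Toy broker replica: state is a dict of queue name -> list of messages."""
    def __init__(self):
        self.queues = {}

    def apply(self, event):
        # Events are applied strictly in delivery order; given identical
        # input, every replica reaches identical state.
        op, queue, payload = event
        if op == "declare":
            self.queues.setdefault(queue, [])
        elif op == "publish":
            self.queues[queue].append(payload)
        elif op == "consume":
            self.queues[queue].pop(0)

# The multicast layer delivers the same serialized stream to every member.
stream = [("declare", "q1", None), ("publish", "q1", "m1"),
          ("publish", "q1", "m2"), ("consume", "q1", None)]

brokers = [Broker() for _ in range(3)]
for b in brokers:
    for event in stream:
        b.apply(event)

# All replicas converge on the same state.
assert all(b.queues == brokers[0].queues for b in brokers)
```

The guarantee holds only as long as apply() is fully deterministic,
which is exactly the assumption the sections below show to be fragile.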
| |
| A new broker joining the cluster receives a snapshot of the current |
| cluster state, and then follows the multicast conversation. |
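The join protocol above can be sketched as "snapshot, then replay
from that point": the newcomer copies the current state and applies
only the multicast events that follow. Toy state and event names here
are illustrative, not Qpid code:

```python
import copy

def apply(state, event):
    # Apply one ordered event to a queue-name -> message-list state.
    op, queue, payload = event
    if op == "publish":
        state.setdefault(queue, []).append(payload)
    elif op == "consume":
        state[queue].pop(0)

# An established member has already processed some history.
established = {}
history = [("publish", "q1", "m1"), ("publish", "q1", "m2")]
for e in history:
    apply(established, e)

# New member joins: take a snapshot now, then follow the stream.
newcomer = copy.deepcopy(established)

later = [("consume", "q1", None), ("publish", "q1", "m3")]
for e in later:
    apply(established, e)   # existing member keeps going
    apply(newcomer, e)      # newcomer replays from the snapshot point

# The newcomer catches up exactly.
assert newcomer == established
```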
| |
| ** Maintenance issues. |
| |
| The entire state of each broker is replicated to every member: |
| connections, sessions, queues, messages, exchanges, management objects |
| etc. Any discrepancy in the state that affects how messages are |
| allocated to consumers can cause an inconsistency. |
| |
- Entire broker state must be faithfully transferred to new members.
| - Management model also has to be replicated. |
| - All queues are replicated, can't have unreplicated queues (e.g. for management) |
| |
Events that are not deterministically predictable from the client
input data stream can cause inconsistencies. In particular, the use of
timers and timestamps requires cluster workarounds to synchronize.
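To illustrate the timestamp problem with a toy example (not Qpid
code): a message TTL evaluated against each member's own clock can
expire on one broker but not another, because clocks never agree
exactly. The workaround is for one member to read the clock and
multicast the value with the event, so every replica uses the same
timestamp:

```python
def expired(publish_time, ttl, now):
    # TTL check used by every replica.
    return now - publish_time > ttl

publish_time, ttl = 100.0, 5.0

# Divergent: each broker reads its own, slightly different clock.
broker_a_now = 104.9   # message still live here...
broker_b_now = 105.1   # ...but already expired here
assert expired(publish_time, ttl, broker_a_now) != \
       expired(publish_time, ttl, broker_b_now)

# Consistent: one member reads the clock once and multicasts the value,
# so all replicas evaluate the TTL against the same "now".
multicast_now = 105.1
results = [expired(publish_time, ttl, multicast_now) for _ in range(3)]
assert len(set(results)) == 1
```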
| |
A member that encounters an error which is not encountered by all
other members is considered inconsistent and will shut itself down.
Such errors can come from any area of the broker code; for example,
differing ACL files can cause inconsistent errors.
| |
| The following areas required workarounds to work in a cluster: |
| |
| - Timers/timestamps in broker code: management, heartbeats, TTL |
- Security: the cluster must replicate *after* decryption by the security layer.
| - Management: not initially included in the replicated model, source of many inconsistencies. |
| |
| It is very easy for someone adding a feature or fixing a bug in the |
| standalone broker to break the cluster by: |
| - adding new state that needs to be replicated in cluster updates. |
| - doing something in a timer or other non-connection thread. |
| |
| It's very hard to test for such breaks. We need a looser coupling |
| and a more explicitly defined interface between cluster and standalone |
| broker code. |
| |
| ** Performance issues. |
| |
Virtual synchrony delivers all data from all clients in a single
stream to each broker. The cluster must play this data through the
full broker code stack (connections, sessions etc.) in a single thread
context in order to get identical behavior on each broker. The cluster
has a pipelined design to get some concurrency, but this is a severe
limitation on scalability on multi-core hosts compared to the
standalone broker, which processes each connection in a separate
thread context.
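The threading contrast above can be sketched as follows (illustrative
only, not broker code): independent connections produce the same
results whether processed concurrently, as in the standalone broker,
or forced through one sequential replay, as in the cluster; only the
available parallelism differs.

```python
from concurrent.futures import ThreadPoolExecutor

def process_connection(frames):
    # Stand-in for per-connection protocol work.
    return sum(frames)

connections = {f"conn{i}": list(range(100)) for i in range(8)}

# Standalone broker: each connection handled on its own thread.
with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_results = dict(zip(connections,
                                  pool.map(process_connection,
                                           connections.values())))

# Clustered broker: the single ordered stream forces sequential replay.
serial_results = {name: process_connection(frames)
                  for name, frames in connections.items()}

# Same answers either way; the cluster just cannot exploit the cores.
assert concurrent_results == serial_results
```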