| -*-org-*- |
| # Licensed to the Apache Software Foundation (ASF) under one |
| # or more contributor license agreements. See the NOTICE file |
| # distributed with this work for additional information |
| # regarding copyright ownership. The ASF licenses this file |
| # to you under the Apache License, Version 2.0 (the |
| # "License"); you may not use this file except in compliance |
| # with the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, |
| # software distributed under the License is distributed on an |
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| # KIND, either express or implied. See the License for the |
| # specific language governing permissions and limitations |
| # under the License. |
| |
| * Issues with the old design. |
| |
| The cluster is based on virtual synchrony: each broker multicasts |
| events and the events from all brokers are serialized and delivered in |
| the same order to each broker. |
| |
In the current design, raw byte buffers from client connections are
multicast, serialized and delivered in the same order to each broker.
| |
Each broker has a replica of all queues, exchanges and bindings, and
also all connections & sessions from every broker. The cluster code
treats the broker as a "black box": it "plays" the client data into
the connection objects and assumes that, given the same input, each
broker will reach the same state.
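The "black box" replay model above is essentially state-machine
replication. A minimal sketch of the idea, with a toy broker and an
invented event format (none of these names are Qpid APIs):

```python
class Broker:
    """Toy broker replica: state is a dict of queue name -> list of messages."""
    def __init__(self):
        self.queues = {}

    def apply(self, event):
        # Events are applied strictly in delivery order; given identical
        # input, every replica reaches identical state.
        op, queue, payload = event
        if op == "declare":
            self.queues.setdefault(queue, [])
        elif op == "publish":
            self.queues[queue].append(payload)
        elif op == "consume":
            self.queues[queue].pop(0)

# The multicast layer delivers the same serialized stream to every member.
stream = [("declare", "q1", None), ("publish", "q1", "m1"),
          ("publish", "q1", "m2"), ("consume", "q1", None)]

brokers = [Broker() for _ in range(3)]
for b in brokers:
    for event in stream:
        b.apply(event)

# All replicas converge on the same state.
assert all(b.queues == brokers[0].queues for b in brokers)
```

The guarantee holds only as long as apply() is fully deterministic,
which is exactly the assumption the sections below show to be fragile.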
| |
| A new broker joining the cluster receives a snapshot of the current |
| cluster state, and then follows the multicast conversation. |
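The join protocol above can be sketched as "snapshot, then replay
from that point": the newcomer copies the current state and applies
only the multicast events that follow. Toy state and event names here
are illustrative, not Qpid code:

```python
import copy

def apply(state, event):
    # Apply one ordered event to a queue-name -> message-list state.
    op, queue, payload = event
    if op == "publish":
        state.setdefault(queue, []).append(payload)
    elif op == "consume":
        state[queue].pop(0)

# An established member has already processed some history.
established = {}
history = [("publish", "q1", "m1"), ("publish", "q1", "m2")]
for e in history:
    apply(established, e)

# New member joins: take a snapshot now, then follow the stream.
newcomer = copy.deepcopy(established)

later = [("consume", "q1", None), ("publish", "q1", "m3")]
for e in later:
    apply(established, e)   # existing member keeps going
    apply(newcomer, e)      # newcomer replays from the snapshot point

# The newcomer catches up exactly.
assert newcomer == established
```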
| |
| ** Maintenance issues. |
| |
| The entire state of each broker is replicated to every member: |
| connections, sessions, queues, messages, exchanges, management objects |
| etc. Any discrepancy in the state that affects how messages are |
| allocated to consumers can cause an inconsistency. |
| |
- Entire broker state must be faithfully transferred to new members.
| - Management model also has to be replicated. |
| - All queues are replicated, can't have unreplicated queues (e.g. for management) |
| |
Events that are not deterministically predictable from the client
input data stream can cause inconsistencies. In particular, the use of
timers and timestamps requires cluster workarounds to synchronize.
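To illustrate the timestamp problem with a toy example (not Qpid
code): a message TTL evaluated against each member's own clock can
expire on one broker but not another, because clocks never agree
exactly. The workaround is for one member to read the clock and
multicast the value with the event, so every replica uses the same
timestamp:

```python
def expired(publish_time, ttl, now):
    # TTL check used by every replica.
    return now - publish_time > ttl

publish_time, ttl = 100.0, 5.0

# Divergent: each broker reads its own, slightly different clock.
broker_a_now = 104.9   # message still live here...
broker_b_now = 105.1   # ...but already expired here
assert expired(publish_time, ttl, broker_a_now) != \
       expired(publish_time, ttl, broker_b_now)

# Consistent: one member reads the clock once and multicasts the value,
# so all replicas evaluate the TTL against the same "now".
multicast_now = 105.1
results = [expired(publish_time, ttl, multicast_now) for _ in range(3)]
assert len(set(results)) == 1
```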
| |
A member that encounters an error which is not encountered by all
other members is considered inconsistent and will shut itself down.
Such errors can come from any area of the broker code; for example,
differing ACL files can cause inconsistent errors.
| |
| The following areas required workarounds to work in a cluster: |
| |
| - Timers/timestamps in broker code: management, heartbeats, TTL |
- Security: the cluster must replicate *after* decryption by the security layer.
| - Management: not initially included in the replicated model, source of many inconsistencies. |
| |
| It is very easy for someone adding a feature or fixing a bug in the |
| standalone broker to break the cluster by: |
| - adding new state that needs to be replicated in cluster updates. |
| - doing something in a timer or other non-connection thread. |
| |
| It's very hard to test for such breaks. We need a looser coupling |
| and a more explicitly defined interface between cluster and standalone |
| broker code. |
| |
| ** Performance issues. |
| |
Virtual synchrony delivers all data from all clients in a single
stream to each broker. The cluster must play this data through the
full broker code stack (connections, sessions etc.) in a single thread
context in order to get identical behavior on each broker. The cluster
has a pipelined design to get some concurrency, but this is a severe
limitation on scalability on multi-core hosts compared to the
standalone broker, which processes each connection in a separate
thread context.
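The threading contrast above can be sketched as follows (illustrative
only, not broker code): independent connections produce the same
results whether processed concurrently, as in the standalone broker,
or forced through one sequential replay, as in the cluster; only the
available parallelism differs.

```python
from concurrent.futures import ThreadPoolExecutor

def process_connection(frames):
    # Stand-in for per-connection protocol work.
    return sum(frames)

connections = {f"conn{i}": list(range(100)) for i in range(8)}

# Standalone broker: each connection handled on its own thread.
with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_results = dict(zip(connections,
                                  pool.map(process_connection,
                                           connections.values())))

# Clustered broker: the single ordered stream forces sequential replay.
serial_results = {name: process_connection(frames)
                  for name, frames in connections.items()}

# Same answers either way; the cluster just cannot exploit the cores.
assert concurrent_results == serial_results
```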