Title: Failover
<a name="Failover-Overview"></a>
# Overview

OpenEJB supports stateless failover: the ability for an EJB client to
fail over from one server to the next if a request cannot be completed.
No application state information is communicated between the servers, so
this functionality should be used only with applications that are
inherently stateless. A common term for this sort of setup is a server
farm.

The basic design assumption is that all servers in the same group have the
same applications deployed and are capable of doing the same job. Servers
can be brought online and offline while clients are running. As members
join or leave, this information is sent to the client as part of normal
EJB request/response communication, so active clients always have the most
current information on the servers that can process their requests should
communication with a particular server fail.

<a name="Failover-Failover"></a>
## Failover

On each request to the server, the client sends the version number
associated with the list of cluster servers it is aware of. Initially
this version is zero and the list is empty. Only when the server sees
that the client has an old list does it send the updated list. This is an
important distinction: the list is not transmitted back and forth on every
request, only on change. If the membership of the cluster is stable,
there is essentially no clustering overhead to the protocol -- an 8-byte
overhead on each request and 1 byte on each response -- so you will *not*
see response times degrade as more members are added to the cluster. A
new list takes effect for all proxies that share the same connection.
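
A rough sketch of that exchange (the class and method names here are made
up for illustration; they are not the actual OpenEJB protocol classes):

    import java.util.List;

    // Illustrative sketch of the list-versioning exchange described above.
    class ClusterMetaData {
        final long version;
        final List<String> servers;

        ClusterMetaData(long version, List<String> servers) {
            this.version = version;
            this.servers = servers;
        }
    }

    class ClusterAwareServer {
        private volatile ClusterMetaData current =
                new ClusterMetaData(3, List.of("ejbd://a:4201", "ejbd://b:4201"));

        // Called with the version number the client sent along with its
        // request. Returns the full list only when the client's copy is
        // stale; otherwise the response carries a single "no change" byte.
        ClusterMetaData checkClusterVersion(long clientVersion) {
            ClusterMetaData now = current;
            return clientVersion < now.version ? now : null;
        }
    }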

When a server shuts down, new connections are refused, existing
connections not in mid-request are closed, and any remaining connections
are closed immediately after completing the request in progress, so
clients can fail over gracefully to the next server in the list. If a
server crashes, requests are retried on the next server in the list (or
on another server, depending on the ConnectionStrategy). This failover
pattern is followed until there are no more servers in the list, at which
point the client attempts a final multicast search (if it was created
with a multicast PROVIDER_URL) before abandoning the request and throwing
an exception to the caller.

By default, the failover is ordered, but random selection is supported.
The multicast discovery aspect of the client adds a nice randomness to
the selection of the first server.
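
A rough sketch of that retry loop (the names are illustrative, and
invoke() stands in for the real request plumbing in the OpenEJB client):

    import java.io.IOException;
    import java.net.URI;
    import java.util.List;

    // Conceptual sketch of the ordered failover loop described above.
    class FailoverLoop {
        Object request(Object req, List<URI> servers) throws IOException {
            IOException last = null;
            for (URI server : servers) {   // ordered by default; a random
                try {                      // strategy may pick differently
                    return invoke(server, req);
                } catch (IOException e) {
                    last = e;              // server unreachable: try the next
                }
            }
            // List exhausted: a client created with a multicast PROVIDER_URL
            // attempts one final multicast search here before giving up.
            throw (last != null ? last : new IOException("no servers known"));
        }

        Object invoke(URI server, Object req) throws IOException {
            throw new IOException("stub"); // placeholder for the real call
        }
    }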

<a name="Failover-Discovery"></a>
## Discovery

Each discoverable service has a URI which is broadcast as a heartbeat to
other servers in the cluster. This URI advertises the service's type, its
cluster group, and its location in the format 'group:type:location', for
example "cluster1:ejb:ejbd://thehost:4201". The URI is sent out
repeatedly in a pulse; its presence on the network indicates the service's
availability, and its absence indicates the service is no longer
available.
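
A small example of pulling those three parts back out of the advertised
URI from the example above:

    import java.net.URI;

    // Decomposing a 'group:type:location' discovery URI.
    public class DiscoveryUriDemo {
        public static void main(String[] args) {
            String advertised = "cluster1:ejb:ejbd://thehost:4201";
            String[] parts = advertised.split(":", 3);
            String group = parts[0];              // "cluster1"
            String type = parts[1];               // "ejb"
            URI location = URI.create(parts[2]);  // "ejbd://thehost:4201"
            System.out.println(group + " / " + type + " / " + location);
        }
    }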

The sending of this pulse (the heartbeat) can be done via UDP or TCP:
multicast and "multipoint" respectively. More on that in the following
sections. The rate at which the heartbeat is pulsed to the network can be
specified via the 'heart_rate' property. The default is 500 milliseconds.
This rate is also used when listening for services on the network. If a
service goes missing for the duration of 'heart_rate' multiplied by
'max_missed_heartbeats', then the service is considered dead.
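
For example, with the settings below (the 'max_missed_heartbeats' value
is illustrative rather than a documented default), a service that has not
been heard from for 500 ms * 10 = 5 seconds is considered dead:

    heart_rate = 500
    max_missed_heartbeats = 10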

The 'group' property, cluster1 in the example, is used to divide the
servers on the network into smaller logical clusters. A given server
broadcasts all its services with the group prefixed in the URI, and it
ignores any services it sees broadcast that do not share its group name.

<a name="Failover-Multicast(UDP)"></a>
# Multicast (UDP)

Multicast is the preferred way to broadcast the heartbeat on the network.
The simple technique of broadcasting an unchanging service URI on the
network is particularly well suited to multicast: the URI itself is
essentially stateless, and there is no "I'm alive" URI or "I'm dead" URI.

In this way the issues with UDP being unordered and unreliable melt away,
as state is no longer a concern and packet sizes are always small.
Complicated libraries that ride atop UDP to offer reliability
(retransmission) and ordering can be avoided, while the advantages UDP has
over TCP are retained because there are no Java layers attempting to force
UDP communication to be more TCP-like. This simple design means
UDP/multicast is used only for discovery; from there on out, critical
information is transmitted over TCP/IP, which is obviously going to do a
better job of ensuring reliability and ordering.

<a name="Failover-ServerConfiguration"></a>
## Server Configuration

When you boot the server there should be a conf/multicast.properties file
containing:

    server = org.apache.openejb.server.discovery.MulticastDiscoveryAgent
    bind = 239.255.2.3
    port = 6142
    disabled = true
    group = default

Enable the agent by setting 'disabled = false'. All of the above settings
except *server* can be changed. The *port* and *bind* must be valid for
general multicast/UDP network communication.

The *group* setting can be changed to further group servers that may use
the same multicast channel. As shown below, the client also has a *group*
setting which can be used to select an appropriate server from the
multicast channel.

<a name="Failover-MulticastClient"></a>
## Multicast Client

The multicast functionality is not just for servers to find each other in
a cluster; it can also be used by EJB clients to discover a server. A
special "multicast://" URL can be used in the InitialContext properties to
signify that multicast should be used to seed the connection process, like
so:

    Properties p = new Properties();
    p.put(Context.INITIAL_CONTEXT_FACTORY, "org.apache.openejb.client.RemoteInitialContextFactory");
    p.put(Context.PROVIDER_URL, "multicast://239.255.2.3:6142?group=default");
    InitialContext remoteContext = new InitialContext(p);

The URL has optional query parameters such as "schemes", "group" and
"timeout" which allow you to zero in on a particular type of service in a
particular cluster group, as well as to set how long you are willing to
wait in the discovery process before finally giving up. The first
matching service the client sees "flowing" around on the UDP stream is
the one it picks and sticks with for that and subsequent requests,
ensuring UDP is used only when there are no known servers to talk to.
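
For example, the following URL (the parameter values are illustrative,
and the timeout is assumed here to be in milliseconds) zeroes in on
"ejbd" services in the "cluster1" group and gives up after 250
milliseconds:

    Properties p = new Properties();
    p.put(Context.INITIAL_CONTEXT_FACTORY, "org.apache.openejb.client.RemoteInitialContextFactory");
    p.put(Context.PROVIDER_URL, "multicast://239.255.2.3:6142?group=cluster1&schemes=ejbd&timeout=250");
    InitialContext remoteContext = new InitialContext(p);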

Note that EJB clients do not need to use multicast to find a server. If
the client knows the URL of a server in the cluster, it may use it and
connect directly to that server, at which point that server will share
the full list of its peers.

## Multicast Servers with TCP Clients

Note that clients do not need to use multicast to communicate with
servers. Servers can use multicast to discover each other, but clients
are still free to connect to servers in the network using a server's TCP
address.

    Properties p = new Properties();
    p.put(Context.INITIAL_CONTEXT_FACTORY, "org.apache.openejb.client.RemoteInitialContextFactory");
    p.put(Context.PROVIDER_URL, "ejbd://192.168.1.30:4201");
    InitialContext remoteContext = new InitialContext(p);

When the client connects, the server will send the URLs of all the
servers in the group, and failover will take place normally.

<a name="Failover-Multipoint(TCP)"></a>
# Multipoint (TCP)

As TCP has no real broadcast functionality to speak of, communication of
who is in the network is achieved by each server holding a physical
connection to every other server in the network.

To join the network, a server must be configured to know the address of
at least one server in the network and connect to it. When it does, the
two servers exchange the full lists of all the other servers each knows
about. Each server then connects to any new servers it has just learned
about and repeats the process with them. The end result is that every
server has a direct connection to every other server 100% of the time,
hence the made-up term "multipoint" to describe this situation of each
server having multiple point-to-point connections that create a fully
connected graph.

On the client side things are similar. The client needs to know the
address of at least one server in the network and be able to connect to
it. When it does, it receives the full (and dynamically maintained) list
of every server in the network. The client does not connect to each of
those servers immediately, but rather consults the list in the event of a
failover, using it to decide which server to connect to next.

The entire process is essentially the art of using a statically
maintained list to bootstrap the more valuable, dynamically maintained
list.
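
A conceptual sketch of that peer-exchange step (the names are made up;
this is not the actual MultipointDiscoveryAgent implementation):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: each node tracks the peers it knows about and connects
    // to any new ones it learns of, yielding a fully connected graph.
    class MultipointNode {
        final String self;
        final Set<String> known = ConcurrentHashMap.newKeySet();

        MultipointNode(String self) { this.self = self; }

        // Called when a newly connected peer sends over its full peer list.
        void onPeerListReceived(Set<String> peers) {
            for (String peer : peers) {
                if (!peer.equals(self) && known.add(peer)) {
                    connect(peer); // connect to servers we just learned about
                }
            }
        }

        void connect(String peer) {
            // open a TCP connection and send our own 'known' list (omitted)
        }
    }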

<a name="Failover-ServerConfiguration"></a>
## Server Configuration

In the server this list can be specified via the
`conf/multipoint.properties` file like so:

    server = org.apache.openejb.server.discovery.MultipointDiscoveryAgent
    bind = 127.0.0.1
    port = 4212
    disabled = false
    initialServers = 192.168.1.20:4212, 192.168.1.30:4212, 192.168.1.40:4212

The above configuration shows the server has port 4212 open for
connections by other servers for multipoint communication. The
`initialServers` list should be a comma-separated list of other similar
servers on the network. Only one of the servers listed needs to be
running when this server starts up -- it is not required to list all
servers in the network.

<a name="Failover-ClientConfiguration"></a>
## Client Configuration

Configuration in the client is similar, but note that EJB clients do not
participate directly in multipoint communication and do *not* connect to
the multipoint port. The server list is simply a list of the regular
"ejbd://" URLs that a client normally uses to connect to a server.

    Properties p = new Properties();
    p.put(Context.INITIAL_CONTEXT_FACTORY, "org.apache.openejb.client.RemoteInitialContextFactory");
    p.put(Context.PROVIDER_URL, "failover:ejbd://192.168.1.20:4201,ejbd://192.168.1.30:4201");
    InitialContext remoteContext = new InitialContext(p);

## Considerations

### Network size

The general disadvantage of this topology is the number of connections
required. The number of connections for the network of servers is equal
to "(n * n - n) / 2", where n is the number of servers. For example, with
5 servers you need 10 connections, with 10 servers you need 45
connections, and with 50 servers you need 1225 connections. This is, of
course, the number of connections across the entire network; each
individual server only needs "n - 1" connections.
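
A quick check of that formula (the class is purely illustrative):

    // Prints the connection counts quoted above: 10, 45 and 1225.
    public class ConnectionCount {
        static int totalConnections(int n) {
            return (n * n - n) / 2; // one link per unordered pair of servers
        }

        public static void main(String[] args) {
            for (int n : new int[] {5, 10, 50}) {
                System.out.println(n + " servers -> " + totalConnections(n) + " connections");
            }
        }
    }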

The handling of these sockets is all asynchronous Java NIO code, which
allows the server to handle many connections (all of them) with one
thread. From a pure threading perspective, this approach is extremely
efficient, with just one thread to listen and broadcast to many peers.

### Double connect

It is possible in this process for two servers to learn of each other at
the same time and each attempt to connect to the other simultaneously,
resulting in two connections between the same two servers. When this
happens, both servers detect the extra connection; one of the connections
is dropped and the other is kept. In practice this race condition rarely
happens and can be avoided almost entirely by staggering server startup
by as little as 100 milliseconds.

# Multipoint Configuration Recommendations

As mentioned above, `initialServers` is only used for bootstrapping the
multipoint network. Once running, all servers dynamically establish
direct connections with each other and there is no single point of
failure.

However, to ensure that the bootstrapping process can occur successfully,
the `initialServers` property of the `conf/multipoint.properties` file
must be set carefully and with a specific server start order in mind.
Each server consults its `initialServers` list exactly once, during the
bootstrapping phase at startup; after that, connections are made
dynamically.

This means that at least one of the servers listed in `initialServers`
must already be running when the server starts, or the server might never
be introduced to and connected with the other servers in the network.

## Failed scenario (background)

As an example of a failed scenario, imagine there are three servers:
server1, server2 and server3. They are set up only to point to the server
in front of them, making a chain:

* server1; initialServers = server2
* server2; initialServers = server3
* server3; initialServers = <blank>

This is essentially server1 -> server2 -> server3. The scenario could
work, but the servers would have to be started in exactly the opposite
order:

1. server3 starts
2. server2 starts
    * static: connect to server3
3. server1 starts
    * static: connect to server2
    * dynamic: connect to server3

At this point all servers would be fully connected. But the above setup
is flawed and could easily fail. The first flaw is that server3 lists
nothing in its `initialServers` list, so if it were restarted it would
leave the multipoint network and not know how to get back in.

The second flaw is that if you started the servers in any other order,
you would also not get a fully connected multipoint network. Say the
servers were started in "front" order:

1. server1 starts
    * static: connect to server2 - failed, server2 not started.
2. server2 starts
    * static: connect to server3 - failed, server3 not started.
3. server3 starts
    * no connection attempts, initialServers list is empty.

After startup completes, all servers will be completely isolated and
failover will not work. The described setup is weaker than it needs to
be. Listing just one server means that server is a potential point of
weakness. As a matter of trivia, it is interesting to point out that you
could bring a fourth server online temporarily that lists all three
servers. Once it makes the introductions and all servers learn of each
other, you could shut it down again.

The above setup is easily fixable via better configuration. If server3
listed both server1 and server2 in its `initialServers` list, rather than
listing nothing at all, then all servers would fully discover each other
regardless of startup order, assuming all three servers did eventually
start.

## Bootstrapping Three Servers or Less

In a three-server scenario, we recommend simply having all three servers
list all three servers:

* server1/conf/multipoint.properties
    * initialServers = server1, server2, server3
* server2/conf/multipoint.properties
    * initialServers = server1, server2, server3
* server3/conf/multipoint.properties
    * initialServers = server1, server2, server3

There is no harm in a server listing itself. This gives you one clean
list to maintain, and it will work even if you decide not to start one of
the three servers.

## Bootstrapping Four Servers or More

In a scenario of four or more servers, we recommend picking at least two
servers and focusing on always keeping at least one of them running.
Let's refer to them as "root" servers for simplicity's sake:

* server1/conf/multipoint.properties
    * initialServers = server2
* server2/conf/multipoint.properties
    * initialServers = server1

Root server1 lists root server2, so the two are always linked to each
other regardless of start order or of one of them going down. Server1
could be shut down and would reconnect on startup to the full multipoint
network through server2, and vice versa.

All other servers simply list the root servers (server1, server2) in
their `initialServers` list:

* server3/conf/multipoint.properties
    * initialServers = server1, server2
* server4/conf/multipoint.properties
    * initialServers = server1, server2
* serverN/conf/multipoint.properties
    * initialServers = server1, server2

As long as at least one root server (server1 or server2) is running, you
can bring other servers online and offline at will and always have a
fully connected graph.

Of course, all servers, once running and connected, have the full list of
all other servers in the network, so if at any time the "root" servers
were not around to make initial introductions to new servers, it would be
no trouble: new servers can be reconfigured to point at any other server
in the network, as all servers have the full list. So these "root"
servers are not a real point of failure in function, only a convenience.

## Command line overrides

Always remember that any property in a conf/<server-service>.properties
file can be overridden on the command line or via system properties, so
it is easy to set the `initialServers` list in startup scripts.

A bash example might look something like:

    #!/bin/bash

    OPENEJB_HOME=/opt/openejb-3.1.3
    INITIAL_LIST=$(cat /some/shared/directory/our_initial_servers.txt)

    $OPENEJB_HOME/bin/openejb start -Dmultipoint.initialServers=$INITIAL_LIST