To Do:
===========================================
Documentation:
===========================================
Why is Tribes unique compared to JGroups/Appia and other group communication frameworks?
1. Uses NIO and TCP for guaranteed delivery and the ability to join large groups
2. Guarantees message delivery in the following ways:
a) TCP messaging, with a follow-up READ on NIO connections to ensure the channel is not broken
b) ACK messages from the receiver upon receipt
c) ACK from the receiver after the message has been processed
3. The same (single) channel can handle all guarantee levels (a, b, c) at the same time
and can process both synchronous and asynchronous messages (see the sketch after this list).
This is key to supporting different messaging requirements through
the same channel in order to save system resources.
4. For async messaging, errors are reported through an error handler callback
5. Ability to send on multiple streams at the same time, in parallel, to improve performance
6. Designed with replication in mind, recognizing that some pieces of data don't need to be totally ordered.
7. It's not built with the uniform group model in mind, but that model can be achieved using interceptors.
8. Future versions will have WAN membership and replication
9. Silent members, the ability to send messages to a node not in the membership
10. Sender burst, concurrent messaging between two or more nodes
11. Multicasting can still be done on a per-message basis using a MulticastInterceptor
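
Item 3 is easiest to see in code. Below is a minimal sketch, assuming the
org.apache.catalina.tribes API where the guarantee level is picked per message
through option flags; the class name and payload are made up for illustration:

    import java.io.Serializable;
    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.Member;
    import org.apache.catalina.tribes.group.GroupChannel;

    public class GuaranteeLevels {
        public static void main(String[] args) throws Exception {
            GroupChannel channel = new GroupChannel();
            channel.start(Channel.DEFAULT); // membership, sender and receiver

            Member[] group = channel.getMembers();
            Serializable msg = "hello";

            // (a) guaranteed TCP delivery only, no application-level ACK
            channel.send(group, msg, 0);
            // (b) block until the receivers ACK receipt of the message
            channel.send(group, msg, Channel.SEND_OPTIONS_USE_ACK);
            // (c) block until the receivers have also processed the message
            channel.send(group, msg,
                    Channel.SEND_OPTIONS_USE_ACK | Channel.SEND_OPTIONS_SYNCHRONIZED_ACK);
            // asynchronous: queue the message and return immediately
            channel.send(group, msg, Channel.SEND_OPTIONS_ASYNCHRONOUS);

            channel.stop(Channel.DEFAULT);
        }
    }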
Bugs:
===========================================
a) Somehow the first NIO connection made always closes down; why?
b) Pull the network cord and watch the membership layer freak out
c) AbstractReplicatedMap.size() throws ConcurrentModificationException;
implement a size counter instead
Code Tasks:
===========================================
51. NioSender.setData should not expand the byte buffer if it's too large;
instead, just refill it from the XByteBuffer
50. On top of versioning, implement version syncs from primary to backup,
or when a backup receives an update that is out of sync
49. Implement versioning on the AbstractReplicatedMap
48. Periodic refresh of the replicated map (primary -> backup)
47. Delta (session) versioning: increase the version number on each change, making it easier to keep maps in sync
41. Build a tipi that is a soft membership
38. Make the AbstractReplicatedMap accept non-serializable elements, but just don't replicate them
36. UDP Sender and Receiver, initially without flow control and guaranteed delivery.
This can easily be done as an interceptor, operating in parallel with the TCP sender.
It can implement auto-detection, so that if the message is going to a destination on the same network
path, it is sent over UDP.
35. The ability to share one channel amongst multiple processes
32. Replicated JNDI entries in Tomcat in the format
cluster:<map name>/<entry key>, for example
cluster:myapps/db/shared/dbinfo
31. A layer on top of the GroupChannel to allow multiple processes to share
a channel for sending/receiving messages, to conserve system resources - this is way in the future.
30. CookieBasedReplicationMap - a very simple extension to the LazyReplicatedMap,
but instead of randomly selecting a backup node and then publishing the PROXY to all
the other nodes in the group, this will simply
read/write a cookie for the backup location, so the nodes will
never know until the request comes in.
This is useful in extremely large clusters, as it essentially eliminates
much of the network chatter; this task depends on task 25.
Question to be answered: How does the map know of the cookie?
Potential answer: Use a thread local and bind the request/response to it
Question: Is there one cookie per map, or one cookie per server?
Potential answer: One cookie per app allows for heterogeneous instances
Is there a limit on the number of cookies?
Per Netscape and RFC 2109, section 6.3, it's 20
http://wp.netscape.com/newsref/std/cookie_spec.html
To work around the cookie limit there would be a suggested setting
for storing the preferred backup order, or simply a requirement for homogeneous instances.
29. Thread pool, shrink dynamically
27. XmlConfigurator - read an XML file to configure the channel.
26. JNDIChannel - a way to bind the group channel in a JNDI tree,
so that shared resources can access it.
23. TotalOrderInterceptor - fairly straightforward implementation.
This interceptor depends on there being some sort of
membership coordinator, see task 9.
Once there is a coordinator in the group, the total order protocol is the same
as the OrderInterceptor, except that it gets its message number from
the coordinator prior to sending it out.
The TotalOrderInterceptor will keep an order number per member;
this way, ordering is kept intact when different messages are sent
to different members, e.g.
Message A - all members - total order (mbrA-2, mbrB-2, mbrC-2, mbrD-2)
Message B - mbrC,mbrD only - total order (mbrA-2, mbrB-2, mbrC-3, mbrD-3)
- The combination of member uniqueId and orderId is unique, nothing else;
this way, if a member crashes, we don't hold the queue, instead we start over.
- A TotalOrder token will contain the coordinator uniqueId as well.
- One parameter should be "receive sequence timeout" in case the coordinator is not responding.
- OPTION A)
The coordinator doesn't forward the message,
since the app would then not receive the proper error message;
instead the sequencer just returns the sequence, and the member itself sends the message.
pros: the app will find out whether the send failed or succeeded
cons: if the send fails, the sequencer is out of sync for the failed member
OPTION B)
The coordinator receives the message, adds on the sequence number,
then sends the message on behalf of the requesting members.
pros: the sequencer is in charge of the sequence
cons: the sequencer can become overloaded, since it has to handle all the traffic,
and the requesting member will not know whether the message failed or succeeded
OPTION C) Research papers on total order, better algorithms exist.
NOTES! THIS CAN'T BE DONE USING A SEQUENCER THAT SENDS THE MESSAGE, SINCE WE
LOSE THE ABILITY TO REPORT FEEDBACK
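
A conceptual sketch (plain Java, not the Tribes API) of the per-member counter
table the sequencer for task 23 would keep; only the members a message is
addressed to get their counters bumped, which reproduces the Message A/B
example above:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class SequenceTable {
        private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

        // Called on the coordinator: hand out the next order number for each
        // destination member of a message.
        public long[] nextSequence(String[] destinationIds) {
            long[] seq = new long[destinationIds.length];
            for (int i = 0; i < destinationIds.length; i++) {
                seq[i] = counters.computeIfAbsent(destinationIds[i], k -> new AtomicLong())
                                 .incrementAndGet();
            }
            return seq;
        }

        // Called when a member crashes: drop its counter so we start over
        // instead of holding the queue (member uniqueId + orderId stays unique).
        public void memberCrashed(String memberId) {
            counters.remove(memberId);
        }
    }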
21. Implement a WAN membership layer, using a WANMbrInterceptor and a
WAN Router/Forwarder (Tipi on top of a ManagedChannel)
20. Implement a TCP membership interceptor, for guaranteed functionality, not just discovery
18. Implement SSL encryption over message transfers, BIO and NIO
8. WaitForCompletionInterceptor - waits for the message to be processed by all receivers before returning
(This is useful when synchronized=false and waitForAck=false to improve
parallel processing, when you want all messages sent in parallel but
no return until all have been processed on the remote end.)
11. Code a ReplicatedFileSystem example, package org.apache.catalina.tipis
13. StateTransfer interceptor
The idea just came to me: the state transfer interceptor
will hold all incoming messages until it has received a message
with a STATE_TRANSFER header as the first of the bytes.
Once it has received state, it will pretty much take itself out of the loop
(see the sketch below).
The benefit of the new ParallelNioSender is that it doesn't require knowing about
a member to transfer state; all it has to do is reply to a message that came in.
State is a one-time deal for the entire channel, so a
session replication cluster would transfer state as one block, not one per context.
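
A conceptual sketch (not the Tribes API) of the gate described in task 13:
queue everything until the STATE_TRANSFER message arrives, then flush the
backlog and pass messages straight through; the isStateTransfer flag stands in
for the header check:

    import java.io.Serializable;
    import java.util.ArrayDeque;
    import java.util.Queue;

    public class StateTransferGate {
        private final Queue<Serializable> held = new ArrayDeque<>();
        private boolean stateReceived = false;

        public synchronized void messageReceived(Serializable msg, boolean isStateTransfer) {
            if (stateReceived) {
                deliver(msg);              // gate is open, pass straight through
            } else if (isStateTransfer) {
                stateReceived = true;      // state arrived: open the gate
                deliver(msg);
                while (!held.isEmpty()) deliver(held.poll()); // flush the backlog
            } else {
                held.add(msg);             // hold until state has arrived
            }
        }

        private void deliver(Serializable msg) { /* hand to the application */ }
    }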
14. Keepalive count and idle kill-off for NIO senders
17. Implement transactions - the ability to start a transaction, send several messages,
and then commit the transaction
Tasks Completed
===========================================
1. True synchronous/asynchronous replication, enabled using the flags
Sender.sendAck/Receiver.waitForAck/Receiver.synchronized
Task Desc: waitForAck should only mean that we received the message, not that the
message got processed. This should improve throughput, and an interceptor
can do waitForCompletion
Status: Complete
Notes:
2. Unique id - send it in a byte array instead of a string
3. DataSender or ReplicationTransmitter swallows IOException; this should be reported
Notes: This has only been fixed for the pooled synchronous sender; the fast-async
isn't working that well
4. ChannelMessage.getMessage should return a streamable; that way we can wrap it,
pass it around and all those good things without having to copy byte arrays
left and right
Notes: Instead of using a streamable, this is implemented using the XByteBuffer,
which is very easy to use. It also becomes a single spot for optimizations.
Ideally, there would be a pool of XByteBuffers that all use direct ByteBuffers
for their data handling.
5. OrderInterceptor - guarantees the order of messages
Notes: completed
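A minimal usage sketch for the completed interceptor; GroupChannel,
addInterceptor and OrderInterceptor are the real Tribes classes, the wrapper
class is made up:

    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.group.GroupChannel;
    import org.apache.catalina.tribes.group.interceptors.OrderInterceptor;

    public class OrderedChannel {
        public static Channel create() throws Exception {
            GroupChannel channel = new GroupChannel();
            // messages between any two members are delivered in send order
            channel.addInterceptor(new OrderInterceptor());
            channel.start(Channel.DEFAULT);
            return channel;
        }
    }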
6. NIO and IO DataSender, since the IO is blocking
Notes: completed. Works very well; have not implemented suspect error logging.
7. FragmentationInterceptor - splits up messages that are larger than X bytes.
Notes: completed
15. Remove DataSenderFactory and DataSender.properties -
these cause the settings to be hard coded and not pluggable.
Notes: Completed, now you can initialize a transport class
12. LazyReplicatedHashMap - memory efficient clustered map.
This map can be used for PRIMARY/SECONDARY session replication
Ahh, the beauty of storing data in remote locations
The lazy hash map will only replicate its attribute names to all members in the group;
with that name, it will also replicate the source (where to get the object)
and the backup member where it can find a backup if the source is gone.
If the source disappears, the backup node will replicate the attributes
it stores to a new primary; backups can be chosen round robin.
When a new member arrives and requests state, that member will get all the attribute
names and the locations.
It can replicate every X seconds, on dirty flags set by the stored objects,
on a request to scan for dirty flags, or on a request with the objects.
Notes: the map has been completed (see the usage sketch below)
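
A hedged usage sketch for the completed map; the constructor signature has
varied between Tribes versions, so this assumes the generic variant that takes
a MapOwner (null here), the channel, an RPC timeout, a unique map name and the
class loaders used for deserialization:

    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.group.GroupChannel;
    import org.apache.catalina.tribes.tipis.LazyReplicatedMap;

    public class SessionStore {
        public static void main(String[] args) throws Exception {
            GroupChannel channel = new GroupChannel();
            channel.start(Channel.DEFAULT);

            LazyReplicatedMap<String, String> map = new LazyReplicatedMap<>(
                    null, channel, 5000, "session-map",
                    new ClassLoader[] { SessionStore.class.getClassLoader() });

            // put() keeps the value local (primary), sends a full copy to one
            // backup member and only a lightweight proxy to everyone else
            map.put("sessionId", "payload");

            map.breakdown();            // deregister the map from the channel
            channel.stop(Channel.DEFAULT);
        }
    }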
22. sendAck and synchronized should not have to be an XML config;
they can be configured on a per-packet basis using ClusterData.getOptions()
Notes: see the Channel.SEND_OPTIONS_XXXX variables
28. Thread pool should have maxThreads and minThreads and grow dynamically
24. MessageDispatchInterceptor - for asynchronous sending
- looks at the options flag SEND_OPTIONS_ASYNCHRONOUS
- has two modes
a) async parallel send - sends each message to all destinations before the next message
b) async per member - one thread per member using the FastAsyncQueue (good for groups with slow receivers)
- Callback error handler - for when messages fail and the application wishes to be notified
- MUST HAVE A QUEUE SIZE LIMIT IN MB, to avoid OOM errors, or persist the queue.
- MUST USE ClusterData.deepclone() to ensure thread safety if ClusterData objects get recycled
Notes: Simple implementation, one thread, invokes all senders in parallel.
Deep cloning is configurable as an optimization (see the callback sketch below).
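
A sketch of the callback error handler in use; Channel.send with an
ErrorHandler and SEND_OPTIONS_ASYNCHRONOUS are the real Tribes API, the
surrounding class is made up:

    import java.io.Serializable;
    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.ChannelException;
    import org.apache.catalina.tribes.ErrorHandler;
    import org.apache.catalina.tribes.Member;
    import org.apache.catalina.tribes.UniqueId;

    public class AsyncSendExample {
        static void sendAsync(Channel channel, Member[] dest, Serializable msg)
                throws ChannelException {
            // returns immediately; success or failure arrives on the callback
            channel.send(dest, msg, Channel.SEND_OPTIONS_ASYNCHRONOUS, new ErrorHandler() {
                @Override
                public void handleError(ChannelException x, UniqueId id) {
                    System.err.println("Async message " + id + " failed: " + x.getMessage());
                }
                @Override
                public void handleCompletion(UniqueId id) {
                    System.out.println("Async message " + id + " delivered");
                }
            });
        }
    }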
37. Interceptor.getOptionFlag() - lets the system configure a flag to be used
for the interceptor, so that not all constants have to be defined
in Channel.SEND_OPTIONS_XXXX.
Also, the GroupChannel will run a conflict check upon startup,
so that there are no conflicts. I will change options to a long,
so that we can have 63 flags, hence up to 60 interceptors.
Notes: Completed, remained an int, so 31 flags (see the sketch below)
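
A sketch of the per-interceptor option flag; setOptionFlag comes from
ChannelInterceptorBase, and the flag value 0x0100 is an arbitrary example that
must not clash with the Channel.SEND_OPTIONS_XXXX bits or other interceptors:

    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.group.GroupChannel;
    import org.apache.catalina.tribes.group.interceptors.FragmentationInterceptor;

    public class OptionFlagExample {
        public static void main(String[] args) throws Exception {
            GroupChannel channel = new GroupChannel();
            FragmentationInterceptor frag = new FragmentationInterceptor();
            frag.setOptionFlag(0x0100);    // only act on messages carrying this bit
            channel.addInterceptor(frag);
            channel.start(Channel.DEFAULT);

            // fragmentation applies to this message because the bit is set
            channel.send(channel.getMembers(), "a large payload", 0x0100);
            channel.stop(Channel.DEFAULT);
        }
    }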
b) State synchronization for the map - will need to add in MSG_INIT
Fixed map bug
c) RpcChannel - collect "no reply" replies, so that we don't have to time out
The RpcChannel now works together with the group channel, so that when it receives an RPC message
and no one accepts it, it can reply immediately. This way the RPC sender doesn't have to time out.
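
A sketch of the RpcChannel round trip; RpcChannel, RpcCallback and Response are
the real Tribes classes, while the rpcId, payloads and timeout are examples. A
member with no RpcChannel bound to this rpcId now answers with an immediate
"no reply" instead of forcing the full timeout:

    import java.io.Serializable;
    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.Member;
    import org.apache.catalina.tribes.group.Response;
    import org.apache.catalina.tribes.group.RpcCallback;
    import org.apache.catalina.tribes.group.RpcChannel;

    public class RpcExample {
        public static Response[] ask(Channel channel) throws Exception {
            RpcChannel rpc = new RpcChannel("demo-rpc".getBytes(), channel,
                new RpcCallback() {
                    @Override
                    public Serializable replyRequest(Serializable msg, Member sender) {
                        return "pong"; // reply sent back to the caller
                    }
                    @Override
                    public void leftOver(Serializable msg, Member sender) {
                        // replies that arrive after send() has already returned
                    }
                });
            // blocks until all members reply or the 3s timeout expires
            return rpc.send(channel.getMembers(), "ping",
                            RpcChannel.ALL_REPLY, Channel.SEND_OPTIONS_SYNCHRONIZED_ACK, 3000);
        }
    }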
39. Support for IPv6
Notes: Completed. The membership now carries a variable length host address to support IPv6
40. channel.stop() - should broadcast a stop message, to avoid timeout
Notes: Completed.
42. When the option SEND_OPTIONS_SYNCHRONIZED_ACK is set and an error happens during
processing on the remote node, a FAIL_ACK command should be sent
so that the sending node doesn't have to time out
Notes: Completed. The error will be wrapped in a ChannelException and marked as a
RemoteProcessException for that node.
To receive this exception, one must turn on the SEND_OPTIONS_SYNCHRONIZED_ACK flag.
43. Silent member, node discovery.
Add in the ability to start up Tribes without starting the membership broadcast
component, only the listener
Notes: Completed. Added in correct startup sequences.
46. Heartbeat Interface, to notify listeners as well
Notes: Implemented
44. Soft membership failure detection, i.e. if a webapp is stopped but
the AbstractReplicatedMap doesn't broadcast a stop message
This is one potential solution:
1. keep a static WeakHashMap of all running map implementations
so that we can share one heartbeat thread for timeouts
2. every time a message is received, update the last check time for that
member so that we don't need the thread to actively check
3. when the thread wakes up, it will check maps that are outside
the valid range for check time
4. send an RPC message; if no reply, remove the map from itself
Another solution: use the TcpFailureDetector and catch send errors
Notes: Implemented using a periodic ping in the AbstractReplicatedMap (see the sketch below)
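
A conceptual sketch (plain Java, not the Tribes API) of the timeout bookkeeping
described above: message arrivals refresh a per-member timestamp, and a shared
heartbeat thread only pings members whose timestamp has gone stale:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SoftMembershipMonitor {
        private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
        private final long timeoutMs;

        public SoftMembershipMonitor(long timeoutMs) { this.timeoutMs = timeoutMs; }

        // Step 2: every received message refreshes the member's check time.
        public void messageReceived(String memberId) {
            lastSeen.put(memberId, System.currentTimeMillis());
        }

        // Steps 3/4: the heartbeat thread calls this; members outside the
        // valid range get a ping and are dropped if they fail to answer.
        public void heartbeat() {
            long now = System.currentTimeMillis();
            lastSeen.forEach((member, seen) -> {
                if (now - seen > timeoutMs && !ping(member)) {
                    lastSeen.remove(member);   // treat as a soft failure
                }
            });
        }

        private boolean ping(String member) { return false; /* RPC ping stub */ }
    }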
45. McastServiceImpl.receive should have a SO_TIMEOUT so that we can check
for members dropping on the same thread
Notes: Completed
33. TcpFailureDetector, when a member is reported missing, first check TCP path too.
Notes: Completed
34. Configurable payload for the membership heartbeat, so that the app can decide what to heartbeat.
such as JMX management port, ala Andy Piper's suggestion.
Notes: Completed
49. Make sure that membership is established before the start process returns.
Notes: Completed
16. Guaranteed delivery of messages, i.e. either all receivers get it or none gets it.
Meaning that all receivers get it, then wait for a process command,
ala a gossip protocol - this is fairly redundant with the Xa2PhaseCommitInterceptor,
except it doesn't keep a transaction log.
Notes: Completed in another task
10. Xa2PhaseCommitInterceptor - make sure the message doesn't reach the receiver unless all members got it
Notes: Completed
25. Member.uniqueId - 16 bytes, unique per member, a UUID
Needed so as not to confuse a crashed member with a revived member on the same port
Notes: Completed
19. Implement a hardcoded tcp membership
Notes: Completed
9. CoordinatorInterceptor - manages the selection of a cluster coordinator
Just had a brilliant idea: if the GroupChannel keeps its own view of members,
the coordinator interceptor can hold on to the member added/disappeared events.
It can also intercept down-going messages if the coordinator disappeared
while a new coordinator is being chosen.
It can also intercept down-going messages for members that have disappeared but that the
calling app doesn't yet know about, to avoid a ChannelException.
The coordinator is needed because of the mixup when two channels start up simultaneously.
Notes: Completed. org.apache.catalina.tribes.group.interceptors.NonBlockingCoordinatorInterceptor
implements an automerging algorithm.