To Do:
===========================================

Documentation:
===========================================
Why is Tribes unique compared to JGroups/Appia and other group communication frameworks?

1. Uses NIO and TCP for guaranteed delivery and the ability to join large groups

2. Guarantees messages in the following ways:
   a) TCP messaging, with a subsequent READ for NIO to ensure the channel is not broken
   b) ACK messages from the receiver
   c) ACK after processing

3. The same (single) channel can handle all types of guarantees (a, b, c) at the same time,
   and can process both synchronous and asynchronous messaging.
   This is key to supporting different messaging requirements through
   the same channel while conserving system resources (see the sketch below).
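
A minimal sketch of how the three guarantee levels map onto per-message send options,
assuming the Channel.SEND_OPTIONS_* flags from org.apache.catalina.tribes.Channel and
an already started channel:

    import java.io.Serializable;
    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.ChannelException;
    import org.apache.catalina.tribes.Member;

    class GuaranteeLevels {
        static void send(Channel channel, Member[] dest, Serializable msg)
                throws ChannelException {
            // (a) TCP transport only, no application-level ACK
            channel.send(dest, msg, 0);
            // (b) ACK as soon as the message is received
            channel.send(dest, msg, Channel.SEND_OPTIONS_USE_ACK);
            // (c) ACK only after the message has been processed
            channel.send(dest, msg, Channel.SEND_OPTIONS_USE_ACK
                    | Channel.SEND_OPTIONS_SYNCHRONIZED_ACK);
        }
    }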

4. For async messaging, errors are reported through an error handler callback

5. Ability to send on multiple streams at the same time, in parallel, to improve performance

6. Designed with replication in mind, recognizing that some pieces of data don't need to be totally ordered.

7. It's not built with the uniform group model in mind, but that can be accomplished using interceptors.

8. Future versions will have WAN membership and replication

9. Silent members: the ability to send messages to a node that is not in the membership

10. Sender burst: concurrent messaging between two or more nodes

11. Multicasting can still be done on a per-message basis using a MulticastInterceptor

Bugs:
===========================================
a) Somehow the first NIO connection made always closes down; why?

b) Pull the network cord and watch the membership layer freak out

c) AbstractReplicatedMap.size() throws ConcurrentModificationException;
   implement a size counter instead (sketched below)
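
A minimal sketch of the counter idea; the class and hooks below are illustrative,
not the actual AbstractReplicatedMap code:

    import java.io.Serializable;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative only: keep the size in a counter so size() never has to
    // iterate the backing map and can't hit ConcurrentModificationException.
    class CountedMap {
        private final Map<Serializable,Serializable> inner =
                new ConcurrentHashMap<Serializable,Serializable>();
        private final AtomicInteger count = new AtomicInteger(0);

        public Serializable put(Serializable key, Serializable value) {
            Serializable old = inner.put(key, value);
            if (old == null) count.incrementAndGet(); // only new keys grow the map
            return old;
        }
        public Serializable remove(Serializable key) {
            Serializable old = inner.remove(key);
            if (old != null) count.decrementAndGet();
            return old;
        }
        public int size() { return count.get(); }     // O(1), no iteration
    }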

Code Tasks:
===========================================
51. NioSender.setData should not expand the byte buffer if the message is too large;
    instead, just refill it from the XByteBuffer (a rough sketch of the idea follows)
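
A rough sketch of the refill idea (names are illustrative; the real NioSender/XByteBuffer
code will differ): keep a fixed-size ByteBuffer and top it up from the message bytes as
the socket drains it, instead of growing the buffer to fit the whole message.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    class ChunkedWrite {
        // Illustrative only; assumes a blocking SocketChannel.
        static void write(SocketChannel ch, byte[] data) throws IOException {
            ByteBuffer buf = ByteBuffer.allocateDirect(8192); // fixed, never expanded
            int offset = 0;
            while (offset < data.length || buf.position() > 0) {
                // refill the buffer from the source instead of resizing it
                int len = Math.min(buf.remaining(), data.length - offset);
                buf.put(data, offset, len);
                offset += len;
                buf.flip();
                ch.write(buf);   // a blocking channel writes at least one byte
                buf.compact();
            }
        }
    }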

50. On top of versioning, implement version syncs from primary to backup,
    or when a backup receives an update that is out of sync

49. Implement versioning on the AbstractReplicatedMap

48. Periodic refresh of the replicated map (primary -> backup)

47. Delta (session) versioning: increase the version number on each change; this makes it easier to keep maps in sync

41. Build a tipi that implements a soft membership

38. Make the AbstractReplicatedMap accept non-serializable elements, but just don't replicate them

36. UDP Sender and Receiver, initially without flow control or guaranteed delivery.
    This can easily be done as an interceptor, operating in parallel with the TCP sender.
    It can implement auto detection, so that if a message is going to a destination on the same network
    path, it is sent over UDP.

35. The ability to share one channel amongst multiple processes

32. Replicated JNDI entries in Tomcat in the format
    cluster:<map name>/<entry key>, for example
    cluster:myapps/db/shared/dbinfo (a hypothetical lookup is sketched below)
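
If this gets implemented, a lookup might look like the following (purely hypothetical;
the cluster: namespace does not exist yet):

    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    class ClusterJndiLookup {
        // Hypothetical usage of the proposed cluster: namespace.
        static Object lookupDbInfo() throws NamingException {
            Context ctx = new InitialContext();
            return ctx.lookup("cluster:myapps/db/shared/dbinfo");
        }
    }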

31. A layer on top of the GroupChannel, to allow multiple processes to share
    a channel to send/receive messages and conserve system resources - this is way in the future.

30. CookieBasedReplicationMap - a very simple extension to the LazyReplicatedMap,
    but instead of randomly selecting a backup node and then publishing the PROXY to all
    the other nodes in the group, this will simply
    read/write a cookie for the backup location, so the nodes will
    never know until the request comes in.
    This is useful in extremely large clusters, as it essentially reduces
    much of the network chatter; this task depends on task 25.
    Question to be answered: How does the map know of the cookie?
    Potential answer: Use a thread local and bind the request/response to it (sketched below)
    Question: Is there one cookie per map, or one cookie per server?
    Potential answer: One cookie per app allows for heterogeneous instances.
    Question: Is there a limit on the number of cookies?
    Per Netscape and RFC 2109, section 6.3, it's 20:
    http://wp.netscape.com/newsref/std/cookie_spec.html
    To work around the cookie limit, a suggested setting could
    store the preferred backup order, or simply require homogeneous instances.
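
A minimal sketch of the thread-local answer (all names hypothetical): a valve/filter
binds the current request/response to the thread, and the map reads the backup-location
cookie from there.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical binding the map would consult for the backup-location cookie.
    class RequestBinding {
        private static final ThreadLocal<HttpServletRequest> REQ = new ThreadLocal<HttpServletRequest>();
        private static final ThreadLocal<HttpServletResponse> RES = new ThreadLocal<HttpServletResponse>();

        // a valve/filter calls this when the request enters the pipeline
        static void bind(HttpServletRequest req, HttpServletResponse res) {
            REQ.set(req);
            RES.set(res);
        }
        // ...and this when the request completes
        static void unbind() {
            REQ.remove();
            RES.remove();
        }

        static HttpServletRequest request()   { return REQ.get(); }
        static HttpServletResponse response() { return RES.get(); }
    }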

29. Thread pool should shrink dynamically

27. XmlConfigurator - read an XML file to configure the channel.

26. JNDIChannel - a way to bind the group channel into a JNDI tree,
    so that shared resources can access it.

23. TotalOrderInterceptor - fairly straightforward implementation.
    This interceptor depends on there being some sort of
    membership coordinator, see task 9.
    Once there is a coordinator in the group, the total order protocol is the same
    as the OrderInterceptor, except that it gets its message number from
    the coordinator prior to sending the message out.
    The TotalOrderInterceptor will keep an order number per member;
    this way, ordering is kept intact when different messages are sent
    to different members, i.e.
    Message A - all members - total order (mbrA-2, mbrB-2, mbrC-2, mbrD-2)
    Message B - mbrC,mbrD only - total order (mbrA-2, mbrB-2, mbrC-3, mbrD-3)
    - The combination of member uniqueId and orderId is unique, nothing else;
      this way, if a member crashes, we don't hold the queue, instead we start over.
    - A TotalOrder token will contain the coordinator uniqueId as well.
    - One parameter should be "receive sequence timeout" in case the coordinator is not responding.
    - OPTION A) (sketched below)
      the coordinator doesn't forward the message,
      since the app would then not receive the proper error message;
      instead, the sequencer just returns the sequence, and the member itself sends the message
      pros: the app will find out if the send failed/succeeded
      cons: if the send fails, the sequencer is out of sync for the failed member
      OPTION B)
      the coordinator receives the message, adds on the sequence number,
      then sends the message on behalf of the requesting member
      pros: the sequencer is in charge of the sequence
      cons: the sequencer can become overloaded, since it has to handle all the traffic,
      and the requesting member will not know if the message failed/succeeded
      OPTION C) Research papers on total order; better algorithms exist.
    NOTES! THIS CAN'T BE DONE USING A SEQUENCER THAT SENDS THE MESSAGE, SINCE WE
    LOSE THE ABILITY TO REPORT FEEDBACK
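
A rough sketch of OPTION A's sequencer side (all names hypothetical): the coordinator
hands out one counter value per destination member over an RPC, and the requesting
member then sends the message itself, so send failures are still reported to the app.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sequencer living on the coordinator: one monotonically
    // increasing counter per destination member, handed out atomically.
    class Sequencer {
        private final ConcurrentHashMap<String,AtomicLong> counters =
                new ConcurrentHashMap<String,AtomicLong>();

        // invoked via RPC by the sending member, once per destination member
        long nextSequence(String memberUniqueId) {
            AtomicLong c = counters.get(memberUniqueId);
            if (c == null) {
                AtomicLong created = new AtomicLong(0);
                AtomicLong existing = counters.putIfAbsent(memberUniqueId, created);
                c = (existing != null) ? existing : created;
            }
            return c.incrementAndGet();
        }
    }

The sender would stamp the returned (coordinatorId, memberId, sequence) triplet on the
message; receivers hold back anything whose sequence is not the next one expected from
that pair.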

21. Implement a WAN membership layer, using a WANMbrInterceptor and a
    WAN Router/Forwarder (a tipi on top of a ManagedChannel)

20. Implement a TCP membership interceptor, for guaranteed functionality, not just discovery

18. Implement SSL encryption over message transfers, BIO and NIO

8. WaitForCompletionInterceptor - waits for the message to get processed by all receivers before returning
   (This is useful when synchronized=false and waitForAck=false to improve
   parallel processing, but you want to have all messages sent in parallel and
   not return until all have been processed on the remote end.)

11. Code a ReplicatedFileSystem example, package org.apache.catalina.tipis

13. StateTransfer interceptor
    The idea just came up in my head: the state transfer interceptor
    will hold all incoming messages until it has received a message
    with a STATE_TRANSFER header as the first of the bytes.
    Once it has received state, it will pretty much take itself out of the loop.
    The benefit of the new ParallelNioSender is that it doesn't require knowing about
    a member to transfer state; all it has to do is reply to a message that came in.
    State is a one-time deal for the entire channel, so a
    session replication cluster would transfer state as one block, not one per context
    (a sketch of the hold-back idea follows).
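
A minimal sketch of the hold-back behavior (hypothetical names; not the real
interceptor): queue everything until the STATE_TRANSFER message arrives, then flush
the queue and become a pass-through.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical: buffer incoming messages until state has been received.
    class StateHoldBack<M> {
        interface Delivery<T> { void deliver(T msg); }

        private final List<M> pending = new ArrayList<M>();
        private boolean stateReceived = false;

        synchronized void onMessage(M msg, boolean isStateTransfer, Delivery<M> up) {
            if (isStateTransfer) {
                stateReceived = true;               // state arrived: flush the queue
                for (M held : pending) up.deliver(held);
                pending.clear();
            } else if (stateReceived) {
                up.deliver(msg);                    // pass-through from now on
            } else {
                pending.add(msg);                   // hold back until state arrives
            }
        }
    }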

14. Keep-alive count and idle kill-off for NIO senders

17. Implement transactions - the ability to start a transaction, send several messages,
    and then commit the transaction

Tasks Completed
===========================================
1. True synchronous/asynchronous replication enabled using the flags
   Sender.sendAck/Receiver.waitForAck/Receiver.synchronized
   Task Desc: waitForAck - should only mean that we received the message, not that the
   message got processed. This should improve throughput, and an interceptor
   can do waitForCompletion
   Status: Complete
   Notes:

2. Unique id, send it as a byte array instead of a string

3. DataSender or ReplicationTransmitter swallows IOException; this should be fixed
   Notes: This has only been fixed for the pooled synchronized sender; the fast-async
   one isn't working that well

4. ChannelMessage.getMessage should return a streamable; that way we can wrap it,
   pass it around and all those good things without having to copy byte arrays
   left and right
   Notes: Instead of using a streamable, this is implemented using the XByteBuffer,
   which is very easy to use. It also becomes a single spot for optimizations.
   Ideally, there would be a pool of XByteBuffers that all use direct ByteBuffers
   for their data handling.

5. OrderInterceptor - guarantees the order of messages
   Notes: completed

6. NIO and IO DataSender, since the IO one is blocking
   Notes: completed. Works very well; have not implemented suspect error logging.

7. FragmentationInterceptor - splits up messages that are larger than X bytes.
   Notes: completed

15. Remove DataSenderFactory and DataSender.properties -
    these cause the settings to be hard-coded and not pluggable.
    Notes: Completed, now you can initialize a transport class

12. LazyReplicatedHashMap - memory-efficient clustered map.
    This map can be used for PRIMARY/SECONDARY session replication.
    Ah, the beauty of storing data in remote locations.
    The lazy hash map will only replicate its attribute names to all members in the group;
    with each name, it will also replicate the source (where to get the object)
    and the backup member where a backup can be found if the source is gone.
    If the source disappears, the backup node will replicate the stored attributes
    to a new primary; backups can be chosen round-robin.
    When a new member arrives and requests state, that member will get all the attribute
    names and the locations.
    It can replicate every X seconds, or on dirty flags set by the stored objects,
    or on a request to scan for dirty flags, or on a request with the objects.
    Notes: the map has been completed (usage sketch below)
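
A usage sketch, assuming the Tomcat 6-era constructor of
org.apache.catalina.tribes.tipis.LazyReplicatedMap (the exact signature varies by
version, so treat the arguments as illustrative):

    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.group.GroupChannel;
    import org.apache.catalina.tribes.tipis.LazyReplicatedMap;

    class LazyMapExample {
        public static void main(String[] args) throws Exception {
            Channel channel = new GroupChannel();
            channel.start(Channel.DEFAULT);  // membership, senders and receivers

            LazyReplicatedMap map = new LazyReplicatedMap(
                    null,            // map owner callback, none in this sketch
                    channel,
                    5000,            // RPC timeout in ms
                    "exampleMap",    // map name, must match on all nodes
                    null);           // application class loaders for deserialization

            map.put("key", "value"); // primary here, one backup node elsewhere;
                                     // the other members only learn the key (PROXY)
        }
    }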

22. sendAck and synchronized should not have to be XML config;
    they can be configured on a per-packet basis using ClusterData.getOptions()
    Notes: see the Channel.SEND_OPT_XXXX variables

28. Thread pool should have maxThreads and minThreads and grow dynamically

24. MessageDispatchInterceptor - for asynchronous sending
    - looks at the options flag SEND_OPTIONS_ASYNCHRONOUS
    - has two modes
      a) async parallel send - each message goes to all destinations before the next message
      b) async per member - one thread per member using the FastAsyncQueue (good for groups with slow receivers)
    - Callback error handler - for when messages fail and the application wishes to be notified
    - MUST HAVE A QUEUE SIZE LIMIT IN MB, to avoid OOM errors, or persist the queue.
    - MUST USE ClusterData.deepclone() to ensure thread safety if ClusterData objects get recycled
    Notes: Simple implementation: one thread invokes all senders in parallel.
    Deep cloning is configurable as an optimization (see the sketch below).
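
A sketch of an asynchronous send with the callback error handler, using the
Channel.send overload that takes an org.apache.catalina.tribes.ErrorHandler:

    import java.io.Serializable;
    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.ChannelException;
    import org.apache.catalina.tribes.ErrorHandler;
    import org.apache.catalina.tribes.Member;
    import org.apache.catalina.tribes.UniqueId;

    class AsyncSendExample {
        static void send(Channel channel, Member[] dest, Serializable msg)
                throws ChannelException {
            channel.send(dest, msg, Channel.SEND_OPTIONS_ASYNCHRONOUS,
                new ErrorHandler() {
                    public void handleError(ChannelException x, UniqueId id) {
                        // the queued message failed on the dispatch thread
                        x.printStackTrace();
                    }
                    public void handleCompletion(UniqueId id) {
                        // the queued message was sent successfully
                    }
                });
        }
    }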

37. Interceptor.getOptionFlag() - lets the system configure a flag to be used
    for the interceptor, so that not all constants have to be configured
    in Channel.SEND_FLAG_XXXX.
    Also, the GroupChannel will run a conflict check upon startup,
    so that there is no conflict. I will change options to a long,
    so that we can have 63 flags, hence up to 60 interceptors.
    Notes: Completed; it remained an int, so 31 flags

b) State synchronization for the map - will need to add in MSG_INIT
   Fixed map bug

c) RpcChannel - collect "no reply" replies, so that we don't have to time out.
   The RpcChannel now works together with the group channel, so that when it receives an RPC message
   and no one accepts it, it can reply immediately; this way the RPC sender doesn't have to time out
   (a usage sketch follows).
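
A usage sketch of the RpcChannel (the classes and methods shown exist in Tribes,
though details may differ between versions):

    import java.io.Serializable;
    import org.apache.catalina.tribes.Channel;
    import org.apache.catalina.tribes.Member;
    import org.apache.catalina.tribes.group.Response;
    import org.apache.catalina.tribes.group.RpcCallback;
    import org.apache.catalina.tribes.group.RpcChannel;

    class RpcExample implements RpcCallback {
        public Serializable replyRequest(Serializable msg, Member sender) {
            return "pong";       // accept the RPC and reply
        }
        public void leftOver(Serializable msg, Member sender) {
            // a reply that arrived after the timeout; ignored here
        }

        static Response[] ping(Channel channel) throws Exception {
            RpcChannel rpc = new RpcChannel(new byte[16], channel, new RpcExample());
            return rpc.send(channel.getMembers(), "ping",
                    RpcChannel.ALL_REPLY,          // wait for every destination
                    Channel.SEND_OPTIONS_DEFAULT,  // channel-level send options
                    3000);                         // timeout in ms
        }
    }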

39. Support for IPv6
    Notes: Completed. The membership now carries a variable-length host address to support IPv6

40. channel.stop() - should broadcast a stop message, to avoid a timeout
    Notes: Completed.

42. When the option SEND_OPTIONS_SYNCHRONIZED_ACK is used and an error happens during
    processing on the remote node, a FAIL_ACK command should be sent
    so that the sending node doesn't have to time out
    Notes: Completed. The error will be wrapped in a ChannelException and marked as a
    RemoteProcessException for that node.
    To receive this exception, one must turn on SEND_OPTIONS_SYNCHRONIZED_ACK.

43. Silent member, node discovery.
    Add in the ability to start up Tribes without starting the membership broadcast
    component, only the listener
    Notes: Completed. Added in the correct startup sequences.

46. Heartbeat interface, to notify listeners as well
    Notes: Implemented

44. Soft membership failure detection, i.e. if a webapp is stopped but
    the AbstractReplicatedMap doesn't broadcast a stop message.
    This is one potential solution:
    1. keep a static WeakHashMap of all running map implementations,
       so that we can share one heartbeat thread for timeouts
    2. every time a message is received, update the last check time for that
       member, so that we don't need the thread to actively check
    3. when the thread wakes up, it will check maps that are outside
       the valid range for the check time
    4. send an RPC message; if there is no reply, remove the member from the map
    Other solution: use the TcpFailureDetector and catch send errors
    Notes: Implemented using a periodic ping in the AbstractReplicatedMap

45. McastServiceImpl.receive should have an SO_TIMEOUT so that we can check
    for members dropping on the same thread
    Notes: Completed

33. TcpFailureDetector - when a member is reported missing, first check the TCP path too.
    Notes: Completed

34. Configurable payload for the membership heartbeat, so that the app can decide what to heartbeat,
    such as a JMX management port, a la Andy Piper's suggestion.
    Notes: Completed

49. Make sure that membership is established before the start process returns.
    Notes: Completed

16. Guaranteed delivery of messages, i.e. either all get it or none get it.
    Meaning that all receivers get it, then wait for a process command,
    a la a gossip protocol - this is fairly redundant with the Xa2PhaseCommitInterceptor,
    except that it doesn't keep a transaction log.
    Notes: Completed in another task

10. Xa2PhaseCommitInterceptor - make sure the message doesn't reach the receiver unless all members got it
    Notes: Completed

25. Member.uniqueId - 16 bytes, unique per member, a UUID
    Needed so as not to confuse a crashed member with a revived member on the same port
    Notes: Completed

19. Implement a hardcoded TCP membership
    Notes: Completed

9. CoordinatorInterceptor - manages the selection of a cluster coordinator.
   Just had a brilliant idea: if the GroupChannel keeps its own view of members,
   the coordinator interceptor can hold on to the member added/disappeared events.
   It can also intercept down-going messages while a new coordinator is being chosen
   after the old coordinator disappeared.
   It can also intercept down-going messages addressed to disappeared members that the
   calling app doesn't yet know about, to avoid a ChannelException.
   The coordinator is needed because of the mix-up when two channels start up at the same instant.
   Notes: Completed. org.apache.catalina.tribes.group.interceptors.NonBlockingCoordinatorInterceptor
   implements an auto-merging algorithm.