To Do:
===========================================

Documentation:
===========================================
Why is Tribes unique compared to JGroups/Appia and other group communication protocols?

1. Uses NIO and TCP for guaranteed delivery and the ability to join large groups.

2. Guarantees messages in the following ways:
   a) TCP messaging, with a subsequent READ on the NIO channel to ensure the channel is not broken
   b) ACK messages from the receiver
   c) ACK after processing

3. The same (single) channel can handle all types of guarantees (a, b, c) at the same time,
   and can process both synchronous and asynchronous messaging.
   This is key to supporting different messaging requirements through
   the same channel, to save system resources.
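A minimal sketch of how per-message delivery guarantees can ride on a single channel as bit flags. The constant names and values below are illustrative only, not the actual Channel.SEND_OPTIONS_* constants:

```java
// Sketch: per-message delivery guarantees expressed as bit flags.
// Constant names/values are illustrative, not the real Channel.SEND_OPTIONS_* values.
public class SendOptions {
    public static final int OPT_USE_ACK          = 0x0001; // (b) receiver ACKs arrival
    public static final int OPT_SYNCHRONIZED_ACK = 0x0002; // (c) receiver ACKs after processing
    public static final int OPT_ASYNCHRONOUS     = 0x0004; // queue and return immediately

    public static boolean isSet(int options, int flag) {
        return (options & flag) == flag;
    }

    public static void main(String[] args) {
        // One message wants full processing guarantees, another is fire-and-forget;
        // both can travel over the same channel because the options ride with each message.
        int guaranteed = OPT_USE_ACK | OPT_SYNCHRONIZED_ACK;
        int fireAndForget = OPT_ASYNCHRONOUS;
        System.out.println(isSet(guaranteed, OPT_USE_ACK));    // true
        System.out.println(isSet(fireAndForget, OPT_USE_ACK)); // false
    }
}
```

Because the flags travel with the message rather than being fixed channel configuration, guarantee levels (a), (b), and (c) can coexist on one channel.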

4. For asynchronous messaging, errors are reported through an error handler callback.

5. Ability to send on multiple streams at the same time, in parallel, to improve performance.

6. Designed with replication in mind, where some pieces of data don't need to be totally ordered.

7. It is not built with the uniform group model in mind, but that can be accomplished using interceptors.

8. Future versions will have WAN membership and replication.
Bugs:
===========================================
a) Somehow the first NIO connection made always closes down - why?
c) RpcChannel - collect "no reply" replies, so that we don't have to time out.

Code Tasks:
===========================================
38. Make the AbstractReplicatedMap accept non-serializable elements, but simply not replicate them.

36. UDP sender and receiver, initially without flow control or guaranteed delivery.
    This can easily be done as an interceptor, and operate in parallel with the TCP sender.
    It can implement auto-detection, so that if the message is going to a destination on the same network
    path, it is sent over UDP.

35. The ability to share one channel amongst multiple processes.

34. Configurable payload for the membership heartbeat, so that the app can decide what to heartbeat,
    such as a JMX management port, ala Andy Piper's suggestion.

33. PerfectFDInterceptor - when a member is reported missing, first check the TCP path too.

32. Replicated JNDI entries in Tomcat, in the format
    cluster:<map name>/<entry key>, for example
    cluster:myapps/db/shared/dbinfo
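A rough sketch of parsing such a cluster: name. The task above does not pin down where the map name ends and the entry key begins, so this assumes the first path segment is the map name and the remainder is the entry key:

```java
// Sketch: parsing a "cluster:<map name>/<entry key>" JNDI-style name.
// Assumes the first path segment is the map name and the rest is the entry key;
// the task description does not fix the exact split, so this is one interpretation.
public class ClusterName {
    final String mapName;
    final String entryKey;

    ClusterName(String name) {
        if (!name.startsWith("cluster:")) {
            throw new IllegalArgumentException("Not a cluster name: " + name);
        }
        String rest = name.substring("cluster:".length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Missing entry key: " + name);
        }
        this.mapName = rest.substring(0, slash);
        this.entryKey = rest.substring(slash + 1);
    }

    public static void main(String[] args) {
        ClusterName n = new ClusterName("cluster:myapps/db/shared/dbinfo");
        System.out.println(n.mapName);  // myapps
        System.out.println(n.entryKey); // db/shared/dbinfo
    }
}
```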

31. A layer on top of the GroupChannel, to allow multiple processes to share
    a channel to send/receive messages, to conserve system resources - this is way in the future.

30. CookieBasedReplicationMap - a very simple extension to the LazyReplicatedMap,
    but instead of randomly selecting a backup node and then publishing the PROXY to all
    the other nodes in the group, this will simply
    read/write a cookie for a backup location, so the nodes will
    never know until the request comes in.
    This is useful in extremely large clusters, and essentially reduces
    much of the network chatter. This task is dependent on task 25.
    Question to be answered: How does the map know of the cookie?
    Potential answer: Use a thread local and bind the request/response to it.

29. Thread pool - shrink dynamically.
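Dynamic shrinking can be sketched with the standard java.util.concurrent.ThreadPoolExecutor: with a SynchronousQueue the pool grows toward the maximum under load, idle threads die after the keep-alive time, and allowCoreThreadTimeOut lets even the core threads be reclaimed. A minimal sketch (factory name is hypothetical):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a pool that grows toward maxThreads under load (SynchronousQueue hands
// tasks straight to threads) and shrinks back when threads sit idle too long.
public class ShrinkingPool {
    public static ThreadPoolExecutor create(int minThreads, int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                minThreads, maxThreads,
                60L, TimeUnit.SECONDS,              // idle threads die after 60s
                new SynchronousQueue<Runnable>());  // no buffering: spawn up to max
        pool.allowCoreThreadTimeOut(true);          // let even the core threads shrink away
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(2, 8);
        System.out.println(pool.getCorePoolSize());    // 2
        System.out.println(pool.getMaximumPoolSize()); // 8
        pool.shutdown();
    }
}
```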

27. XmlConfigurator - read an XML file to configure the channel.

26. JNDIChannel - a way to bind the group channel in a JNDI tree,
    so that shared resources can access it.

25. Member.uniqueId - 16 bytes, unique per member, a UUID.
    Needed so as not to confuse a crashed member with a revived member on the same port.

23. TotalOrderInterceptor - fairly straightforward implementation.
    This interceptor depends on there being some sort of
    membership coordinator; see task 9.
    Once there is a coordinator in the group, the total order protocol is the same
    as the OrderInterceptor, except that it gets its message number from
    the coordinator prior to sending it out.
    The TotalOrderInterceptor will keep an order number per member;
    this way, ordering is kept intact when different messages are sent
    to different members, i.e.
    Message A - all members - total order (mbrA-2, mbrB-2, mbrC-2, mbrD-2)
    Message B - mbrC,mbrD only - total order (mbrA-2, mbrB-2, mbrC-3, mbrD-3)
    - The combination of member uniqueId and orderId is unique, nothing else;
      this way, if a member crashes, we don't hold the queue - instead we start over.
    - A TotalOrder token will contain the coordinator uniqueId as well.
    - One parameter should be "receive sequence timeout", in case the coordinator is not responding.
    - OPTION A)
      The coordinator doesn't forward the message,
      since the app would not receive the proper error message;
      instead the sequencer just returns the sequence, then the member itself sends the message.
      pros: the app will find out if the send failed/succeeded
      cons: if the send fails, the sequencer is out of sync for the failed member
      OPTION B)
      The coordinator receives the message, adds on the sequence number,
      then sends the message on behalf of the requesting members.
      pros: the sequencer is in charge of the sequence
      cons: the sequencer can become overloaded, since it has to handle all the traffic;
            the requesting member will not know if the message failed/succeeded
      OPTION C) Research papers on total order; better algorithms exist.
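The per-member counter bookkeeping in the example above (mbrC advancing to 3 while mbrA stays at 2) can be sketched as the sequencer the coordinator would run for OPTION A; the class and method names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: coordinator-side sequencer for OPTION A above.
// Each destination member has its own counter; a message addressed to a
// subset only advances the counters of that subset. Names are hypothetical.
public class TotalOrderSequencer {
    private final Map<String, Integer> counters = new HashMap<String, Integer>();

    // Returns the assigned order number for each destination of one message.
    public synchronized Map<String, Integer> nextSequence(String... destinations) {
        Map<String, Integer> assigned = new HashMap<String, Integer>();
        for (String member : destinations) {
            int next = counters.getOrDefault(member, 0) + 1;
            counters.put(member, next);
            assigned.put(member, next);
        }
        return assigned;
    }

    public static void main(String[] args) {
        TotalOrderSequencer seq = new TotalOrderSequencer();
        seq.nextSequence("mbrA", "mbrB", "mbrC", "mbrD"); // all counters now 1
        seq.nextSequence("mbrA", "mbrB", "mbrC", "mbrD"); // all counters now 2 (Message A)
        Map<String, Integer> b = seq.nextSequence("mbrC", "mbrD"); // Message B
        System.out.println(b.get("mbrC")); // 3 - mbrA and mbrB stay at 2
    }
}
```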

21. Implement a WAN membership layer, using a WANMbrInterceptor and a
    WAN router/forwarder (Tipi on top of a ManagedChannel).

20. Implement a TCP membership interceptor, for guaranteed functionality, not just discovery.

19. Implement a hardcoded TCP membership.

18. Implement SSL encryption over message transfers, BIO and NIO.

8. WaitForCompletionInterceptor - waits for the message to get processed by all receivers before returning.
   (This is useful when synchronized=false and waitForAck=false to improve
   parallel processing, but you want to have all messages sent in parallel and
   not return until all have been processed on the remote end.)
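The core of that wait-for-completion behavior can be sketched with a CountDownLatch: dispatch to all receivers in parallel, then block until every one reports that it has processed (not merely received) the message. The names here are hypothetical, and local Runnables stand in for remote receivers:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: send to all receivers in parallel, then block until every receiver
// has *processed* the message, or the timeout expires. Names are hypothetical.
public class WaitForCompletion {
    public static boolean sendAndWait(Runnable[] receivers, long timeoutMs)
            throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(receivers.length);
        for (final Runnable receiver : receivers) {
            new Thread(new Runnable() {
                public void run() {
                    receiver.run();   // stands in for the remote end processing the message
                    done.countDown(); // one receiver finished processing
                }
            }).start();
        }
        return done.await(timeoutMs, TimeUnit.MILLISECONDS); // false on timeout
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable noop = new Runnable() { public void run() {} };
        boolean allDone = sendAndWait(new Runnable[] { noop, noop, noop }, 1000);
        System.out.println(allDone); // true
    }
}
```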

9. CoordinatorInterceptor - manages the selection of a cluster coordinator.
   Just had a brilliant idea: if the GroupChannel keeps its own view of members,
   the coordinator interceptor can hold on to the member added/disappeared events.
   It can also intercept down-going messages if the coordinator disappeared,
   while a new coordinator is chosen.
   It can also intercept down-going messages for disappeared members that the
   calling app does not yet know about, to avoid a ChannelException.


10. Xa2PhaseCommitInterceptor - make sure the message doesn't reach the receiver unless all members got it.
11. Code a ReplicatedFileSystem example, package org.apache.catalina.tipis.

13. StateTransfer interceptor.
    The idea: the state transfer interceptor
    will hold all incoming messages until it has received a message
    with a STATE_TRANSFER header as the first of the bytes.
    Once it has received state, it will pretty much take itself out of the loop.
    The benefit of the new ParallelNioSender is that it doesn't require knowing about
    a member to transfer state; all it has to do is reply to a message that came in.
    State is a one-time deal for the entire channel, so a
    session replication cluster would transfer state as one block, not one per context.

14. Keepalive count and idle kill-off for NIO senders.

16. Guaranteed delivery of messages, i.e. either all get it or none get it.
    Meaning that all receivers get it, then wait for a process command,
    ala the Gossip protocol - this is fairly redundant with a Xa2PhaseCommitInterceptor,
    except it doesn't keep a transaction log.

17. Implement transactions - the ability to start a transaction, send several messages,
    and then commit the transaction.
Tasks Completed
===========================================
1. True synchronous/asynchronous replication enabled using the flags
   Sender.sendAck/Receiver.waitForAck/Receiver.synchronized
   Task Desc: waitForAck should only mean that we received the message, not that the
   message got processed. This should improve throughput, and an interceptor
   can do waitForCompletion.
   Status: Complete
   Notes:

2. Unique id - send it in a byte array instead of a string.

3. DataSender or ReplicationTransmitter swallows IOException; these exceptions should be
   propagated instead.
   Notes: This has only been fixed for the pooled synchronized sender; the fastasynch
   one isn't working that well.

4. ChannelMessage.getMessage should return a streamable, so that we can wrap it,
   pass it around and all those good things without having to copy byte arrays
   left and right.
   Notes: Instead of using a streamable, this is implemented using the XByteBuffer,
   which is very easy to use. It also becomes a single spot for optimizations.
   Ideally, there would be a pool of XByteBuffers that all use direct ByteBuffers
   for their data handling.
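The pooling idea in that note can be sketched as a simple free-list of direct ByteBuffers: borrow on demand, allocate lazily, and reuse returned buffers rather than reallocating. This is a simplified stand-in, not the actual XByteBuffer implementation:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: reuse direct ByteBuffers instead of allocating one per message.
// A simplified stand-in for a hypothetical XByteBuffer pool.
public class BufferPool {
    private final Deque<ByteBuffer> free = new ArrayDeque<ByteBuffer>();
    private final int capacity;

    public BufferPool(int bufferCapacity) {
        this.capacity = bufferCapacity;
    }

    public synchronized ByteBuffer borrow() {
        ByteBuffer buf = free.pollFirst();
        if (buf == null) {
            buf = ByteBuffer.allocateDirect(capacity); // allocate lazily on demand
        }
        buf.clear(); // reset position/limit for the new user
        return buf;
    }

    public synchronized void release(ByteBuffer buf) {
        free.addFirst(buf); // most recently returned buffer is reused first
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(1024);
        ByteBuffer a = pool.borrow();
        pool.release(a);
        ByteBuffer b = pool.borrow();
        System.out.println(a == b); // true - the buffer was reused, not reallocated
    }
}
```

Direct buffers are expensive to allocate and are not freed promptly by GC, which is why pooling them, as the note suggests, pays off for high message rates.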

5. OrderInterceptor - guarantees the order of messages.
   Notes: completed

6. NIO and IO DataSender, since the IO one is blocking.
   Notes: completed. Works very well; suspect error logging has not been implemented.

7. FragmentationInterceptor - splits up messages that are larger than X bytes.
   Notes: completed

15. Remove DataSenderFactory and DataSender.properties -
    these cause the settings to be hard coded and not pluggable.
    Notes: Completed; now you can initialize a transport class.

12. LazyReplicatedHashMap - memory-efficient clustered map.
    This map can be used for PRIMARY/SECONDARY session replication.
    Ahh, the beauty of storing data in remote locations.
    The lazy hash map will only replicate its attribute names to all members in the group;
    along with each name, it will also replicate the source (where to get the object)
    and the backup member where it can find a backup if the source is gone.
    If the source disappears, the backup node will replicate the attributes that
    it stores to a new primary; backups can be chosen round robin.
    When a new member arrives and requests state, that member will get all the attribute
    names and the locations.
    It can replicate every X seconds, or on dirty flags set by the stored objects,
    or on a request to scan for dirty flags, or on a request with the objects.
    Notes: the map has been completed
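The lazy scheme above can be sketched as a map whose entries are either full data (on the primary and backup) or lightweight proxies that only record where the data lives. The class and field names here are hypothetical, not the real LazyReplicatedHashMap internals:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the lazy replication idea: every node knows every key, but only
// the primary and backup hold the value; everyone else holds a proxy entry
// recording where to fetch it. Names are hypothetical.
public class LazyMapSketch {
    static class Entry {
        final String primary; // member that owns the value
        final String backup;  // member holding the backup copy
        Object value;         // non-null only on the primary/backup nodes

        Entry(String primary, String backup, Object value) {
            this.primary = primary;
            this.backup = backup;
            this.value = value;
        }

        boolean isProxy() { return value == null; }
    }

    final Map<String, Entry> entries = new HashMap<String, Entry>();

    public static void main(String[] args) {
        // What a third (non-owning) node sees: the key and the locations only.
        LazyMapSketch node = new LazyMapSketch();
        node.entries.put("sessionX", new Entry("mbrA", "mbrB", null));
        Entry e = node.entries.get("sessionX");
        System.out.println(e.isProxy()); // true - value must be fetched from mbrA
        System.out.println(e.primary);   // mbrA
    }
}
```

This is what makes the map memory-efficient: a cluster of N nodes stores each value twice, not N times, at the cost of a remote fetch when a proxy node needs the data.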

22. sendAck and synchronized should not have to be XML config;
    they can be configured on a per-packet basis using ClusterData.getOptions().
    Notes: see the Channel.SEND_OPT_XXXX variables

28. The thread pool should have maxThreads and minThreads and grow dynamically.

24. MessageDispatchInterceptor - for asynchronous sending.
    - Looks at the options flag SEND_OPTIONS_ASYNCHRONOUS.
    - Has two modes:
      a) async parallel send - each message to all destinations before the next message
      b) async per-member - one thread per member using the FastAsyncQueue (good for groups with slow receivers)
    - Callback error handler - for when messages fail and the application wishes to be notified.
    - MUST HAVE A QUEUE SIZE LIMIT IN MB, to avoid OOM errors, or persist the queue.
    - MUST USE ClusterData.deepclone() to ensure thread safety if ClusterData objects get recycled.
    Notes: Simple implementation - one thread invokes all senders in parallel.
    Deep cloning is configurable as an optimization.
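The "queue size limit in MB" requirement above can be sketched as a dispatch queue that tracks its payload size in bytes and rejects new messages once a budget is exceeded, instead of growing until the JVM runs out of memory. Names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: an async dispatch queue bounded by total payload bytes rather than
// message count, so large messages cannot queue up into an OOM. Hypothetical names.
public class BoundedDispatchQueue {
    private final Deque<byte[]> queue = new ArrayDeque<byte[]>();
    private final long maxBytes;
    private long queuedBytes = 0;

    public BoundedDispatchQueue(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    public synchronized boolean offer(byte[] message) {
        if (queuedBytes + message.length > maxBytes) {
            return false; // reject; the caller can report this via the error handler callback
        }
        queue.addLast(message);
        queuedBytes += message.length;
        return true;
    }

    public synchronized byte[] poll() {
        byte[] message = queue.pollFirst();
        if (message != null) {
            queuedBytes -= message.length;
        }
        return message;
    }

    public static void main(String[] args) {
        BoundedDispatchQueue q = new BoundedDispatchQueue(10); // 10-byte budget
        System.out.println(q.offer(new byte[6])); // true
        System.out.println(q.offer(new byte[6])); // false - would exceed the budget
        q.poll();                                 // dispatching frees 6 bytes
        System.out.println(q.offer(new byte[6])); // true
    }
}
```

A byte budget (rather than a message-count limit) matters here because message sizes vary widely; persisting the queue, as the note suggests, would be the alternative to rejection.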

37. Interceptor.getOptionFlag() - lets the system configure a flag to be used
    for the interceptor. That way, not all constants have to be defined
    in Channel.SEND_FLAG_XXXX.
    Also, the GroupChannel will make a conflict check upon startup,
    so that there is no conflict. I will change options to a long,
    so that we can have 63 flags, hence up to 60 interceptors.
    Notes: Completed; options remained an int, so 31 flags.
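The startup conflict check described above boils down to verifying that no two interceptors claim overlapping bits in the options int. A minimal sketch (the real check lives in GroupChannel; this standalone version is illustrative):

```java
// Sketch of the startup conflict check: each interceptor claims an option
// flag, and overlapping bits are detected before the channel starts.
public class FlagConflictCheck {
    // Returns true if every flag uses distinct bits (no two flags overlap).
    public static boolean noConflicts(int[] interceptorFlags) {
        int claimed = 0;
        for (int flag : interceptorFlags) {
            if ((claimed & flag) != 0) {
                return false; // some bit is already claimed by another interceptor
            }
            claimed |= flag;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(noConflicts(new int[] { 0x1, 0x2, 0x4 })); // true
        System.out.println(noConflicts(new int[] { 0x1, 0x3 }));      // false - bit 0 claimed twice
    }
}
```

With options as a 32-bit int and the sign bit avoided, 31 distinct single-bit flags are available, matching the note above.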

b) State synchronization for the map - will need to add in MSG_INIT.
   Fixed map bug.