-
Sub-task
-
Resolution: Unresolved
-
Major
Goal: remove NAKACK2 from the stack when the transport is TCP
Digests
When NAKACK2 and STABLE are removed from the stack, we cannot use digests:
- GMS must not perform the sanity check for whether digests are available. Instead, the GET_DIGEST event sent down should simply return null when STABLE is absent
- In this case, the JOIN-RSP sent back to the joiner contains a null digest, so no SET_DIGEST event is sent down at the joiner
- Messages received by any member are delivered in the order in which the transport received them. Because the receiver's thread pool can destroy this ordering, the following must be done:
  - The thread pool must not discard a message when it is full: use CallerRunsPolicy
  - Either disable the thread pool, so that messages are delivered in reception order, or
  - Make sure that the default processing policy is present: MaxOneThreadPerSender adds regular messages to a per-sender queue, so they are delivered in reception order. OOB messages are delivered out of order, which is fine, too. The queue in MaxOneThreadPerSender must also be unbounded (max_buffer_size=0), or else messages are discarded when that queue is full
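The CallerRunsPolicy point can be illustrated with a plain java.util.concurrent pool. This is a minimal sketch, not the JGroups transport's thread pool: the class name CallerRunsDemo and the pool sizes are made up for the illustration. It only shows that a saturated pool runs the task on the submitting thread instead of rejecting it; it does nothing for ordering, which is what the other two bullets address.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of JGroups.
class CallerRunsDemo {

    // Submits 'messages' tasks to a deliberately tiny pool and returns how many ran.
    static int deliverAll(int messages) {
        AtomicInteger delivered = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2,                          // core and max threads (assumed values)
                30, TimeUnit.SECONDS,          // idle keep-alive
                new ArrayBlockingQueue<>(2),   // small queue so the pool saturates quickly
                new ThreadPoolExecutor.CallerRunsPolicy()); // when full, run on the submitting thread
        for (int i = 0; i < messages; i++) {
            pool.execute(delivered::incrementAndGet);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return delivered.get();
    }

    public static void main(String[] args) {
        System.out.println("delivered " + deliverAll(1000) + " of 1000");
    }
}
```

Because the submitting thread executes the overflow itself, a saturated pool slows the reader down rather than dropping messages.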
Ordering of group messages (=messages to all)
When 3 threads (t1, t2, t3) send messages concurrently, then the following happens:
NAKACK2 is present
- NAKACK2 establishes an order, depending on which thread hits it first, say: t2->1, t3->2, t1->3
- Regardless of the order in which the threads hit the transport, and in which order the receiver sends the 3 messages up the stack, NAKACK2 will deliver message 1, followed by message 2, followed by 3.
- This is done at all members
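The effect of sequence numbers on delivery order can be sketched as follows. This is a simplified illustration only, not NAKACK2's implementation (which also handles retransmission and digests); the class SeqnoReorderSketch and its methods are hypothetical. The receiver buffers whatever arrives and delivers strictly by seqno, so the arrival order does not matter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-sender reorder buffer; not the NAKACK2 implementation.
class SeqnoReorderSketch {
    private long nextToDeliver = 1;                       // seqnos start at 1, as in the example above
    private final Map<Long, String> buffer = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();

    // Called in whatever order the transport passes messages up.
    void receive(long seqno, String payload) {
        if (seqno < nextToDeliver) {
            return;                                       // already delivered: drop the duplicate
        }
        buffer.putIfAbsent(seqno, payload);
        String next;
        // Deliver the longest gap-free run starting at nextToDeliver.
        while ((next = buffer.remove(nextToDeliver)) != null) {
            delivered.add(next);
            nextToDeliver++;
        }
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        SeqnoReorderSketch receiver = new SeqnoReorderSketch();
        receiver.receive(3, "t1");   // t1 -> 3 arrives first
        receiver.receive(1, "t2");   // t2 -> 1
        receiver.receive(2, "t3");   // t3 -> 2
        System.out.println(receiver.delivered());   // prints [t2, t3, t1]
    }
}
```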
NAKACK2 is absent
- The 3 threads at A hit the transport and send their messages to (say) members B and C:
  - t2 sends its message to B, then is context-switched out
  - t3 sends its message to B, then to C
  - t1 sends its message to B, then to C
  - t2 resumes after the context switch and sends its message to C
- The delivery order for the 3 messages will be:
  - At B: t2 -> t3 -> t1 (t2 means t2's message)
  - At C: t3 -> t1 -> t2
- If there were a lock around sending a message to all members, the order would be the same at B and C. Perhaps this can be enabled/disabled? The cost of a fat lock might be too high...
So while the TCP transport delivers the 3 messages in a lossless and no-duplicate manner, the order may not be the same at all receivers. The reason is that we don't have the abstraction of A sending to [B,C], but of A sending to B and A sending to C.
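The interleaving above can be replayed deterministically. The sketch below is not JGroups code: SendOrderDemo and its methods are hypothetical, and each step stands for one write to one destination's TCP connection. It replays the schedule from the example, then the schedule a lock around "send to all members" would force, assuming the lock is acquired in the order t2, t3, t1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replay of the schedule described above; not JGroups code.
class SendOrderDemo {

    // Each step is {sender thread, destination}: one write to one TCP connection.
    static List<String[]> unlocked() {
        return Arrays.asList(
                new String[]{"t2", "B"},                            // t2 sends to B, then is switched out
                new String[]{"t3", "B"}, new String[]{"t3", "C"},
                new String[]{"t1", "B"}, new String[]{"t1", "C"},
                new String[]{"t2", "C"});                           // t2 resumes and sends to C
    }

    // With a lock around "send to all members", each thread finishes B and C
    // before the next one starts (assumed acquisition order: t2, t3, t1).
    static List<String[]> locked() {
        List<String[]> steps = new ArrayList<>();
        for (String thread : Arrays.asList("t2", "t3", "t1")) {
            steps.add(new String[]{thread, "B"});
            steps.add(new String[]{thread, "C"});
        }
        return steps;
    }

    // TCP keeps per-connection order, so each receiver sees messages in write order.
    static Map<String, List<String>> replay(List<String[]> schedule) {
        Map<String, List<String>> delivered = new LinkedHashMap<>();
        for (String[] step : schedule) {
            delivered.computeIfAbsent(step[1], k -> new ArrayList<>()).add(step[0]);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(replay(unlocked())); // prints {B=[t2, t3, t1], C=[t3, t1, t2]}
        System.out.println(replay(locked()));   // prints {B=[t2, t3, t1], C=[t2, t3, t1]}
    }
}
```

The locked variant gives both receivers the same order, at the price of serializing all group sends at A.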
MessageProcessingPolicy
The MaxOneThreadPerSender policy adds regular messages to a queue. If the queue is bounded (max_buffer_size > 0), messages are discarded when the queue is full. This is bad, as it causes message loss: without NAKACK2 there is no retransmission. However, if the queue were unbounded, it could grow indefinitely if the receiver thread was stuck (e.g. in the application).
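The trade-off can be seen with plain java.util.concurrent queues. This is a sketch under the assumption that the consumer is stuck and never drains the queue; QueueBoundDemo is a hypothetical name and the queues are not the ones MaxOneThreadPerSender uses internally.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo; the queues stand in for a per-sender message queue.
class QueueBoundDemo {

    // Offers 'messages' items to a queue that nobody drains (stuck consumer)
    // and returns how many were rejected, i.e. lost.
    static int dropped(BlockingQueue<Integer> queue, int messages) {
        int dropped = 0;
        for (int i = 0; i < messages; i++) {
            if (!queue.offer(i)) {   // offer() returns false instead of blocking when the queue is full
                dropped++;
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        System.out.println(dropped(new ArrayBlockingQueue<>(10), 100)); // prints 90
        System.out.println(dropped(new LinkedBlockingQueue<>(), 100));  // prints 0
    }
}
```

The unbounded queue loses nothing, but all 100 messages stay in memory for as long as the consumer is stuck.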
Perhaps it is best to create a new MessageProcessingPolicy which
- passes regular messages up on the same (receiver) thread. If that thread is stuck in application code, messages are no longer read from the TCP/IP socket, which throttles the sender (0 TCP send-window). This is better IMO than an unbounded queue.
- passes OOB messages up to the thread pool; they can be delivered out-of-order
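A rough sketch of such a policy is shown below. It deliberately avoids the real JGroups interfaces: SameThreadPolicySketch, Msg and the up callback are hypothetical stand-ins, and an actual implementation would have to plug into the transport's MessageProcessingPolicy mechanism instead.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical sketch; does not implement the JGroups MessageProcessingPolicy interface.
class SameThreadPolicySketch {

    // Stand-in for a message: a payload plus an OOB flag.
    static final class Msg {
        final String payload;
        final boolean oob;

        Msg(String payload, boolean oob) {
            this.payload = payload;
            this.oob = oob;
        }
    }

    private final Executor pool;       // thread pool used for OOB messages
    private final Consumer<Msg> up;    // passes a message up the stack

    SameThreadPolicySketch(Executor pool, Consumer<Msg> up) {
        this.pool = pool;
        this.up = up;
    }

    // Called by the thread that read the message from the socket.
    void process(Msg msg) {
        if (msg.oob) {
            pool.execute(() -> up.accept(msg)); // may be delivered out of order
        } else {
            up.accept(msg); // blocks the reader until the application returns,
                            // so no further messages are read from the socket meanwhile
        }
    }
}
```

Since regular messages never leave the reader thread, reception order is preserved without a queue, and back-pressure is left to TCP.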
Issues
Merging
Merging probably cannot be done without digests. This might be a showstopper for the removal of NAKACK2! Investigate...
State transfer
State transfer cannot be done without digests.
MFC
Perhaps we can at least remove MFC (multicast flow control)?...
Conclusion
Removal of NAKACK2 changes the properties from FIFO-ordered-per-sender/lossless/no-duplicates to lossless/no-duplicates (unless we have the fat lock around sending to all individual members). This is similar to sending OOB messages when NAKACK2 is present.