-
Bug
-
Resolution: Done
-
Critical
-
2.4.1 SP1, 2.5
-
None
If an application has a set of threads that are trying to send more messages than the consumers can handle, request threads block in FC or SFC waiting for credits. If they wait too long, they wake up and send a message requesting credit. With the tests I'm running, large numbers of threads would block and then, one after another, time out, wake up and ask for credit, all within a very short period. This basically spams the cluster with credit requests (and credit comes back for each request).
That seems inefficient, but with SFC it was leading to error conditions. It seems some other server's NAKACK requested retransmission of a set of messages, some or all of which were these "spam" credit requests. The credit requests have basically no message body, just headers, so the NAKACK.max_xmit_size check wasn't assigning them any weight. The effect was that the resulting retransmission message was > 64K and couldn't be sent, resulting in an ERROR in UDP.
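To illustrate the size-accounting problem, here is a minimal sketch, not the actual NAKACK code. XmitBatcher, batch and headerOverhead are hypothetical names, and the idea of a fixed per-message header weight is an assumption made only for the example. It splits a retransmission list into batches bounded by a byte limit, counting each message as payload plus header overhead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of JGroups.
final class XmitBatcher {
    /**
     * Groups message indices into batches whose accounted size stays within
     * maxBatchBytes. Each message is weighted as payload size + headerOverhead.
     */
    static List<List<Integer>> batch(int[] payloadSizes, int headerOverhead, int maxBatchBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentBytes = 0;
        for (int i = 0; i < payloadSizes.length; i++) {
            int weight = payloadSizes[i] + headerOverhead;
            // Close the current batch if adding this message would exceed the limit
            if (!current.isEmpty() && currentBytes + weight > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i);
            currentBytes += weight;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With 1000 empty-bodied messages and a 60000-byte limit, a headerOverhead of 0 puts everything into one batch no matter how large it really is on the wire, while an assumed overhead of 100 bytes per message splits it into batches of 600 and 400.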
A possible solution is to add a min_credit_request_interval such that a thread that wakes up from blocking will not request credit if another thread has already done so within the configured interval. For reference, my port of SFC to Branch_4_2 has an implementation of that concept.
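The min_credit_request_interval idea could look roughly like the sketch below. This is not the Branch_4_2 implementation: CreditRequestThrottle and tryAcquire are hypothetical names, and treating the interval as milliseconds is an assumption. A thread that wakes from blocking would call tryAcquire and send a credit request only if it returns true.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; not part of JGroups.
final class CreditRequestThrottle {
    private static final long NEVER = Long.MIN_VALUE; // sentinel: no request sent yet

    private final long minIntervalMs; // assumed meaning of min_credit_request_interval
    private final AtomicLong lastRequestMs = new AtomicLong(NEVER);

    CreditRequestThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /**
     * Returns true if the caller may send a credit request at time nowMs,
     * false if some thread already sent one within the minimum interval.
     */
    boolean tryAcquire(long nowMs) {
        while (true) {
            long last = lastRequestMs.get();
            if (last != NEVER && nowMs - last < minIntervalMs) {
                return false; // another thread asked recently; skip this request
            }
            // Only one of several racing threads wins the CAS and sends the request
            if (lastRequestMs.compareAndSet(last, nowMs)) {
                return true;
            }
        }
    }
}
```

A blocked sender would then do something like `if (throttle.tryAcquire(System.currentTimeMillis())) sendCreditRequest();`, where sendCreditRequest stands in for whatever the protocol already does on timeout. Using compare-and-set rather than a lock keeps the check cheap when many threads wake up at once.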
1. Fix credit request storms in HEAD | Resolved | Bela Ban