When a message from P of length LEN is received, the following algorithm is run:
- P's credits are decremented by LEN
- If P's remaining credits are now less than min_credits:
- Set P's credits to max_credits
- Deliver the message
- Send a REPLENISH message back to P
The problem here is that, if the delivery of the message blocks (e.g. because another sync RPC is invoked on the same thread), no REPLENISH message is sent back to P. P's credits have already been reset to max_credits, so messages from P handled by other threads see credits above min_credits and do not send a REPLENISH either.
SOLUTION: replenish P's credits only after the message delivery:
- P's credits are decremented by LEN
- If P's remaining credits are now less than min_credits:
- Deliver the message
- Set P's credits to max_credits
- Send a REPLENISH message back to P
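The corrected ordering can be sketched in Java as follows. This is only a minimal illustration, not the actual FlowControl code: the class `ReceiverCredits`, its fields, and the `replenishSent` list (which stands in for sending REPLENISH messages) are hypothetical names. It also assumes that the threshold is checked after delivery, so that whichever thread finishes delivering first can replenish while another thread is still blocked.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical receiver-side credit bookkeeping for a single sender P. */
class ReceiverCredits {
    private final long maxCredits;
    private final long minCredits;
    private long credits; // credits P is assumed to have left

    /** Stand-in for the REPLENISH messages sent back to P (amounts only). */
    final List<Long> replenishSent = new CopyOnWriteArrayList<>();

    ReceiverCredits(long maxCredits, long minCredits) {
        this.maxCredits = maxCredits;
        this.minCredits = minCredits;
        this.credits = maxCredits;
    }

    /** Handles a message of length len from P; deliver may block in the application. */
    void onMessage(int len, Runnable deliver) {
        synchronized (this) {
            credits -= len; // decrement before delivery, credits are NOT reset yet
        }
        deliver.run(); // the lock is not held while the application runs

        long amount = 0;
        synchronized (this) {
            // Assumption: the check happens after delivery, so any thread
            // that completes a delivery can replenish P.
            if (credits < minCredits) {
                amount = maxCredits - credits;
                credits = maxCredits;
            }
        }
        if (amount > 0) {
            replenishSent.add(amount); // "send" REPLENISH outside the lock
        }
    }

    public static void main(String[] args) {
        ReceiverCredits rc = new ReceiverCredits(1000, 400);
        // The delivery of the first message has not returned yet when a
        // second message from P is handled (simulated by nesting the call).
        rc.onMessage(700, () -> rc.onMessage(100, () -> { }));
        System.out.println(rc.replenishSent); // prints [800]
    }
}
```

In this sketch the second message triggers the REPLENISH even though the delivery of the first one is still in progress, which is the case the original ordering misses.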
Email from Dan:
... when a message is received that passes over the min_credits threshold, the credits for that sender are already adjusted to max_credits - so only this thread can send a REPLENISH message back. If this thread blocks in the application (e.g. it sends a message and it blocks to wait for credits), even though other threads successfully deliver other messages, FlowControl will not send a REPLENISH message.
So it only takes one delayed/dropped REPLENISH message to enter a vicious circle, where nodes keep not sending REPLENISH messages to each other. And credit requests can't really help here, because there are way too many concurrent requests, and credits are immediately exhausted.
...I think the best solution would be to have the receiver increment credits only after the application handles the message. I don't think the receiver needs to do anything before delivering the message, because the blocking happens on the sender side anyway.
- relates to ISPN-6799 OOB thread pool fills with threads trying to send remote get responses (Closed)