-
Task
-
Resolution: Done
-
Major
-
2.4.1 SP1
-
None
FC does not control flow tightly enough. I believe this may be because a credit replenishment brings a sender up to max_credits from whatever credit level the sender is at, which has no direct relationship to how many bytes the receiver has seen. This is more of a problem under sustained overload, when sender threads time out while blocking and send credit requests. For example, take a config of max_credits = 1,000,000 and min_credits = 100,000, with the sender under enough client load that it basically always has sender threads blocked with messages totalling 1,000,000 bytes.
Sender sends 1,000,000 bytes and blocks.
Receiver processes slowly.
Receiver processes 900,000, sends credit replenishment 1.
Meanwhile, sender times out; sends credit request 1.
Sender gets credit replenishment 1; unblocks, sends another 1,000,000, blocks again.
Receiver processes 100,000, then gets credit request 1 and sends replenishment 2.
Sender times out; sends credit request 2.
Sender gets replenishment 2 and unblocks, sends 1,000,000 bytes.
At this point, the sender has sent 3,000,000 and the receiver has received 1,000,000. Further, there is a credit request in the channel, so after the receiver sees the next 1,000,000 bytes, it will have sent two more replenishments, theoretically allowing the sender to send 2,000,000 more bytes.
This is an extreme example, but the key point is that a credit replenishment gives the sender the right to send up to max_credits, even though the receiver may have seen only min_credits bytes when it sent the replenishment.
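The timeline above can be replayed with a toy model. This is only a sketch under simplifying assumptions, not JGroups code: the class and method names (FcOvercommitDemo, sendUntilBlocked, replenish) are hypothetical, message delivery is instantaneous, and every replenishment simply resets the sender to max_credits as described above.

```java
// Toy model of the current FC behaviour described in this issue; not JGroups code.
final class FcOvercommitDemo {
    static final long MAX_CREDITS = 1_000_000;

    long senderCredits = MAX_CREDITS;
    long sent = 0;       // total bytes the sender has put on the wire
    long processed = 0;  // total bytes the receiver has consumed

    /** The sender uses up all of its credits, then blocks. */
    void sendUntilBlocked() {
        sent += senderCredits;
        senderCredits = 0;
    }

    void receiverProcesses(long bytes) {
        processed += bytes;
    }

    /** Current behaviour: any replenishment resets the sender to max_credits. */
    void replenish() {
        senderCredits = MAX_CREDITS;
    }

    /** Bytes sent but not yet consumed by the receiver. */
    long outstanding() {
        return sent - processed;
    }

    static FcOvercommitDemo replayTimeline() {
        FcOvercommitDemo fc = new FcOvercommitDemo();
        fc.sendUntilBlocked();          // sender sends 1,000,000 and blocks
        fc.receiverProcesses(900_000);  // receiver sends replenishment 1
        fc.replenish();                 // sender gets replenishment 1
        fc.sendUntilBlocked();          // sender sends another 1,000,000
        fc.receiverProcesses(100_000);  // receiver answers credit request 1 with replenishment 2
        fc.replenish();                 // sender gets replenishment 2
        fc.sendUntilBlocked();          // sender sends 1,000,000 more
        return fc;
    }

    public static void main(String[] args) {
        FcOvercommitDemo fc = replayTimeline();
        System.out.println("sent=" + fc.sent
                + " processed=" + fc.processed
                + " outstanding=" + fc.outstanding());
    }
}
```

In this model the receiver ends up with 2,000,000 unprocessed bytes queued, twice max_credits.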
The solution to this problem is two-fold:
1) Credit replenishment messages should include a payload indicating the number of bytes received. The sender only gives itself that many credits.
2) #1 alone will hurt performance in a steady-state system. If A and B are sending messages to each other, with a config of max_credits = 1,000,000 and min_credits=100,000, B will send A 900,000 credits when it has read that many. It will take a while for the credit replenishment to reach A (since B is also sending), so A will send 100,000 more and begin blocking. It will then get the 900,000 credit replenishment, send 900,000 and begin blocking again. Under the old system it would have gotten 1,000,000 bytes – now it only gets 900,000.
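A minimal sketch of what item 1 might look like on the sender side. The names here (ByteCountedCredits, sendUpTo, onReplenishment) are hypothetical and this is not the FC implementation. It also assumes that the receiver reports only the bytes it has read since its previous replenishment, and that the sender caps its credits at max_credits.

```java
// Hypothetical sketch of proposal 1; not the JGroups FC implementation.
final class ByteCountedCredits {
    private final long maxCredits;
    private long credits;
    private long sent;

    ByteCountedCredits(long maxCredits) {
        this.maxCredits = maxCredits;
        this.credits = maxCredits;
    }

    /** Sends as much as the current credits allow; returns the bytes actually sent. */
    long sendUpTo(long wanted) {
        long n = Math.min(wanted, credits);
        credits -= n;
        sent += n;
        return n;
    }

    /** Proposal 1: add only the bytes the receiver reports (cap at max_credits is an assumption). */
    void onReplenishment(long bytesReceived) {
        credits = Math.min(maxCredits, credits + bytesReceived);
    }

    long credits() { return credits; }

    long sent() { return sent; }

    /** Replays the overload timeline from this issue; returns the unprocessed bytes at the end. */
    static long replayTimeline() {
        ByteCountedCredits sender = new ByteCountedCredits(1_000_000);
        long processed = 0;

        sender.sendUpTo(1_000_000);       // sender sends 1,000,000 and blocks
        processed += 900_000;
        sender.onReplenishment(900_000);  // replenishment 1 carries 900,000
        sender.sendUpTo(1_000_000);       // only 900,000 go out
        processed += 100_000;
        sender.onReplenishment(100_000);  // replenishment 2 carries 100,000
        sender.sendUpTo(1_000_000);       // only 100,000 go out

        return sender.sent() - processed;
    }
}
```

Under these assumptions the same overload timeline leaves at most max_credits (1,000,000) bytes outstanding, instead of 2,000,000 with the current behaviour.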
The solution to that is to change the meaning of min_credits. Currently, the receiver sends credits when it has received_bytes >= (max_credits - min_credits). If min_credits is 100,000, credits will only be sent when the sender is almost out. The standard value of min_threshold=0.10 is very conservative, but was needed because the more frequently replenishment messages with no set number of bytes get sent, the more likely the sender is to get too many credits and cause an OOME.
If credit replenishment messages only give the number of bytes the receiver has read, then there is no OOME risk. Therefore, sending replenishment messages frequently makes sense. So, I propose the receiver should send credits when received_bytes >= min_credits, rather than the current approach of sending when received_bytes >= (max_credits - min_credits).
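The difference between the two receiver-side rules can be sketched as follows. This is an illustration only: the class ReplenishThreshold and its methods are hypothetical, and it assumes fixed-size messages and a receiver that resets its received_bytes counter after each replenishment.

```java
// Hypothetical comparison of the current and proposed receiver rules; not JGroups code.
final class ReplenishThreshold {

    /** Current rule: send credits once received_bytes >= max_credits - min_credits. */
    static boolean shouldSendCurrent(long receivedBytes, long maxCredits, long minCredits) {
        return receivedBytes >= maxCredits - minCredits;
    }

    /** Proposed rule: send credits once received_bytes >= min_credits. */
    static boolean shouldSendProposed(long receivedBytes, long minCredits) {
        return receivedBytes >= minCredits;
    }

    /**
     * Counts the replenishments sent while the receiver consumes totalBytes
     * in messages of messageSize bytes, resetting its counter after each one.
     */
    static int countReplenishments(long totalBytes, long messageSize, long threshold) {
        int replenishments = 0;
        long received = 0;
        for (long consumed = 0; consumed < totalBytes; consumed += messageSize) {
            received += messageSize;
            if (received >= threshold) {
                replenishments++;
                received = 0;
            }
        }
        return replenishments;
    }

    public static void main(String[] args) {
        long max = 1_000_000, min = 100_000, msg = 10_000;
        System.out.println("current:  " + countReplenishments(max, msg, max - min));
        System.out.println("proposed: " + countReplenishments(max, msg, min));
    }
}
```

With max_credits = 1,000,000, min_credits = 100,000 and 10,000-byte messages, the current rule sends one replenishment per 1,000,000 bytes consumed, while the proposed rule sends ten, each worth 100,000 credits, so the sender is topped up long before it runs dry.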
Bela, we discussed making this change to FC. But #2 above is a pretty significant change in behavior, since it changes the meaning of config parameters. Should I check this in as something else, e.g. FC2?