-
Task
-
Resolution: Done
-
Major
In UNICAST.down(), we acquire a lock on the entry of the member to which we send a message:
entry.lock(); // threads will only sync if they access the same entry
try { ... }
finally { entry.unlock(); }
Inside this lock scope, the call entry.sent_msgs.add() is costly, as it adds the message not only to the hashmap but also to the retransmitter, which schedules a timer task, etc.
The temporary solution is to split add() into two parts: one adds the message to the hashmap (fast), the other adds it to the retransmitter (costly). The costly part is moved outside the lock scope, for example:
entry.lock(); // threads will only sync if they access the same entry
try {
    seqno=entry.sent_msgs_seqno;
    send_conn_id=entry.send_conn_id;
    hdr=new UnicastHeader(UnicastHeader.DATA, seqno, send_conn_id, seqno == DEFAULT_FIRST_SEQNO);
    msg.putHeader(getName(), hdr);
    entry.sent_msgs.addToMessages(seqno, msg); // add *including* UnicastHeader, adds to hashmap
    entry.sent_msgs_seqno++;
}
finally {
    entry.unlock();
}
entry.sent_msgs.addToRetransmitter(seqno, msg); // adds to retransmitter
However, the issue is what happens if the addition to the retransmitter fails (e.g. due to an OOME): the seqno has already been used, but the message is never scheduled for retransmission, so we'd have a message gap on the receiver!
SOLUTION:
#1 Do the add to the retransmitter in a loop. If there's a failure, sleep a bit and try again, increasing the sleep time on each attempt. Not very nice code, but it works and never loses a message. Granted, if we get OOMEs, then something is wrong anyway, but this covers temporary OOMEs.
#2 If there's a failure, set a flag. Next time around, we check the flag; if it is set, we re-add all messages in the hashmap to the retransmitter. This involves locking the hashmap and the retransmitter, but that's OK since this case should almost never happen anyway!
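Both options could be sketched roughly as follows. This is a standalone illustration, not the JGroups code: the Retransmitter interface, RetryingAdd, FlaggedWindow and all their members are hypothetical names, messages are plain strings, and the sketch assumes that adding the same seqno to the retransmitter twice is harmless.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the costly retransmitter; add() may fail, e.g. with an OOME. */
interface Retransmitter {
    void add(long seqno, String msg);
}

/** Option #1: retry in a loop, sleeping longer after each failure. */
class RetryingAdd {
    static void addWithRetry(Retransmitter r, long seqno, String msg,
                             long initialSleepMs, long maxSleepMs) {
        long sleep = Math.max(1, initialSleepMs);
        while (true) {
            try {
                r.add(seqno, msg);
                return; // success, the message is now scheduled
            } catch (OutOfMemoryError | RuntimeException e) {
                try {
                    Thread.sleep(sleep);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                sleep = Math.min(sleep * 2, maxSleepMs); // back off, capped
            }
        }
    }
}

/** Option #2: remember a failure in a flag and re-add everything on the next call. */
class FlaggedWindow {
    private final Map<Long, String> messages = new ConcurrentHashMap<>(); // the "hashmap"
    private final AtomicBoolean resyncNeeded = new AtomicBoolean(false);
    private final Retransmitter retransmitter;

    FlaggedWindow(Retransmitter retransmitter) {
        this.retransmitter = retransmitter;
    }

    /** Fast part, meant to be called inside the lock scope. */
    void addToMessages(long seqno, String msg) {
        messages.put(seqno, msg);
    }

    /** Costly part, meant to be called outside the lock scope. */
    void addToRetransmitter(long seqno, String msg) {
        try {
            if (resyncNeeded.get())
                resync();
            retransmitter.add(seqno, msg); // assumed idempotent, so harmless after a resync
        } catch (OutOfMemoryError | RuntimeException e) {
            resyncNeeded.set(true); // the next call will re-add all messages
        }
    }

    private synchronized void resync() {
        // Clear the flag first, so a failure reported by another thread
        // during the loop below triggers another resync later.
        if (!resyncNeeded.compareAndSet(true, false))
            return; // another thread already resynced
        for (Map.Entry<Long, String> e : messages.entrySet())
            retransmitter.add(e.getKey(), e.getValue());
    }
}
```

In terms of the snippet above, #1 would wrap the addToRetransmitter() call after the finally block, while #2 keeps the call site unchanged and moves the recovery into sent_msgs itself. #1 blocks the sending thread until the add succeeds; #2 never blocks, but the failed message stays unscheduled until the next message is sent to the same member.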