The code in NAKACK/NAKACK2.send() acquires seqno_lock like this:
NAKACK.send()
    seqno_lock.lock();
    try {
        try {
            // incrementing seqno and adding the msg to sent_msgs needs to be atomic
            msg_id=seqno + 1;
            msg.putHeader(this.id, NakAckHeader.createMessageHeader(msg_id));
            win.add(msg_id, msg);
            seqno=msg_id;
        }
        catch(Throwable t) {
            // throw exception
        }
    }
    finally {
        seqno_lock.unlock();
    }
Because add() runs while seqno_lock is held, concurrent sender threads are slowed down whenever add() takes a while, which can happen when there are many concurrent adds and removes.
The reason we use seqno_lock is to prevent gaps in the sequence numbers, e.g. if we have an exception in add().
SOLUTION:
- Assign a new msg_id by incrementing seqno atomically (seqno_lock is now reduced in scope to only incrementing seqno; it might be replaced with an AtomicLong anyway)
- In a loop: call add() to add the message until add() returns successfully, then break
Example:
    msg_id=seqno.incrementAndGet(); // uses an AtomicLong
    while(running) { // maybe bound with a counter
        try {
            msg.putHeader(...);
            add();
            break;
        }
        catch(Throwable t) {
            // log
        }
    }
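For illustration, the proposal could be sketched as a small self-contained class. This is not the NAKACK code: SeqnoSender, MAX_ATTEMPTS, the String payload, and the ConcurrentSkipListMap standing in for the retransmit window are all assumptions made for the sketch. It only shows the seqno being allocated outside any lock and add() being retried with the same seqno, bounded by a counter.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the actual NAKACK/NAKACK2 implementation.
class SeqnoSender {
    // Assumed retry bound ("maybe bound with a counter")
    private static final int MAX_ATTEMPTS = 5;

    private final AtomicLong seqno = new AtomicLong(0);
    // Stand-in for the retransmit window (win / sent_msgs)
    private final ConcurrentSkipListMap<Long, String> window = new ConcurrentSkipListMap<>();
    private volatile boolean running = true;

    /** Assigns a seqno without holding a lock, then retries add() with that same seqno. */
    long send(String payload) {
        long msgId = seqno.incrementAndGet(); // atomic, no seqno_lock needed
        for (int attempt = 1; running && attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                add(msgId, payload);
                return msgId; // success: the seqno is now backed by a stored message
            } catch (RuntimeException e) {
                System.err.println("add() failed for seqno " + msgId
                        + " (attempt " + attempt + "): " + e);
            }
        }
        // Giving up here leaves a gap at msgId; the caller has to deal with that.
        throw new IllegalStateException("could not add message with seqno " + msgId);
    }

    /** Stores the message in the window; subclasses may override to simulate failures. */
    protected void add(long msgId, String payload) {
        window.put(msgId, payload);
    }

    boolean contains(long msgId) {
        return window.containsKey(msgId);
    }

    void stop() {
        running = false;
    }
}
```

Note that once the retry loop is bounded, or running becomes false, a gap in the sequence numbers is possible again, so the bound trades the no-gap guarantee against the risk of a sender spinning forever.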