  1. JGroups
  2. JGRP-1134

UNICAST.down(): move add to retransmitter out of the lock scope


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • 2.10
    • None
    • None

      In UNICAST.down(), we acquire a lock per sender to which we send a message:

      entry.lock(); // threads will only sync if they access the same entry
      try {
          seqno=entry.sent_msgs_seqno;
          send_conn_id=entry.send_conn_id;
          hdr=new UnicastHeader(UnicastHeader.DATA, seqno, send_conn_id, seqno == DEFAULT_FIRST_SEQNO);
          msg.putHeader(getName(), hdr);
          entry.sent_msgs.add(seqno, msg); // add *including* UnicastHeader, adds to retransmitter
          entry.sent_msgs_seqno++;
      }
      finally {
          entry.unlock();
      }

      The call entry.sent_msgs.add() is costly: it adds the message to the hashmap, but also to the retransmitter, which schedules a timer task etc.

      The temporary solution is to split add() into 2 parts, one which adds the message to the hashmap (fast) and one which adds it to the retransmitter (costly). The costly part is moved outside the lock scope, for example:

      entry.lock(); // threads will only sync if they access the same entry
      try {
          seqno=entry.sent_msgs_seqno;
          send_conn_id=entry.send_conn_id;
          hdr=new UnicastHeader(UnicastHeader.DATA, seqno, send_conn_id, seqno == DEFAULT_FIRST_SEQNO);
          msg.putHeader(getName(), hdr);
          entry.sent_msgs.addToMessages(seqno, msg); // add *including* UnicastHeader, adds to hashmap
          entry.sent_msgs_seqno++;
      }
      finally {
          entry.unlock();
      }

      entry.sent_msgs.addToRetransmitter(seqno, msg); // adds to retransmitter
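
The split can also be tried out in isolation. The following is a minimal sketch, not the JGroups code: the class `SenderEntry`, the use of `String` as the message type and the `BiConsumer` standing in for the retransmitter are assumptions made for illustration. Only the method names `addToMessages()` and `addToRetransmitter()` are taken from the snippet above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a UNICAST sender entry; not the JGroups class.
class SenderEntry {
    final ReentrantLock lock = new ReentrantLock();
    final Map<Long, String> messages = new ConcurrentHashMap<>(); // seqno -> message
    final BiConsumer<Long, String> retransmitter;                 // assumed to be the costly part
    long sentMsgsSeqno = 1;                                       // guarded by lock

    SenderEntry(BiConsumer<Long, String> retransmitter) {
        this.retransmitter = retransmitter;
    }

    // Fast part: only a hashmap insert, cheap enough to do while holding the lock.
    void addToMessages(long seqno, String msg) {
        messages.put(seqno, msg);
    }

    // Costly part: in the real protocol this would schedule a timer task.
    void addToRetransmitter(long seqno, String msg) {
        retransmitter.accept(seqno, msg);
    }

    // Assigns the next seqno under the lock, then registers the message for
    // retransmission only after the lock has been released.
    long send(String msg) {
        long seqno;
        lock.lock();
        try {
            seqno = sentMsgsSeqno;
            addToMessages(seqno, msg);
            sentMsgsSeqno++;
        } finally {
            lock.unlock();
        }
        addToRetransmitter(seqno, msg);
        return seqno;
    }
}
```

With this shape, threads sending to the same entry contend only for the seqno assignment and the hashmap insert; the retransmitter call runs without the entry lock held.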

      However, the issue is what happens if the addition to the retransmitter fails (e.g. due to an OOME): the seqno has already been used, but the message would never be retransmitted, so we'd have a message gap on the receiver!

      SOLUTION:
      #1 Do the add to the retransmitter in a loop. If there's a failure, sleep a bit and try again, increasing the sleep time on each attempt. Not very nice code, but it works and never loses a message. OK, if we get OOMEs, then something's wrong anyway, but this covers temporary OOMEs.

      #2 If there's an issue, set a flag. Next time around, we check the flag. If it is set, we re-add all messages in the hashmap into the retransmitter. This involves locking of the hashmaps and retransmitter, but that's OK since this case should almost never happen anyway!
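
Both options can be sketched roughly as follows. This is an illustration under assumptions, not the actual fix: `RetryingAdd`, `FlaggedWindow`, the `readdNeeded` flag, the sleep parameters and the `BiConsumer` standing in for the retransmitter are all hypothetical names, and the second class assumes that adding the same seqno to the retransmitter twice is harmless.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiConsumer;

// Option #1: retry in a loop with a growing sleep time.
class RetryingAdd {
    static void add(BiConsumer<Long, String> retransmitter, long seqno, String msg,
                    long initialSleepMs, long maxSleepMs) {
        long sleep = initialSleepMs;
        while (true) {
            try {
                retransmitter.accept(seqno, msg);
                return;
            } catch (OutOfMemoryError | RuntimeException e) {
                try {
                    Thread.sleep(sleep); // back off before the next attempt
                } catch (InterruptedException ie) {
                    // The only way out of the loop other than success
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                sleep = Math.min(sleep * 2, maxSleepMs); // increase the sleep time
            }
        }
    }
}

// Option #2: remember the failure and re-add everything on the next call.
class FlaggedWindow {
    private final Map<Long, String> messages = new TreeMap<>(); // guarded by 'this'
    private final BiConsumer<Long, String> retransmitter;       // assumed idempotent
    final AtomicBoolean readdNeeded = new AtomicBoolean(false);

    FlaggedWindow(BiConsumer<Long, String> retransmitter) {
        this.retransmitter = retransmitter;
    }

    synchronized void addToMessages(long seqno, String msg) {
        messages.put(seqno, msg);
    }

    void addToRetransmitter(long seqno, String msg) {
        try {
            if (readdNeeded.compareAndSet(true, false))
                readdAll(); // also covers (seqno, msg), which is already in the map
            else
                retransmitter.accept(seqno, msg);
        } catch (OutOfMemoryError | RuntimeException e) {
            readdNeeded.set(true); // repaired on the next call
        }
    }

    // Re-adds every stored message while holding the lock on the map.
    private synchronized void readdAll() {
        for (Map.Entry<Long, String> e : messages.entrySet())
            retransmitter.accept(e.getKey(), e.getValue());
    }
}
```

The trade-off in this sketch: option #1 blocks the sending thread until the add succeeds, while option #2 returns immediately but leaves the failed message without a retransmission task until the next message is sent.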

              rhn-engineering-bban Bela Ban