JGroups / JGRP-1362

NAKACK: second line of defense for requested retransmissions that are not found


    • Type: Enhancement
    • Resolution: Cannot Reproduce
    • Priority: Major
    • 3.6.8

      When the original sender B is asked by A to retransmit message M, but no longer has M in its retransmission table, B should tell A so. Otherwise A will keep sending retransmission requests to B until A or B leaves.

      This problem should have been fixed by JGRP-1251. If it turns out it wasn't, this JIRA (1) provides a second line of defense that stops the endless retransmission requests and (2) gives us valuable diagnostic information for fixing the underlying problem (should there still be one).

      Problem:

      • A has a NakReceiverWindow (NRW) of 50 (highest_delivered seqno) for B
      • B's NRW, however, is at 200, and B has already garbage collected all messages below 150
      • When B sends message 201, A will ask B for retransmission of [51-200]
      • B will retransmit messages [150-200], but it cannot send messages [51-149], as it doesn't have them anymore
      • A will add messages [150-200], but its NRW is still 50 (highest_delivered)
      • A will continue asking B for messages [51-149] (it does have [150-201])
      • This will go on forever, or until B or A leaves
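
      The scenario above can be replayed with a small model. This is only an illustrative sketch: the class and method names (StuckRetransmissionSketch, Sender, Receiver, simulate) are hypothetical and do not correspond to the real JGroups NakReceiverWindow or retransmitter APIs. It assumes B still holds [150-201] and A starts at highest_delivered 50.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of the scenario; not the JGroups API. */
class StuckRetransmissionSketch {

    /** B: still holds only the seqnos in [lowestRetained, highestSent]. */
    static final class Sender {
        final long lowestRetained, highestSent;

        Sender(long lowestRetained, long highestSent) {
            this.lowestRetained = lowestRetained;
            this.highestSent = highestSent;
        }

        /** Returns the requested seqnos B can still serve; the rest are silently dropped. */
        List<Long> retransmit(List<Long> requested) {
            List<Long> found = new ArrayList<>();
            for (long seqno : requested)
                if (seqno >= lowestRetained && seqno <= highestSent) found.add(seqno);
            return found;
        }
    }

    /** A's view of B: highest_delivered plus messages buffered out of order. */
    static final class Receiver {
        long highestDelivered;
        final TreeSet<Long> buffered = new TreeSet<>();

        Receiver(long highestDelivered) { this.highestDelivered = highestDelivered; }

        void receive(long seqno) {
            if (seqno > highestDelivered) buffered.add(seqno);
            // Deliver in order; stops at the first gap.
            while (buffered.remove(highestDelivered + 1)) highestDelivered++;
        }

        /** Seqnos between highest_delivered and the highest buffered seqno that are absent. */
        List<Long> missing() {
            List<Long> gaps = new ArrayList<>();
            if (buffered.isEmpty()) return gaps;
            for (long s = highestDelivered + 1; s < buffered.last(); s++)
                if (!buffered.contains(s)) gaps.add(s);
            return gaps;
        }
    }

    /** Runs the given number of request/retransmit rounds and returns what A still asks for. */
    static List<Long> simulate(int rounds) {
        Sender b = new Sender(150, 201);
        Receiver a = new Receiver(50);
        a.receive(201);                       // A notices the gap [51-200]
        for (int i = 0; i < rounds; i++)
            for (long seqno : b.retransmit(a.missing())) a.receive(seqno);
        return a.missing();
    }
}
```

      In this model the list returned by simulate(n) is the same for every n >= 1: A keeps requesting [51-149], because nothing ever tells it that those messages are gone.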

      Solution:

      • When the original sender B of message M receives a retransmission request for M (from A), and it doesn't have M in its retransmission table, it should send back a MSG_NOT_FOUND message to A including B's digest
      • When A receives the MSG_NOT_FOUND message, it does the following:
        • It logs its own NRW for B
        • It logs B's digest
        • It logs its digest history
          (This information is valuable for investigating the underlying issue)
      • Then A's NRW for B is adjusted:
        • The highest_delivered seqno is set to B.digest.highest_delivered
        • All messages in xmit_table below B.digest.highest_delivered are removed
        • All retransmission tasks in the retransmitter <= B.digest.highest_delivered are cancelled and removed
          (This will stop the retransmission)
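
      A minimal sketch of the receiver-side adjustment described above, under stated assumptions: MsgNotFoundSketch, handleMsgNotFound, xmitTable and pendingRetransmits are hypothetical names chosen for illustration, not the actual JGroups classes or fields. B's digest is reduced to its highest_delivered seqno, and the diagnostic output is returned as a string instead of going through a logger.

```java
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of A's state for sender B; not the JGroups API. */
class MsgNotFoundSketch {
    long highestDelivered;
    /** Buffered messages from B, keyed by seqno (stand-in for xmit_table). */
    final TreeMap<Long, String> xmitTable = new TreeMap<>();
    /** Seqnos with an active retransmission task (stand-in for the retransmitter). */
    final TreeSet<Long> pendingRetransmits = new TreeSet<>();

    MsgNotFoundSketch(long highestDelivered) { this.highestDelivered = highestDelivered; }

    /**
     * Handles a MSG_NOT_FOUND carrying B's highest_delivered seqno.
     * Returns the diagnostic line that would be logged before the adjustment.
     */
    String handleMsgNotFound(long senderHighestDelivered) {
        // Step 1: capture the state before touching it, for later diagnosis.
        String diagnostics = "MSG_NOT_FOUND: own highest_delivered=" + highestDelivered
                + ", sender highest_delivered=" + senderHighestDelivered
                + ", buffered=" + xmitTable.size()
                + ", pending retransmissions=" + pendingRetransmits.size();

        // Step 2: jump A's window for B forward to B's highest_delivered.
        highestDelivered = senderHighestDelivered;

        // Step 3: drop buffered messages strictly below that seqno.
        xmitTable.headMap(senderHighestDelivered, false).clear();

        // Step 4: cancel retransmission tasks up to and including that seqno.
        pendingRetransmits.headSet(senderHighestDelivered, true).clear();

        return diagnostics;
    }
}
```

      Applied to the example (A at 50, holding [150-201], with tasks pending for [51-149], and B reporting 200), this leaves A at highest_delivered 200 with no pending retransmission tasks. The sketch follows the wording above literally, so the entry at exactly seqno 200 stays in the table.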

      Again, this is a second line of defense, which should never be used. If the underlying problem does occur, however, we'll have valuable information in the logs to diagnose what went wrong.

              rhn-engineering-bban Bela Ban