JGroups / JGRP-1755

TP: dropping message to wrong destination in a shared transport

    • Type: Bug
    • Resolution: Done
    • Priority: Minor
    • 3.4.2, 3.5
    • None
    • None
    • Steps to reproduce:
      • Start first instance: jt bla3 A
      • Start second instance: jt bla3 B
      • Make sure we have 2 clusters A1,B1 and A2,B2
      • Press 2 on the second instance
      • Result:
        4692 [WARN] TP$ProtocolAdapter: JGRP000031: B2: dropping unicast message from A2 to wrong destination B2,
        headers are: GMS: GmsHeader[LEAVE_RSP], UNICAST3: DATA, seqno=2, conn_id=1, UDP: [cluster_name=cluster-b]
        

      TP checks whether the destination of each incoming unicast message matches its local address (or the local address held by the ProtocolAdapter if a shared transport is used). Messages whose dest != local address are dropped.
      However, the following scenario produces spurious warnings for 2 minutes (by default):

      • 2 processes with 2 members on a shared transport each (see attached shared.xml and bla3.java)
        • Start 2 bla3 instances on the same box
      • On the second instance, a member leaves and immediately rejoins (press [2] on the second member)
      • For some reason, the leaving member didn't receive the LEAVE_RSP unicast message from the coordinator (first member)
      • The newly joined member then receives the LEAVE_RSP unicast from the coordinator, but it is now a different member with a different local address, so we see the warning
      • This will continue for 2 minutes, until the connection to the unknown member is closed by the coordinator (configurable)
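
The scenario above can be illustrated with a minimal sketch. It does not use the JGroups API: `Addr` and `accept` are hypothetical names, and the sketch assumes that an address is a printable logical name plus a unique ID, with equality decided by the ID alone. Under that assumption, a member that leaves and rejoins keeps the name B2 but gets a new ID, so a unicast addressed to the old B2 fails the destination check even though both addresses print as "B2" in the warning.

```java
import java.util.UUID;

final class DestCheckSketch {

    // Hypothetical stand-in for a member address: a printable logical name
    // plus a unique ID. Equality uses only the ID, never the name.
    static final class Addr {
        final String name;
        final UUID id = UUID.randomUUID();

        Addr(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof Addr && ((Addr) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
        @Override public String toString() { return name; }
    }

    // Accept a unicast only if its destination is our current local address.
    static boolean accept(Addr dest, Addr local) {
        return dest.equals(local);
    }

    public static void main(String[] args) {
        Addr oldB2 = new Addr("B2");   // address before the leave
        Addr newB2 = new Addr("B2");   // address after the rejoin: same name, new ID
        System.out.println(accept(oldB2, oldB2)); // true
        System.out.println(accept(oldB2, newB2)); // false
    }
}
```

The second call corresponds to the late LEAVE_RSP: it is addressed to the old B2, but the local address is that of the new B2, so the message would be dropped.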

      TODOs:

      • Investigate why the LEAVE_RSP unicast is not received by the leaving member
      • In the warning, add the sender's address and print the headers of the message so it's easier to find the culprit
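
As a rough illustration of the second TODO, the following sketch builds a warning line that includes the sender and the message headers. It is not the actual TP code: `DropWarningSketch`, `format`, and the use of a plain `Map` of header names to header strings are assumptions made for this example. The target format is taken from the warning shown in the steps to reproduce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DropWarningSketch {

    // Builds the warning text: local address, sender, wrong destination, then
    // every header as "NAME: value", in insertion order.
    static String format(String local, String sender, String dest,
                         Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append(local).append(": dropping unicast message from ").append(sender)
          .append(" to wrong destination ").append(dest).append(", headers are: ");
        boolean first = true;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!first) sb.append(", ");
            sb.append(e.getKey()).append(": ").append(e.getValue());
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the headers in the order they were added.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("GMS", "GmsHeader[LEAVE_RSP]");
        headers.put("UNICAST3", "DATA, seqno=2, conn_id=1");
        headers.put("UDP", "[cluster_name=cluster-b]");
        System.out.println(format("B2", "A2", "B2", headers));
    }
}
```

With the sender and headers in the message, it is immediately visible that the dropped unicast is a LEAVE_RSP from A2 in cluster-b.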

      Attachments:

        1. shared.xml (3 kB)
        2. bla3.java (2 kB)

              rhn-engineering-bban Bela Ban