JGroups / JGRP-2960

PerDestinationBundler race condition prevents node from leaving cluster

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 5.5.3
    • 5.5.1

      I'm seeing the following with 5.5.1, where the default bundler changed to PerDestinationBundler:

      Cluster is AAA, BBB.

      AAA leaves, sends leave request to BBB:

      2025-12-05 14:03:13.460+0000 thread="SpringApplicationShutdownHook" level="TRACE" logger="o.jgroups.protocols.pbcast.GMS" method="trace" msg="AAA: sending LEAVE request to BBB"
      2025-12-05 14:03:13.461+0000 thread="jgroups-28,jgroups,AAA" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="AAA: sending msg to BBB, src=AAA, size=87, hdrs: GMS: GmsHeader[LEAVE_REQ]: mbr=AAA, UNICAST4: DATA, seqno=2, conn_id=0, TP: cluster=jgroups"

      BBB receives and handles leave request from AAA:

      2025-12-05 14:03:13.462+0000 thread="jgroups-21,jgroups,BBB" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="BBB: received [AAA to BBB, 0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=AAA, UNICAST4: DATA, seqno=2, conn_id=0, TP: cluster=jgroups"
      2025-12-05 14:03:13.463+0000 thread="jgroups-21,jgroups,BBB" level="DEBUG" logger="o.jgroups.protocols.pbcast.GMS" method="debug" msg="BBB: members are (2) AAA,BBB, coord=BBB: I'm the new coordinator"
      2025-12-05 14:03:13.465+0000 thread="jgroups-21,jgroups,BBB" level="TRACE" logger="o.jgroups.protocols.pbcast.GMS" method="trace" msg="BBB: handleMembershipChange([LEAVE(AAA)])"
      2025-12-05 14:03:13.466+0000 thread="jgroups-21,jgroups,BBB" level="TRACE" logger="o.jgroups.protocols.pbcast.GMS" method="trace" msg="BBB: joiners=[], suspected=[], leaving=[AAA], new view: [BBB|2] (1) [BBB]"
      2025-12-05 14:03:13.466+0000 thread="jgroups-21,jgroups,BBB" level="TRACE" logger="o.jgroups.protocols.pbcast.GMS" method="trace" msg="BBB: sending LEAVE response to AAA"
      2025-12-05 14:03:13.466+0000 thread="jgroups-21,jgroups,BBB" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="BBB: sending msg to AAA, src=BBB, size=61, hdrs: GMS: GmsHeader[LEAVE_RSP], TP: cluster=jgroups"
      2025-12-05 14:03:13.467+0000 thread="jgroups-21,jgroups,BBB" level="DEBUG" logger="o.jgroups.protocols.pbcast.GMS" method="debug" msg="BBB: installing view [BBB|2] (1) [BBB] (AAA left)"

      But the leave response is never actually sent by BBB, and the TCP connection is closed:

      2025-12-05 14:03:13.469+0000 thread="NioServer.Selector [/10.244.184.169:7800]-1,jgroups,AAA" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="10.244.184.169:7800: removed connection to 10.244.34.114:7800"

      AAA never receives the leave response, so the leave request is resent:

      2025-12-05 14:03:15.421+0000 thread="Timer runner-2,jgroups,AAA" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="AAA: sending msg to BBB, src=AAA, size=87, hdrs: GMS: GmsHeader[LEAVE_REQ]: mbr=AAA, UNICAST4: DATA, seqno=2, conn_id=0, TP: cluster=jgroups"
      2025-12-05 14:03:16.422+0000 thread="Timer runner-2,jgroups,AAA" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="AAA: sending msg to BBB, src=AAA, size=87, hdrs: GMS: GmsHeader[LEAVE_REQ]: mbr=AAA, UNICAST4: DATA, seqno=2, conn_id=0, TP: cluster=jgroups"
      2025-12-05 14:03:17.422+0000 thread="Timer runner-2,jgroups,AAA" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="AAA: sending msg to BBB, src=AAA, size=87, hdrs: GMS: GmsHeader[LEAVE_REQ]: mbr=AAA, UNICAST4: DATA, seqno=2, conn_id=0, TP: cluster=jgroups"
      2025-12-05 14:03:18.423+0000 thread="Timer runner-2,jgroups,AAA" level="TRACE" logger="org.jgroups.protocols.TCP_NIO2" method="trace" msg="AAA: sending msg to BBB, src=AAA, size=87, hdrs: GMS: GmsHeader[LEAVE_REQ]: mbr=AAA, UNICAST4: DATA, seqno=2, conn_id=0, TP: cluster=jgroups"

      The issue does not happen with TransferQueueBundler.

      It looks like the view change causes the send buffer in PerDestinationBundler to be dropped, along with the queued leave response?

      https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/PerDestinationBundler.java#L148
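
      To make the suspicion above concrete, here is a toy model of the kind of race I mean. It is not the actual JGroups code: the classes ToyPerDestinationBundler and ToyBundlerDemo, the per-destination map, and the method names are all hypothetical, and whether PerDestinationBundler really behaves like viewChange() below is exactly the open question. The sketch only shows that if per-destination buffers are discarded on a view change before they are flushed, a message queued for the leaving member (here LEAVE_RSP for AAA) never reaches the wire, whereas draining the buffer first delivers it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT the JGroups PerDestinationBundler implementation.
class ToyPerDestinationBundler {
    // One pending-message buffer per destination (hypothetical layout).
    private final Map<String, List<String>> buffers = new HashMap<>();
    // Messages that actually reached the "wire".
    final List<String> sent = new ArrayList<>();

    // Queue a message for a destination; nothing is sent until flush().
    void send(String dest, String msg) {
        buffers.computeIfAbsent(dest, d -> new ArrayList<>()).add(msg);
    }

    // Suspected behavior: discard buffers of members that left the view,
    // including any messages still queued in them.
    void viewChange(Set<String> members) {
        buffers.keySet().retainAll(members);
    }

    // Alternative: drain the buffers of departed members before discarding them.
    void viewChangeFlushingFirst(Set<String> members) {
        for (String dest : new ArrayList<>(buffers.keySet())) {
            if (!members.contains(dest)) {
                flush(dest);
                buffers.remove(dest);
            }
        }
    }

    // Send everything queued for one destination.
    void flush(String dest) {
        List<String> pending = buffers.get(dest);
        if (pending == null)
            return; // buffer already gone: queued messages are lost
        for (String msg : pending)
            sent.add(dest + ":" + msg);
        pending.clear();
    }
}

class ToyBundlerDemo {
    // Replays the sequence from the logs: BBB queues LEAVE_RSP for AAA,
    // installs the new view [BBB], and only then gets around to flushing.
    static List<String> run(boolean flushFirst) {
        ToyPerDestinationBundler bundler = new ToyPerDestinationBundler();
        bundler.send("AAA", "LEAVE_RSP");
        if (flushFirst)
            bundler.viewChangeFlushingFirst(Set.of("BBB"));
        else
            bundler.viewChange(Set.of("BBB"));
        bundler.flush("AAA");
        return bundler.sent;
    }

    public static void main(String[] args) {
        System.out.println("drop on view change: " + run(false));
        System.out.println("flush before drop:   " + run(true));
    }
}
```

      In this model, run(false) sends nothing to AAA, which matches what I see in the logs above, while run(true) delivers the response before the buffer is removed.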

              rhn-engineering-bban Bela Ban
              cfredri4 Christian Fredriksson