JGroups / JGRP-2953

Transport-based failure detection should not emit suspect events when channel is disconnecting

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 5.4.12, 5.5.2
    • Affects Version/s: 5.5.1, 5.4.11
    • None

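      When using TCP with enable_suspect_events=true, we noticed that a leaving member still emits suspect events when other members close their connections to it.
      In the trace below, node-1 sends a LEAVE request to node-2 at 09:34:58,990. When node-2 closes its connection at 09:34:59,014, node-1 sends up a suspect event, and VERIFY_SUSPECT2 then tries to verify that node-2 is dead: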

      {{2025-11-25 09:34:58,990 TRACE [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 90) node-1: sending LEAVE request to node-2
      2025-11-25 09:34:58,990 TRACE [org.jgroups.protocols.UNICAST3] (ServerService Thread Pool -- 90) node-1 --> node-2: DATA(#144, conn_id=0)
      2025-11-25 09:34:58,990 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 90) node-1: sending msg to node-2, src=node-1, size=83, hdrs: GMS: GmsHeader[LEAVE_REQ]: mbr=node-1, UNICAST3: DATA, seqno=144, TP: cluster=ejb
      2025-11-25 09:34:59,000 TRACE [org.jgroups.protocols.TCP] (thread-8,ejb,node-1) node-1: received message batch of 1 messages from node-3
      2025-11-25 09:34:59,000 TRACE [org.jgroups.protocols.UNICAST3] (thread-8,ejb,node-1) node-1 <-- node-3: ACK(#132, conn-id=1, ts=73)
      2025-11-25 09:34:59,003 TRACE [org.jgroups.protocols.TCP] (thread-8,ejb,node-1) node-1: received [node-3 to node-1, 42 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are FORK: ejb:ejb, UNICAST3: DATA, seqno=201, TP: cluster=ejb
      2025-11-25 09:34:59,003 TRACE [org.jgroups.protocols.UNICAST3] (thread-8,ejb,node-1) node-1 <-- node-3: DATA(#201, conn_id=0)
      2025-11-25 09:34:59,003 TRACE [org.jgroups.protocols.UNICAST3] (thread-8,ejb,node-1) node-1: delivering node-3#201
      2025-11-25 09:34:59,003 TRACE [org.jgroups.protocols.UFC] (thread-8,ejb,node-1) node-3 used 42 credits, 3958782 remaining
      2025-11-25 09:34:59,005 TRACE [org.jgroups.protocols.TCP] (thread-8,ejb,node-1) node-1: received message batch of 1 messages from node-2
      2025-11-25 09:34:59,005 TRACE [org.jgroups.protocols.UNICAST3] (thread-8,ejb,node-1) node-1 <-- node-2: ACK(#143, conn-id=0, ts=77)
      2025-11-25 09:34:59,006 TRACE [org.jgroups.protocols.UNICAST3] (thread-8,ejb,node-1) node-1 --> node-2: ACK(#236)
      2025-11-25 09:34:59,006 TRACE [org.jgroups.protocols.TCP] (thread-8,ejb,node-1) node-1: sending msg to node-2, src=node-1, size=60, hdrs: UNICAST3: ACK, seqno=236, ts=79, TP: cluster=ejb
      2025-11-25 09:34:59,006 TRACE [org.jgroups.protocols.UNICAST3] (thread-8,ejb,node-1) node-1 --> node-3: ACK(#201)
      2025-11-25 09:34:59,006 TRACE [org.jgroups.protocols.TCP] (thread-8,ejb,node-1) node-1: sending msg to node-3, src=node-1, size=60, hdrs: UNICAST3: ACK, seqno=201, ts=80, TP: cluster=ejb
      2025-11-25 09:34:59,009 TRACE [org.jgroups.protocols.TCP] (thread-5,null,node-1) node-1: received [node-2 to node-1, 40 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are FORK: ejb:ejb, UNICAST3: DATA, seqno=237, TP: cluster=ejb
      2025-11-25 09:34:59,009 TRACE [org.jgroups.protocols.UNICAST3] (thread-5,null,node-1) node-1 <-- node-2: DATA(#237, conn_id=0)
      2025-11-25 09:34:59,009 TRACE [org.jgroups.protocols.UNICAST3] (thread-5,null,node-1) node-1: delivering node-2#237
      2025-11-25 09:34:59,009 TRACE [org.jgroups.protocols.UFC] (thread-5,null,node-1) node-2 used 40 credits, 3972975 remaining
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.TCP] (thread-5,null,node-1) node-1: received [node-2 to node-1, 42 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are FORK: ejb:ejb, UNICAST3: DATA, seqno=238, TP: cluster=ejb
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.UNICAST3] (thread-5,null,node-1) node-1 <-- node-2: DATA(#238, conn_id=0)
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.UNICAST3] (thread-5,null,node-1) node-1: delivering node-2#238
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.UFC] (thread-5,null,node-1) node-2 used 42 credits, 3972933 remaining
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.TCP] (thread-5,null,node-1) node-1: received [node-3 to node-1, 40 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are FORK: ejb:ejb, UNICAST3: DATA, seqno=202, TP: cluster=ejb
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.UNICAST3] (thread-5,null,node-1) node-1 <-- node-3: DATA(#202, conn_id=0)
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.UNICAST3] (thread-5,null,node-1) node-1: delivering node-3#202
      2025-11-25 09:34:59,012 TRACE [org.jgroups.protocols.UFC] (thread-5,null,node-1) node-3 used 40 credits, 3958742 remaining
      2025-11-25 09:34:59,014 TRACE [org.jgroups.protocols.TCP] (Connection.Receiver [0:0:0:0:0:0:0:1:7600 - 0:0:0:0:0:0:0:1:35463]-7,ejb,node-1) 0:0:0:0:0:0:0:1:7600: removed connection to 0:0:0:0:0:0:0:1%0:7700
      2025-11-25 09:34:59,014 DEBUG [org.jgroups.protocols.TCP] (Connection.Receiver [0:0:0:0:0:0:0:1:7600 - 0:0:0:0:0:0:0:1:35463]-7,ejb,node-1) node-1: connection closed by peer node-2 (IP=0:0:0:0:0:0:0:1%0:7700), sending up a suspect event
      2025-11-25 09:34:59,017 TRACE [org.jgroups.protocols.TCP] (thread-5,null,node-1) node-1: received [node-2 to node-1, 0 bytes, flags=OOB|NO_RELIABILITY], headers are GMS: GmsHeader[LEAVE_RSP], TP: cluster=ejb
      2025-11-25 09:34:59,020 TRACE [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 90) node-1: got LEAVE response from node-2 in 30 ms
      2025-11-25 09:34:59,022 TRACE [org.jgroups.protocols.VERIFY_SUSPECT2] (Connection.Receiver [0:0:0:0:0:0:0:1:7600 - 0:0:0:0:0:0:0:1:35463]-7,ejb,node-1) verifying that [node-2] is dead
      2025-11-25 09:34:59,022 TRACE [org.jgroups.protocols.TCP] (Connection.Receiver [0:0:0:0:0:0:0:1:7600 - 0:0:0:0:0:0:0:1:35463]-7,ejb,node-1) node-1: sending msg to node-2, src=node-1, size=72, hdrs: VERIFY_SUSPECT2: [ARE_YOU_DEAD], TP: cluster=ejb
      2025-11-25 09:34:59,024 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-11,ejb,node-1) 0:0:0:0:0:0:0:1:7600: server is not running, discarding message to 0:0:0:0:0:0:0:1%0:7700
      2025-11-25 09:34:59,025 TRACE [org.jgroups.protocols.UNICAST3] (ServerService Thread Pool -- 90) node-1 --> node-2: ACK(#238)
      2025-11-25 09:34:59,025 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 90) node-1: sending msg to node-2, src=node-1, size=60, hdrs: UNICAST3: ACK, seqno=238, ts=81, TP: cluster=ejb
      2025-11-25 09:34:59,025 TRACE [org.jgroups.protocols.UNICAST3] (ServerService Thread Pool -- 90) node-1 --> node-3: ACK(#202)
      2025-11-25 09:34:59,025 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 90) node-1: sending msg to node-3, src=node-1, size=60, hdrs: UNICAST3: ACK, seqno=202, ts=82, TP: cluster=ejb
      2025-11-25 09:34:59,025 DEBUG [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 90) node-1: closing sockets and stopping threads
      2025-11-25 09:34:59,026 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-11,ejb,node-1) 0:0:0:0:0:0:0:1:7600: server is not running, discarding message to 0:0:0:0:0:0:0:1%0:7700
      2025-11-25 09:34:59,026 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-11,ejb,node-1) 0:0:0:0:0:0:0:1:7600: server is not running, discarding message to 0:0:0:0:0:0:0:1%0:7800
      2025-11-25 09:34:59,028 INFO [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 90) WFLYCLJG0035: Disconnected 'ee' channel. 'node-1' left cluster 'ejb'
      }}
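
To illustrate the behavior requested here, the following is a minimal sketch of a guard that drops connection-closed notifications once the local member has started leaving. It is not JGroups code and does not reflect how the fix was implemented: the class SuspectGuard and its methods beginDisconnect(), connectionClosed() and suspected() are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a transport's connection-closed handling; not a JGroups class. */
class SuspectGuard {
    // Set once the local member has started leaving the cluster
    private final AtomicBoolean disconnecting = new AtomicBoolean(false);
    // Peers for which a suspect event was emitted (stands in for passing the event up the stack)
    private final List<String> suspected = new ArrayList<>();

    /** Assumed to be called before the LEAVE request is sent. */
    void beginDisconnect() {
        disconnecting.set(true);
    }

    /**
     * Called when a peer closes its connection to us.
     * Returns true if a suspect event was emitted, false if it was suppressed.
     */
    synchronized boolean connectionClosed(String peer) {
        if (disconnecting.get()) {
            // We are leaving: peers closing their connections is expected, so do not suspect them
            return false;
        }
        suspected.add(peer);
        return true;
    }

    /** Snapshot of the peers suspected so far. */
    synchronized List<String> suspected() {
        return List.copyOf(suspected);
    }

    public static void main(String[] args) {
        SuspectGuard guard = new SuspectGuard();
        // While connected, a closed connection is treated as a possible failure
        System.out.println("node-3 suspected: " + guard.connectionClosed("node-3"));
        // After the leave has started, the same notification is ignored
        guard.beginDisconnect();
        System.out.println("node-2 suspected: " + guard.connectionClosed("node-2"));
        System.out.println("suspects: " + guard.suspected());
    }
}
```

In the trace above, such a flag would have to be set before the LEAVE request at 09:34:58,990 so that the connection close at 09:34:59,014 is ignored. Where the actual fix sets or checks this state is not stated in this issue.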

              pferraro@redhat.com Paul Ferraro