WFLY-21180

Failures in variants of *WebFailoverTestCase showing inconsistent topology

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • None
    • 38.0.0.Final, 38.0.1.Final
    • Clustering, Test Suite
    • None

      There are intermittent failures in the WebFailoverTestCases, across multiple variants and platforms.

      The most concerning problem is that although the test shuts down only one node, the surviving nodes often observe two nodes leaving concurrently. This should never happen, as we only ever shut down a single node in the test.
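
A minimal sketch of how such a timeline could be checked mechanically for this symptom. It assumes log lines in the merged format used in this report (node name, date, time, level, then the message) and looks only at ISPN100001 events. The function names and the `stopped_node` parameter are hypothetical and not part of the test suite.

```python
import re
from collections import defaultdict

# Matches merged timeline lines such as:
#   node-2 2025-11-19 11:36:41,051 INFO  [org.infinispan.CLUSTER] (...) ISPN100001: Node node-1 left the cluster
LEFT_EVENT = re.compile(
    r"^\s*(?P<observer>\S+)\s+\S+\s+\S+\s+INFO\s+.*ISPN100001: Node (?P<left>\S+) left the cluster"
)

def departures_by_observer(lines):
    """Map each observing node to the set of distinct nodes it reported as having left."""
    departed = defaultdict(set)
    for line in lines:
        match = LEFT_EVENT.match(line)
        if match:
            departed[match.group("observer")].add(match.group("left"))
    return dict(departed)

def unexpected_departures(lines, stopped_node):
    """Per observer, the nodes reported as leaving other than the one the test stopped."""
    result = {}
    for observer, left in departures_by_observer(lines).items():
        extra = left - {stopped_node}
        if extra:
            result[observer] = extra
    return result

# Sample lines copied from the timeline in this report.
sample = [
    "node-1 2025-11-19 11:36:41,046 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 28) WFLYCLJG0034: Disconnecting 'ee' channel. 'node-1' leaving cluster 'ejb' with view: [node-1|2] (3) [node-1, node-2, node-3]",
    "node-2 2025-11-19 11:36:41,051 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster",
    "node-2 2025-11-19 11:36:41,051 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster",
]

print(unexpected_departures(sample, stopped_node="node-1"))  # {'node-2': {'node-3'}}
```

A non-empty result means that some observer saw a node leave that the test did not stop, which is the inconsistency described above.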

      Example failures:

      https://ci.wildfly.org/buildConfiguration/WF_PullRequest_LinuxSmJdk17/530228
      https://ci.wildfly.org/buildConfiguration/WF_PullRequest_LinuxJdk17/530074

      Example timeline of events:

      node-1 2025-11-19 11:36:40,969 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0008: Undertow HTTPS listener https suspending
      node-1 2025-11-19 11:36:40,969 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-8) WFLYUT0008: Undertow AJP listener ajp suspending
      node-1 2025-11-19 11:36:40,970 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-7) WFLYSRV0028: Stopped deployment FineWebFailoverTestCase.war (runtime-name: FineWebFailoverTestCase.war) in 61ms
      node-1 2025-11-19 11:36:40,970 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0008: Undertow HTTP listener default suspending
      node-1 2025-11-19 11:36:40,971 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 88) [Context=http-remoting-connector] ISPN100008: Updating cache members list [node-2, node-3], topology id 10
      node-1 2025-11-19 11:36:40,972 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 88) [Context=http-remoting-connector] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-1 2025-11-19 11:36:40,973 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 88) WFLYCLINF0003: Stopped http-remoting-connector cache from ejb container
      node-1 2025-11-19 11:36:40,974 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0007: Undertow HTTPS listener https stopped, was bound to [::1]:8443
      node-1 2025-11-19 11:36:40,975 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-8) WFLYUT0007: Undertow AJP listener ajp stopped, was bound to [::1]:8009
      node-1 2025-11-19 11:36:40,978 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0007: Undertow HTTP listener default stopped, was bound to [::1]:8080
      node-2 2025-11-19 11:36:40,981 INFO  [org.infinispan.LIFECYCLE] (thread-12,ejb,node-2) [Context=http-remoting-connector] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-1 2025-11-19 11:36:40,982 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 88) [Context=org.infinispan.CONFIG] ISPN100008: Updating cache members list [node-2, node-3], topology id 10
      node-3 2025-11-19 11:36:40,984 INFO  [org.infinispan.LIFECYCLE] (thread-13,ejb,node-3) [Context=default-server] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-3 2025-11-19 11:36:40,984 INFO  [org.infinispan.LIFECYCLE] (thread-4,null,node-3) [Context=http-remoting-connector] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-3 2025-11-19 11:36:40,985 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p46-t1) [Context=default-server] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-3 2025-11-19 11:36:40,986 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p50-t7) [Context=http-remoting-connector] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-2 2025-11-19 11:36:40,987 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p30-t2) [Context=http-remoting-connector] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-3 2025-11-19 11:36:40,991 INFO  [org.infinispan.LIFECYCLE] (thread-4,null,node-3) ISPN000973: Task 'state-transfer-FineWebFailoverTestCase.war' started at 2025-11-19T11:36:40.968707554Z and done 2025-11-19T11:36:40.991771723Z
      node-2 2025-11-19 11:36:40,993 INFO  [org.infinispan.LIFECYCLE] (thread-12,ejb,node-2) ISPN000973: Task 'state-transfer-FineWebFailoverTestCase.war' started at 2025-11-19T11:36:40.969998842Z and done 2025-11-19T11:36:40.993531079Z
      node-1 2025-11-19 11:36:41,003 INFO  [org.infinispan.CLUSTER] (thread-9,ejb,node-1) [Context=default-server] ISPN100009: Advancing to rebalance phase READ_ALL_WRITE_ALL, topology id 12
      node-1 2025-11-19 11:36:41,003 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0004: Undertow 2.3.20.Final stopping
      node-1 2025-11-19 11:36:41,010 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 88) [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-1 2025-11-19 11:36:41,012 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 28) [Context=org.infinispan.CONFIG] ISPN100008: Updating cache members list [node-2, node-3], topology id 10
      node-1 2025-11-19 11:36:41,013 INFO  [org.infinispan.CONTAINER] (ServerService Thread Pool -- 88) ISPN000390: Persisted state, version=15.2.6.Final timestamp=2025-11-19T11:36:41.012702640Z
      node-1 2025-11-19 11:36:41,013 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 28) [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-1 2025-11-19 11:36:41,014 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 88) Stopped ejb cache container
      node-1 2025-11-19 11:36:41,016 INFO  [org.infinispan.CLUSTER] (thread-11,ejb,node-1) [Context=FineWebFailoverTestCase.war] ISPN100009: Advancing to rebalance phase READ_ALL_WRITE_ALL, topology id 12
      node-3 2025-11-19 11:36:41,018 INFO  [org.infinispan.LIFECYCLE] (thread-12,ejb,node-3) [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-3 2025-11-19 11:36:41,019 INFO  [org.infinispan.LIFECYCLE] (thread-12,ejb,node-3) [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-3 2025-11-19 11:36:41,019 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p46-t7) [Context=org.infinispan.CONFIG] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-1 2025-11-19 11:36:41,019 INFO  [org.infinispan.CLUSTER] (thread-8,ejb,node-1) [Context=default-server] ISPN100009: Advancing to rebalance phase READ_NEW_WRITE_ALL, topology id 13
      node-2 2025-11-19 11:36:41,021 INFO  [org.infinispan.LIFECYCLE] (thread-12,ejb,node-2) [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-2 2025-11-19 11:36:41,022 INFO  [org.infinispan.LIFECYCLE] (thread-12,ejb,node-2) [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [node-2, node-3], phase READ_OLD_WRITE_ALL, topology id 11
      node-2 2025-11-19 11:36:41,023 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p30-t5) [Context=org.infinispan.CONFIG] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-3 2025-11-19 11:36:41,021 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p50-t2) [Context=org.infinispan.CONFIG] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-2 2025-11-19 11:36:41,026 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p34-t7) [Context=org.infinispan.CONFIG] ISPN100010: Finished rebalance with members [node-2, node-3], topology id 11
      node-1 2025-11-19 11:36:41,027 INFO  [org.infinispan.CONTAINER] (ServerService Thread Pool -- 28) ISPN000390: Persisted state, version=15.2.6.Final timestamp=2025-11-19T11:36:41.026762944Z
      node-1 2025-11-19 11:36:41,028 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 28) Stopped web cache container
      node-1 2025-11-19 11:36:41,046 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 28) WFLYCLJG0034: Disconnecting 'ee' channel. 'node-1' leaving cluster 'ejb' with view: [node-1|2] (3) [node-1, node-2, node-3]
      node-2 2025-11-19 11:36:41,051 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,051 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,056 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,056 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,056 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,056 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,057 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,058 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-1 left the cluster
      node-2 2025-11-19 11:36:41,058 INFO  [org.infinispan.CLUSTER] (thread-4,null,node-2) ISPN100001: Node node-3 left the cluster
      node-2 2025-11-19 11:36:41,072 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p30-t2) [Context=http-remoting-connector] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 12
      node-2 2025-11-19 11:36:41,072 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p30-t1) [Context=org.infinispan.CONFIG] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 12
      node-2 2025-11-19 11:36:41,072 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p26-t8) [Context=org.infinispan.CONFIG] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 12
      node-2 2025-11-19 11:36:41,072 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p22-t7) [Context=org.infinispan.CONFIG] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 12
      node-2 2025-11-19 11:36:41,079 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p26-t8) [Context=org.infinispan.CONFIG] ISPN100008: Updating cache members list [node-2], topology id 13
      node-2 2025-11-19 11:36:41,079 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p30-t2) [Context=http-remoting-connector] ISPN100008: Updating cache members list [node-2], topology id 13
      node-2 2025-11-19 11:36:41,080 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p34-t1) [Context=org.infinispan.CONFIG] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 12
      node-2 2025-11-19 11:36:41,084 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p34-t1) [Context=org.infinispan.CONFIG] ISPN100008: Updating cache members list [node-2], topology id 13
      node-2 2025-11-19 11:36:41,084 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p34-t2) [Context=default-server] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 14
      node-2 2025-11-19 11:36:41,085 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p30-t1) [Context=org.infinispan.CONFIG] ISPN100008: Updating cache members list [node-2], topology id 13
      node-2 2025-11-19 11:36:41,087 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p34-t1) [Context=FineWebFailoverTestCase.war] ISPN100007: After merge (or coordinator change), recovered members [node-2, node-3] with topology id 13
      node-2 2025-11-19 11:36:41,088 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p34-t1) [Context=FineWebFailoverTestCase.war] ISPN100008: Updating cache members list [node-2], topology id 14
      node-2 2025-11-19 11:36:41,089 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p22-t7) [Context=org.infinispan.CONFIG] ISPN100008: Updating cache members list [node-2], topology id 13
      node-2 2025-11-19 11:36:41,089 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p34-t2) [Context=default-server] ISPN100008: Updating cache members list [node-2], topology id 15
      node-1 2025-11-19 11:36:41,104 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 28) WFLYCLJG0035: Disconnected 'ee' channel. 'node-1' left cluster 'ejb'
      node-1 2025-11-19 11:36:41,121 INFO  [org.jboss.as] (MSC service thread 1-6) WFLYSRV0050: WildFly 39.0.0.Beta1-SNAPSHOT (WildFly Core 31.0.0.Beta2) stopped in 213ms
      node-2 2025-11-19 11:36:41,521 INFO  [org.jboss.as.test.clustering.TopologyChangeListenerBean] (default task-3) [node-2, node-3] != [node-2], waiting for a topology change event. Current topology id = 14
      node-2 2025-11-19 11:36:56,521 INFO  [org.jboss.as.test.clustering.TopologyChangeListenerBean] (default task-3) [node-2, node-3] != [node-2], waiting for a topology change event. Current topology id = 14
      node-2 2025-11-19 11:36:56,521 INFO  [org.jboss.as.test.clustering.TopologyChangeListenerBean] (default task-3) [node-2, node-3] != [node-2], waiting for a topology change event. Current topology id = 14
      node-2 2025-11-19 11:36:56,521 INFO  [org.jboss.as.test.clustering.TopologyChangeListenerBean] (default task-3) [node-2, node-3] != [node-2], waiting for a topology change event. Current topology id = 14
      node-2 2025-11-19 11:36:56,521 INFO  [org.jboss.as.test.clustering.TopologyChangeListenerBean] (default task-3) [node-2, node-3] != [node-2], waiting for a topology change event. Current topology id = 14
      node-2 2025-11-19 11:36:56,528 ERROR [io.undertow.request] (default task-3) UT005023: Exception handling request to /FineWebFailoverTestCase/membership: jakarta.servlet.ServletException: java.util.concurrent.TimeoutException: Cache web/FineWebFailoverTestCase.war failed to establish topology [node-2, node-3] within PT15S. Current view is: [node-2]
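
To illustrate why the final error appears, here is a rough sketch of a "wait until the expected membership is established, or time out" check. It is not the TopologyChangeListenerBean implementation; the class, its methods, and the event delivery are hypothetical and only mirror the behaviour visible in the log (expected [node-2, node-3], current view [node-2], PT15S timeout).

```python
import threading

class TopologyWaiter:
    """Hypothetical helper; NOT the WildFly TopologyChangeListenerBean."""

    def __init__(self, initial_members):
        self._members = frozenset(initial_members)
        self._changed = threading.Condition()

    def on_topology_change(self, members):
        # Assumed to be called by whatever delivers membership events.
        with self._changed:
            self._members = frozenset(members)
            self._changed.notify_all()

    def await_members(self, expected, timeout_seconds):
        expected = frozenset(expected)
        with self._changed:
            # wait_for re-checks the predicate after each notification and
            # returns False if the timeout elapses before it becomes true.
            established = self._changed.wait_for(
                lambda: self._members == expected, timeout=timeout_seconds
            )
            if not established:
                raise TimeoutError(
                    f"failed to establish topology {sorted(expected)} "
                    f"within {timeout_seconds}s; current view is {sorted(self._members)}"
                )

# The surviving node already sees only itself, and no further event arrives,
# so the wait can only end by timing out (shortened here from 15 seconds).
waiter = TopologyWaiter(["node-2"])
try:
    waiter.await_members(["node-2", "node-3"], timeout_seconds=0.1)
except TimeoutError as error:
    print(error)
```

Under these assumptions, once node-2 has dropped node-3 from its view, the wait cannot succeed unless a later topology change restores [node-2, node-3], which matches the repeated "waiting for a topology change event" messages followed by the TimeoutException.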
      

              rhn-engineering-rhusar Radoslav Husar