Uploaded image for project: 'JBoss Enterprise Application Platform'
  1. JBoss Enterprise Application Platform
  2. JBEAP-15404

Server in cluster hangs during start after previous kill

XMLWordPrintable

    • Hide

      Configure non-blocking state transfer via: <state-transfer timeout="0"/>

      Note:
      Workaround does not fix case when MDB is deployed on restarted server. Infinispan subsystem fails to start and MDB does not activate. Currently there is no workaround.

      Show
      Configure non-blocking state transfer via: <state-transfer timeout="0"/> Note: Workaround does not fix case when MDB is deployed on restarted server. Infinispan subsystem fails to start and MDB does not activate. Currently there is no workaround.
    • Hide

      Steps to reproduce - issue is intermittent, not 100% reproducer:

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout master
      
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/eap-7.x-messaging-testing-prepare/123/artifact/jboss-eap.zip to jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      mvn -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1531909463-SNAPSHOT clean install -B -Dtest=ClusterSmokeTest#clusterKilTest | tee log
      
      or for issue with MDB not activated after kill/start:
      mvn -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1531909463-SNAPSHOT clean install -B -Dtest=Lodh2TestCase#testRemoteJcaInboundOnly | tee log
      
      
      
      
      Show
      Steps to reproduce - issue is intermittent , not 100% reproducer: git clone git: //git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ git checkout master groovy -DEAP_ZIP_URL=https: //eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/eap-7.x-messaging-testing-prepare/123/artifact/jboss-eap.zip to jboss-eap.zip PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap cd ../jboss-hornetq-testsuite/ mvn -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1531909463-SNAPSHOT clean install -B -Dtest=ClusterSmokeTest#clusterKilTest | tee log or for issue with MDB not activated after kill/start: mvn -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1531909463-SNAPSHOT clean install -B -Dtest=Lodh2TestCase#testRemoteJcaInboundOnly | tee log

      There is regression in JGroups or Infinispan in one of our tests for fault tolerance of JMS bridges. However work on JMS bridge appears to be unrelated. Issue was hit in WF weekly run.

      Test Scenario:

      • There are two servers. InQueue is deployed on Node 1,
      • OutQueue is deployed on Node 2. Both servers are started.
      • Large byte messages are sent to InQueue deployed on Node 1. Bridge between servers/queues transfers messages from node 1 to node 2.
      • Node 1 is killed and started again.
      • All messages are received from OutQueue deployed on Node 2.

      Result:
      Node 1 does not start after kill and hangs. There is following exception logged in node 2:

      :26:17,894 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100000: Node node-1 joined the cluster
      09:26:18,520 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
      09:26:18,521 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node node-1 left the cluster
      09:26:18,521 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
      09:26:18,522 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node node-1 left the cluster
      09:26:18,522 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
      09:26:18,522 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node node-1 left the cluster
      09:26:18,522 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
      09:26:18,523 INFO  [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node node-1 left the cluster
      09:26:18,868 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t2) ISPN000310: Starting cluster-wide rebalance for cache default, topology CacheTopology{id=17, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[node-2: 122, node-1: 134]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (3)[node-2: 84, node-1: 90, node-1: 82]}, unionCH=null, actualMembers=[node-2, node-1, node-1], persistentUUIDs=[12443bfb-e88a-46f3-919e-9213bf38ce19, 2873237f-d881-463f-8a5a-940bf1d764e5, a05ea8af-a83b-42a9-b937-dc2da1cae6d1]}
      09:26:18,869 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t2) [Context=default][Scope=node-2]ISPN100002: Started rebalance with topology id 17
      09:26:18,870 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t5) [Context=default][Scope=node-2]ISPN100003: Node node-2 finished rebalance phase with topology id 17
      09:26:18,981 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t2) [Context=default][Scope=node-1]ISPN100003: Node node-1 finished rebalance phase with topology id 17
      09:27:18,530 WARN  [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p15-t4) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1
      	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
      	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
      	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
      	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
      	Suppressed: java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1
      		at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [rt.jar:1.8.0_131]
      		at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [rt.jar:1.8.0_131]
      		at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
      		at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:71)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:540)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:523)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:334)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:85)
      		at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:745)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
      		at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
      		... 1 more
      	Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1
      		at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
      		at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
      		at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
      		at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
      		... 1 more
      	[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1]
      
      09:27:18,530 WARN  [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p16-t4) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1
      	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
      	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
      	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
      	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
      	Suppressed: java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1
      		at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [rt.jar:1.8.0_131]
      		at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [rt.jar:1.8.0_131]
      		at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
      		at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:71)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:540)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:523)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:334)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:85)
      		at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:745)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
      		at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
      		... 1 more
      	Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1
      		at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
      		at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
      		at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
      		at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
      		... 1 more
      	[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 4 from node-1]
      

      There is default JGroups udp stack configured which is used by Infinispan. Both of the servers (jgroups udp) are bound to 127.0.0.1. Node 2 has port offset 1000.

      Attaching thread dump from node 1 when it hangs during start.

              pferraro@redhat.com Paul Ferraro
              mnovak1@redhat.com Miroslav Novak
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: