JBoss Enterprise Application Platform / JBEAP-10414

Deadlock in cluster test case with network failures


      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout f1860bef3e523ad4ef25499554cfb7508f464c53
      groovy -DEAP_VERSION=7.1.0.DR17 PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=ClusterTestCase#clusterTestWithNetworkFailures -DfailIfNoTests=false -Deap=7x | tee log
      
    • AMQ Sprint 1

      Scenario:

      • Two servers form a cluster.
      • A producer sends 10,000 messages to server 1.
      • A receiver receives the messages from server 2.
      • While messages are being received, server 1 is suspended several times so that timeouts expire.

      Expectation: The receiver receives all messages sent by the producer.

      Reality: The receiver does not receive all messages. The cluster bridge is not reconnected because of a deadlock.

      Customer impact: There is a risk of a deadlock in clustered deployments. The test shows that the deadlock blocks reconnection of the cluster bridge, which may seriously affect the entire messaging service.

      Determining whether this is a regression is very difficult. The deadlock was hit in a scenario with the JDBC persistence store; however, the stack traces of the deadlocked threads do not refer to JDBC. I think the deadlock is simply easier to hit with the JDBC store because it is much slower than the file-based store.

      [1] shows the stack traces of the threads involved in the deadlock. For the entire thread dump, see the attachment.

      [1]

      "Thread-21 (ActiveMQ-client-global-threads-1513770209)":
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.getConsumerCount(QueueImpl.java:834)
      	- waiting to lock <0x00000000d494d950> (a org.apache.activemq.artemis.core.server.impl.QueueImpl)
      	at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.doConsumerCreated(ClusterConnectionImpl.java:1314)
      	- locked <0x00000000d4a071e0> (a org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl)
      	at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.handleNotificationMessage(ClusterConnectionImpl.java:1029)
      	at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.onMessage(ClusterConnectionImpl.java:1004)
      	- locked <0x00000000d4a071e0> (a org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1001)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:49)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1124)
      	at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      "Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@6d0df3ab-2070751283)":
      	at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.disconnectBindings(ClusterConnectionImpl.java:1152)
      	- waiting to lock <0x00000000d4a071e0> (a org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl)
      	at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl.disconnectRecord(ClusterConnectionImpl.java:1485)
      	at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge.fail(ClusterConnectionBridge.java:357)
      	at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.connectionFailed(BridgeImpl.java:645)
      	at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.connectionFailed(BridgeImpl.java:601)
      	at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.deliverStandardMessage(BridgeImpl.java:717)
      	at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:582)
      	- locked <0x00000000d494e920> (a org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge)
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2598)
      	- eliminated <0x00000000d494d950> (a org.apache.activemq.artemis.core.server.impl.QueueImpl)
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:1995)
      	- locked <0x00000000d494d950> (a org.apache.activemq.artemis.core.server.impl.QueueImpl)
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$1700(QueueImpl.java:101)
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2784)
      	- locked <0x00000000d4a07790> (a org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner)
      	at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
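The two traces in [1] form a lock-order inversion: Thread-21 holds the MessageFlowRecordImpl monitor (<0x00000000d4a071e0>) and waits for the QueueImpl monitor (<0x00000000d494d950>), while Thread-8 holds the QueueImpl monitor (taken in QueueImpl.deliver) and waits for the MessageFlowRecordImpl monitor in disconnectBindings. The following is a minimal, stand-alone Java sketch of that pattern only; it does not use Artemis classes, and the class, thread, and lock names are hypothetical stand-ins. A CountDownLatch forces the unlucky interleaving, and the JDK's ThreadMXBean.findMonitorDeadlockedThreads() confirms the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the lock order visible in [1]; not Artemis code.
class LockOrderInversionDemo {

    /** Starts two threads that take the same two monitors in opposite order and returns
     *  how many threads the JVM reports as monitor-deadlocked (0 if none within about 5 s). */
    static int runAndDetect() {
        Object queueLock = new Object();      // stand-in for the QueueImpl monitor
        Object flowRecordLock = new Object(); // stand-in for the MessageFlowRecordImpl monitor
        // Each thread waits until the other holds its first lock, so the deadlock forms on every run.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors "Thread-21": onMessage/doConsumerCreated hold the flow record, then getConsumerCount needs the queue.
        Thread notification = new Thread(() -> {
            synchronized (flowRecordLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (queueLock) {
                    System.out.println("notification thread got both locks");
                }
            }
        }, "notification-thread");

        // Mirrors "Thread-8": deliver holds the queue, the bridge fails, and disconnectBindings needs the flow record.
        Thread delivery = new Thread(() -> {
            synchronized (queueLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (flowRecordLock) {
                    System.out.println("delivery thread got both locks");
                }
            }
        }, "delivery-thread");

        // Daemon threads so the deadlocked pair does not keep the JVM alive.
        notification.setDaemon(true);
        delivery.setDaemon(true);
        notification.start();
        delivery.start();

        // Poll the JVM's built-in monitor deadlock detection for up to about 5 seconds.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + runAndDetect()); // prints "Deadlocked threads: 2"
    }
}
```

Neither println inside the inner synchronized blocks is ever reached. Common ways to break such a cycle are to make every code path acquire the two monitors in the same order, or to avoid calling into the queue while holding the flow record's lock; this sketch does not show which approach, if either, the actual fix for this issue takes.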
      

              rh-ee-ataylor Andy Taylor
              eduda_jira Erich Duda (Inactive)
