AMQ Broker / ENTMQBR-9868

Unresponsive server after connected server with MDB is killed and restarted during XA transaction


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • AMQ 7.13.0.GA
      Steps to Reproduce:

      git clone git@gitlab.cee.redhat.com:jbossqe-eap/messaging-testsuite.git messaging-testsuite
      cd messaging-testsuite/scripts/
      
      
      groovy -DEAP_ZIP_URL=<path_to_server_zip_file> PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      mvn --batch-mode clean test -Dtest=BytemanLodh2TestCase#testLodh2KillWithTempTopicOnTransactionCommit -Dsurefire.failIfNoSpecifiedTests=false -Deap7.clients.version=8.1749848644-SNAPSHOT -Deap7.org.jboss.qa.hornetq.apps.clients.version=8.1749848644-SNAPSHOT | tee log
      

      Note: The issue is intermittent, and reproducing it may depend on hardware performance.
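
Because the failure is intermittent, it can help to rerun the test command until it first fails. The following is a minimal sketch and not part of the test suite: the function name `run_until_fail` and the attempt limit are hypothetical, and it assumes the wrapped command exits with a non-zero status when the test fails.

```shell
# Hypothetical helper (not part of messaging-testsuite): rerun a command
# until it fails or the attempt limit is reached.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 as soon as the command fails (issue possibly reproduced),
# 0 if every attempt passed.
run_until_fail() {
  _max_attempts=$1
  shift
  _attempt=1
  while [ "$_attempt" -le "$_max_attempts" ]; do
    echo "Attempt ${_attempt}/${_max_attempts}"
    if ! "$@"; then
      echo "Command failed on attempt ${_attempt}"
      return 1
    fi
    _attempt=$((_attempt + 1))
  done
  echo "No failure in ${_max_attempts} attempts"
  return 0
}
```

For example, `run_until_fail 20 mvn --batch-mode clean test` followed by the same `-D` options as in the steps above would stop at the first failing run. A failing run is not necessarily this issue, so the log of that run still needs to be checked for the unresponsive-server symptoms.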


      An EAP server that is in an Artemis cluster with another server appears to become stuck or slow after a server connected to it over remote JCA is killed and restarted. The issue occurs intermittently.

      Customer Impact: Node 1 may become permanently unresponsive following the restart of Node 2. A potential workaround to restore service on Node 1 is to restart it manually. This may cause service downtime and disrupt message delivery.

      Test Scenario (4-node setup):

      • Cluster A: nodes 1 and 3 are started initially and form a cluster.
      • Non-durable topic InTopic and queue OutQueue are deployed on nodes 1 and 3.
      • A publisher sends 2000 mixed (small + large) messages to InTopic on node 1.
      • MDBs are deployed on nodes 2 and 4 to consume from InTopic and forward to OutQueue in an XA transaction. Each MDB creates a non-durable subscription on InTopic.
      • Node 2 (with an active MDB) is killed during transaction commit.
      • Node 2 is then restarted.
      • Messages are read from OutQueue on node 1.

      Expected Result:
      Node 2, with its MDB, is able to connect to node 1 again and creates new non-durable subscriptions.

      Actual Result:
      After Node 2 restarts, it attempts to reconnect to Node 1. Node 1 becomes unresponsive, appearing stuck while waiting on journal operations. This may indicate a deadlock or long-running operation holding journal locks. The receiver attempting to connect to Node 1 times out during connection creation.

      Notably, the Critical Analyzer in Artemis is triggered and logs a thread dump, which is attached for investigation.
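
When checking a thread dump for a possible deadlock, a quick tally of thread states can show how many threads are blocked or waiting. The sketch below is a hypothetical helper, not part of Artemis or the test suite. It assumes the dump has been saved to a file and that thread states appear as the standard `java.lang.Thread.State` names; the actual layout of the Critical Analyzer output may differ.

```shell
# Hypothetical helper: tally thread states in a saved thread dump file.
# Assumption: states appear as whole words using the standard
# java.lang.Thread.State names (e.g. BLOCKED, WAITING).
# Usage: count_thread_states DUMP_FILE
# Prints one "STATE=COUNT" line per state found, sorted by state name.
count_thread_states() {
  grep -owE 'NEW|RUNNABLE|BLOCKED|WAITING|TIMED_WAITING|TERMINATED' "$1" \
    | sort \
    | uniq -c \
    | awk '{ print $2 "=" $1 }'
}
```

A high BLOCKED count would be consistent with threads contending for the same lock, but the dump itself still has to be read to see which locks and owners are involved.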

              Assignee: Unassigned
              Reporter: Emmanuel Hugonnet (ehugonne1@redhat.com)