WFWIP-72 (WildFly WIP)

Critical IO Error ... when starting Artemis with HA JDBC store



      Reproducer - issue is intermittent:

      git clone https://gitlab.mw.lab.eng.bos.redhat.com/eduda/messaging-testsuite.git
      cd eap-tests-hornetq/scripts/
      git checkout jdbc-ha
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/eap-7.x-messaging-testing-prepare/122/artifact/jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=ColocatedClusterFailoverTestCase#testFailbackTransAckQueue -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1532075008-SNAPSHOT -DfailIfNoTests=false -Deap=7x -Dprepare.param.DATABASE=oracle12cR2 -Dprepare.param.JDBC_STORE=true | tee log
      

      One of the servers in a collocated HA topology with a JDBC store can fail with a critical IO exception and shut itself down.

      This was hit with Artemis 1.5.5.jbossorg-012 and WildFly built from https://github.com/jmesnil/wildfly (branch WFLY-9513_messaging_jdbc_HA_shared-store).

      Test scenario:

      • Start 2 WF/EAP servers in a collocated topology with the Artemis HA JDBC store
      • Start clients which send messages to and consume messages from a queue on the 1st server

      Result:
      The failure is intermittent. When the clients start to send and receive messages on the 1st server, that server fails with a Critical IO Error and the following exception:

      10:19:27,965 WARN  [org.apache.activemq.artemis.journal] (Thread-0 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@7de7cbe3)) AMQ142021: Error on IO callback, null
      10:19:27,965 WARN  [org.apache.activemq.artemis.core.server] (Thread-0 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@7de7cbe3)) AMQ222010: Critical IO Error, shutting down the server. file=org.apache.activemq.artemis.jdbc.store.file.JDBCSequentialFile@407c5d8f, message=Error writing to JDBC file.: java.lang.NullPointerException
              at org.apache.activemq.artemis.jdbc.store.file.JDBCSequentialFile.internalWrite(JDBCSequentialFile.java:161) [artemis-jdbc-store-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
              at org.apache.activemq.artemis.jdbc.store.file.JDBCSequentialFile.internalWrite(JDBCSequentialFile.java:186) [artemis-jdbc-store-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
              at org.apache.activemq.artemis.jdbc.store.file.JDBCSequentialFile.lambda$scheduleWrite$1(JDBCSequentialFile.java:197) [artemis-jdbc-store-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
              at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_171]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_171]
              at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_171]
      

      The NPE is thrown in JDBCSequentialFile.internalWrite at line 161:

      private synchronized int internalWrite(byte[] data, IOCallback callback) {
            try {
               open();
      161         synchronized (writeLock) { <-- NPE is thrown here
      ...
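
      For context on why line 161 can throw at all: in Java, a synchronized statement throws NullPointerException when its lock expression evaluates to null. The trace is therefore consistent with writeLock being null when internalWrite ran. The sketch below only illustrates that language rule. NullLockDemo and initLock are hypothetical names, not Artemis code, and it does not establish why the field would be null in JDBCSequentialFile.

```java
public class NullLockDemo {
    // Hypothetical stand-in for a lock field that has not been assigned yet.
    private Object writeLock;

    /** Returns true if entering the synchronized block threw a NullPointerException. */
    public boolean writeThrowsNpe() {
        try {
            synchronized (writeLock) { // throws NPE when writeLock is null
                return false;
            }
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Assigns the lock, as an initialization step would be expected to do. */
    public void initLock() {
        writeLock = new Object();
    }

    public static void main(String[] args) {
        NullLockDemo demo = new NullLockDemo();
        System.out.println("before init: NPE = " + demo.writeThrowsNpe());
        demo.initLock();
        System.out.println("after init: NPE = " + demo.writeThrowsNpe());
    }
}
```

      Running it prints "before init: NPE = true" followed by "after init: NPE = false".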
      


      Attaching logs from the test.
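
      Because the failure is intermittent, it can help to check each run's output for the AMQ222010 warning automatically. The following is a minimal sketch, assuming the output was captured to a file named log as in the reproducer. CriticalIoErrorScanner and failedFile are hypothetical names, and the pattern is derived only from the single log line quoted above, so other message variants may not match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class CriticalIoErrorScanner {
    // Matches the AMQ222010 warning and captures the "file=" value from it.
    private static final Pattern CRITICAL_IO =
            Pattern.compile("AMQ222010: Critical IO Error.*?file=([^,]+),");

    /** Returns the file named in an AMQ222010 line, or null if the line is not one. */
    public static String failedFile(String line) {
        Matcher m = CRITICAL_IO.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the log path defaults to "log", the tee target in the reproducer.
        String path = args.length > 0 ? args[0] : "log";
        try (Stream<String> lines = Files.lines(Paths.get(path))) {
            lines.filter(l -> failedFile(l) != null)
                 .forEach(l -> System.out.println("hit: " + failedFile(l)));
        }
    }
}
```

      Run it with the log path as the only argument, or with no argument from the directory that contains the log file.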

              mtaylor1@redhat.com Martyn Taylor (Inactive)
              mnovak1@redhat.com Miroslav Novak