JBoss A-MQ / ENTMQ-1709

Missing kahaDB journal after broker failover in fabric8

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: JBoss A-MQ 6.2.1, JBoss A-MQ 6.3
    • Fix Version/s: JBoss A-MQ 6.3.x
    • Component/s: None
    • Labels: None
    • Steps to Reproduce:
      1. create fabric
      2. create two ssh containers for master slave:
        fabric:mq-create --no-ssl --parent-profile mq-base --group fabric-group --kind MasterSlave --data /mnt/nfs/fuse-shared/fabricFaframTest fabric-broker
        container-create-ssh --jvm-opts "-Xms1024M -Xmx3048M -XX:PermSize=128M -XX:MaxPermSize=512M "   --host 10.8.50.47  --user ***** --password ***** container1
        mq-create --no-ssl --assign-container container1  --parent-profile mq-base --group fabric-group --data /mnt/nfs/fuse-shared/fabricFaframTest --kind MasterSlave fabric-broker
        container-create-ssh --jvm-opts "-Xms1024M -Xmx3048M -XX:PermSize=128M -XX:MaxPermSize=512M "   --host 10.8.53.94  --user ***** --password ***** container2
        mq-create --no-ssl --assign-container container2  --parent-profile mq-base --group fabric-group --data /mnt/nfs/fuse-shared/fabricFaframTest --kind MasterSlave fabric-broker
        
      3. send messages to an arbitrary queue
      4. deploy a transacted Camel route which transfers the messages from the queue to another queue
        • the route should be deployed in a separate container (so it is not killed with the broker container)
        • see attachment for more details
      5. kill the master container a few times until the broker refuses to start due to missing journal files
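      The transacted route in step 4 can be sketched roughly as follows. This is only an illustration: the queue names, the `amq` component id, and the transaction policy ref are assumptions, not the actual route, which is in the camel-route.tar.gz attachment.

      ```xml
      <!-- Sketch only: a transacted route moving messages between two queues.
           Queue names and the "PROPAGATION_REQUIRED" policy bean are assumed. -->
      <route id="queue-to-queue">
        <from uri="amq:queue:source.queue"/>
        <!-- run the consume/produce pair inside a JMS transaction -->
        <transacted ref="PROPAGATION_REQUIRED"/>
        <to uri="amq:queue:target.queue"/>
      </route>
      ```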

      Description

      I have a master/slave broker in fabric. The broker data are stored on a shared filesystem (NFSv4). A Camel route running in a different container transfers messages from one queue to another. Sometimes the broker is unable to start after a failover (the failover is caused by killing the Karaf container hosting the master node). ZooKeeper correctly elects another node as master and the broker tries to start, but the start fails because journal files are missing:

      exception on start
      2016-05-23 06:07:38,855 | ERROR | AMQ-1-thread-1   | ActiveMQServiceFactory           | 157 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-630069 | Exception on start: java.io.IOException: Detected missing journal files. [2]
      java.io.IOException: Detected missing journal files. [2]
              at org.apache.activemq.store.kahadb.MessageDatabase.recoverIndex(MessageDatabase.java:916)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.MessageDatabase$5.execute(MessageDatabase.java:660)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:657)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:426)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:444)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:280)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:205)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:223)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.broker.BrokerService.doStartPersistenceAdapter(BrokerService.java:657)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.broker.BrokerService.startPersistenceAdapter(BrokerService.java:641)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at org.apache.activemq.broker.BrokerService.start(BrokerService.java:606)[163:org.apache.activemq.activemq-osgi:5.11.0.redhat-630069]
              at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.doStart(ActiveMQServiceFactory.java:549)[157:io.fabric8.mq.mq-fabric:1.2.0.redhat-630069]
              at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.access$400(ActiveMQServiceFactory.java:359)[157:io.fabric8.mq.mq-fabric:1.2.0.redhat-630069]
              at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$1.run(ActiveMQServiceFactory.java:490)[157:io.fabric8.mq.mq-fabric:1.2.0.redhat-630069]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_101]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_101]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_101]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_101]
              at java.lang.Thread.run(Thread.java:745)[:1.7.0_101]
      

      I tried altering the ActiveMQ configuration by adding ignoreMissingJournalfiles="true", and the broker started afterwards. All messages were present in the store, so no messages were lost, but I am not sure that will always be the case.
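      For reference, the workaround above amounts to the following KahaDB configuration in the broker profile's activemq.xml; the directory shown is an assumption based on the --data option used in the repro steps.

      ```xml
      <!-- Workaround sketch, not a fix: tolerate missing journal files on recovery.
           Directory path is assumed from the mq-create --data option above. -->
      <persistenceAdapter>
        <kahaDB directory="/mnt/nfs/fuse-shared/fabricFaframTest"
                ignoreMissingJournalfiles="true"/>
      </persistenceAdapter>
      ```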

      I tried to run the test outside the fabric8 environment and was unable to reproduce the issue. The only difference in the non-fabric environment is that the Camel route uses failover:(node1,node2) instead of discovery:(fabric:group).
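      The two connection URLs differ roughly as shown below; host names and ports are placeholders, and the group name is taken from the --group option in the repro steps.

      ```xml
      <!-- fabric environment: broker located through the fabric discovery agent -->
      <bean id="fabricConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
        <property name="brokerURL" value="discovery:(fabric:fabric-group)"/>
      </bean>

      <!-- non-fabric environment: static failover list (hosts are placeholders) -->
      <bean id="staticConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
        <property name="brokerURL"
                  value="failover:(tcp://node1:61616,tcp://node2:61616)"/>
      </bean>
      ```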

          Attachments

          1. camel-route.tar.gz
            2 kB
          2. container2.log
            398 kB
          3. kahadb-missing-journal.zip
            15.81 MB

                People

                • Assignee: Unassigned
                • Reporter: jknetl Jakub Knetl
                • Votes: 0
                • Watchers: 2
