  1. JBoss A-MQ
  2. ENTMQ-2056

Unclean broker shutdown causes message loss on CIFS kahadb storage

Details

    • Bug
    • Resolution: Done
    • Blocker
    • JBoss A-MQ 6.3.x
    • JBoss A-MQ 6.3, JBoss A-MQ 6.3.x
    • kahadb
    • None
      1. Deploy a master-slave broker pair with shared KahaDB storage on CIFS.
      2. Send messages to a queue.
      3. Deploy a Camel route that transfers messages from one queue to another.
      4. Kill the broker repeatedly while messages are being transferred.
        • Repeat these steps until the broker refuses to start because of a missing KahaDB journal.

    Description

      I deployed two brokers on Windows machines on OpenStack in a master-slave setup. KahaDB is located on shared CIFS storage. I then sent messages to the master broker and started a transacted Camel route that transferred messages from one queue to another. During the transfer I repeatedly killed the master container. At some point the broker refused to start because of a missing KahaDB journal file.

      09:50:55,169 | INFO  | {AMQ-1-thread-1} [io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$1] (ActiveMQServiceFactory.java:502) | 231 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-630187 | Broker amq failed to start.  Will try again in 10 seconds
      09:50:55,169 | ERROR | {AMQ-1-thread-1} [io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$1] (ActiveMQServiceFactory.java:503) | 231 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-630187 | Exception on start: java.io.IOException: Detected missing journal files. [6]
      java.io.IOException: Detected missing journal files. [6]
      	at org.apache.activemq.store.kahadb.MessageDatabase.recoverIndex(MessageDatabase.java:935)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.MessageDatabase$5.execute(MessageDatabase.java:676)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:673)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:429)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:447)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:283)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:205)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:223)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.broker.BrokerService.doStartPersistenceAdapter(BrokerService.java:658)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.broker.BrokerService.startPersistenceAdapter(BrokerService.java:642)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at org.apache.activemq.broker.BrokerService.start(BrokerService.java:607)[219:org.apache.activemq.activemq-osgi:5.11.0.redhat-630187]
      	at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.doStart(ActiveMQServiceFactory.java:549)[231:io.fabric8.mq.mq-fabric:1.2.0.redhat-630187]
      	at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.access$400(ActiveMQServiceFactory.java:359)[231:io.fabric8.mq.mq-fabric:1.2.0.redhat-630187]
      	at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$1.run(ActiveMQServiceFactory.java:490)[231:io.fabric8.mq.mq-fabric:1.2.0.redhat-630187]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_111]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_111]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_111]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_111]
      	at java.lang.Thread.run(Thread.java:745)[:1.8.0_111]
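
      When triaging a failure like the one above, it can help to confirm which journal files the exception names and whether they are really absent from the shared directory. The following is a minimal sketch, not part of the broker: the class name MissingJournalCheck and the default directory "kahadb" are hypothetical, and it assumes the default KahaDB journal naming of db-<id>.log and that multiple IDs would be printed as a comma-separated list inside the brackets.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for triage; not part of ActiveMQ or A-MQ.
class MissingJournalCheck {

    // Matches the message text seen in the log, e.g. "Detected missing journal files. [6]".
    private static final Pattern MISSING =
            Pattern.compile("Detected missing journal files\\. \\[([0-9, ]*)\\]");

    /** Extracts the journal file IDs listed in the message; returns an empty list if none are found. */
    static List<Integer> parseMissingIds(String logLine) {
        List<Integer> ids = new ArrayList<>();
        Matcher m = MISSING.matcher(logLine);
        if (m.find()) {
            for (String part : m.group(1).split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(Integer.parseInt(trimmed));
                }
            }
        }
        return ids;
    }

    /** Returns the IDs whose journal file (assumed name: db-<id>.log) is not a regular file in the directory. */
    static List<Integer> absentFrom(Path kahaDbDir, List<Integer> ids) {
        List<Integer> absent = new ArrayList<>();
        for (int id : ids) {
            if (!Files.isRegularFile(kahaDbDir.resolve("db-" + id + ".log"))) {
                absent.add(id);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        String line = "java.io.IOException: Detected missing journal files. [6]";
        // "kahadb" is a placeholder; pass the real shared directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "kahadb");
        List<Integer> referenced = parseMissingIds(line);
        System.out.println("Referenced in exception: " + referenced);
        System.out.println("Absent on disk: " + absentFrom(dir, referenced));
    }
}
```

      Running it against the directory as seen from each broker host shows whether both hosts agree on which files are present on the CIFS share.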
      

      I also tried to ignore the missing journal by setting the KahaDB option ignoreMissingJournalfiles=true. This workaround lets the broker start, but it helps only when the system holds a small number of messages (up to 500 000). When I apply it with 1 000 000 or more messages, the broker still starts, but a large number of messages are lost.

      Attachments

        1. 10.8.181.87-client.log
          564 kB
        2. 10.8.181.88-broker1.log
          3.11 MB
        3. 10.8.181.90-broker2.log
          3.08 MB
        4. kahadb.tar.gz.01
          15.00 MB
        5. kahadb.tar.gz.02
          15.00 MB
        6. kahadb.tar.gz.03
          14.68 MB

            People

              gtully@redhat.com Gary Tully
              knetl.j@gmail.com Jakub Knetl (Inactive)