  1. JBoss A-MQ
  2. ENTMQ-2393

Index Corruption Leading to Duplicate Messages in Network of Brokers


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: JBoss A-MQ 6.3
    • Component/s: broker, kahadb
    • Labels: None
    • Steps to Reproduce:

      A reproducer is in progress. The essentials are:

      1. Create a 3-node broker cluster, with each broker configured to host its KahaDB store in a separate NFS directory. Mount the directories with the recommended NFS options.
      2. Produce a few thousand messages to the cluster (I used 9000 messages of 10 KB each).
      3. Start some consumers (I started 30 threads) that never acknowledge messages and only call session.recover().
      4. Let the consumers run until all messages have been moved to the DLQ.
      5. Move the messages back to the original queue.
      6. Repeat until the problem reproduces and a broker fails to restart (with restartAllowed=true).
      7. Stop the brokers, remove the index files, restart, and observe the message counts (this is easier if the consumers are stopped).
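
The consumer side of step 3 could look roughly like the sketch below. It is not the reporter's actual test client: the broker URL, queue name, and class name are hypothetical placeholders, and it assumes the ActiveMQ client library is on the classpath. Each thread receives a message, never acknowledges it, and calls session.recover() so the message is redelivered until the redelivery policy in effect sends it to the DLQ.

```java
import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class RecoveringConsumer implements Runnable {

    // Hypothetical values: substitute the URL and queue used in the test.
    private static final String BROKER_URL =
            "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)";
    private static final String QUEUE_NAME = "TEST.QUEUE";
    private static final int THREADS = 30;

    @Override
    public void run() {
        Connection connection = null;
        try {
            connection = new ActiveMQConnectionFactory(BROKER_URL).createConnection();
            connection.start();

            // CLIENT_ACKNOWLEDGE: nothing is acked unless acknowledge() is called.
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            MessageConsumer consumer =
                    session.createConsumer(session.createQueue(QUEUE_NAME));

            while (!Thread.currentThread().isInterrupted()) {
                Message message = consumer.receive(1000);
                if (message != null) {
                    // Deliberately skip message.acknowledge(); recover() restarts
                    // delivery from the oldest unacknowledged message.
                    session.recover();
                }
            }
        } catch (JMSException e) {
            e.printStackTrace();
        } finally {
            if (connection != null) {
                try {
                    connection.close();
                } catch (JMSException ignored) {
                    // Nothing useful to do while shutting down.
                }
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < THREADS; i++) {
            new Thread(new RecoveringConsumer(), "consumer-" + i).start();
        }
    }
}
```

The threads run until they are interrupted or the process is stopped, which matches step 4: leave them running until the queue is empty and the messages have reached the DLQ.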

    Description

      In a 3-node network of brokers with persistence on network storage, a failed store operation can corrupt the index.  With NFS, for example, setting timeo to the recommended value of 20 (the option is expressed in tenths of a second, so 2 seconds) can result in Input/Output errors under load, which cause a broker restart.  Occasionally the index corruption prevents the broker from passing its journal checks on startup, and the index files must be removed before the broker can recover.

      Sometimes this also seems to produce erroneous message counts and duplicate messages within the network of brokers (NOB).  For example, the broker reports fewer messages in its queue counts than actually exist in the journals.  When the index is removed and rebuilt, the counts go up, and extra or duplicate messages are sometimes observed.
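
One way to confirm duplicates after the index rebuild is to collect an identifier for every message found on the queue and count repeats. The helper below is a minimal sketch of that check. It assumes the identifiers have already been gathered into a list, for example by browsing the queue; which identifier to use (JMSMessageID or an application-level property set by the producer) is left to the test. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateMessageCheck {

    /**
     * Returns every identifier that occurs more than once,
     * mapped to the number of times it was seen.
     */
    public static Map<String, Integer> findDuplicates(List<String> messageIds) {
        // Count occurrences, preserving first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : messageIds) {
            counts.merge(id, 1, Integer::sum);
        }

        // Keep only the identifiers seen more than once.
        Map<String, Integer> duplicates = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.put(entry.getKey(), entry.getValue());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Illustrative identifiers only, not taken from the attached logs.
        List<String> ids = Arrays.asList("msg-1", "msg-2", "msg-1", "msg-3");
        System.out.println(findDuplicates(ids)); // prints {msg-1=2}
    }
}
```

An empty result together with a total equal to the number of messages produced would indicate that the rebuilt index matches what was sent.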

      Attachments

        1. amq.log.gz
          541 kB
        2. amq.log.tar.gz
          1.52 MB
        3. amq.log.tar.gz
          407 kB
        4. logs.630446.tar.gz
          4.16 MB
        5. logs.journal.trace.tar.gz
          1.00 MB
        6. logs.tar.gz
          10.72 MB


          People

            gtully@redhat.com Gary Tully
            rhn-support-dhawkins Duane Hawkins
            Votes: 0
            Watchers: 3
