AMQ Broker / ENTMQBR-970

In shared-nothing set-up, HA lost because master cannot re-sync from running slave


    • Type: Bug
    • Resolution: Done
    • Priority: Major

      The system is running in a degraded state, with the designated slave serving requests. The designated master will not restart because the process of sending files from the designated slave to it times out with the following error:

      2018-01-08 14:23:23,324 INFO  [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/projects/jboss_amq_instances/XXX/./data/bindings/activemq-bindings-744.bindings (size=1,048,576) to replica.
      2018-01-08 14:23:53,326 WARN  [org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager] AMQ119114: Replication synchronization process timed out after waiting 30,000 milliseconds: java.lang.IllegalStateException: AMQ119114: Replication synchronization process timed out after waiting 30,000 milliseconds
              at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:608) [artemis-server-2.0.0.amq-700013-redhat-1.jar:2.0.0.amq-700013-redhat-1]
      

      Repeated attempts to resynchronize all fail, but each attempt seems to fail at a different point. A large number of files are transferred successfully within a second or two, and then one particular file is not transferred. The cumulative transfer time does not appear to exceed the timeout; rather, a single file (a different one on each attempt) seems to cause the problem.
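      One way to test the "single file, not cumulative time" observation against the attached logs is to measure the gap between consecutive timestamped log lines and report the entry that precedes the longest silence. The following is a minimal diagnostic sketch, assuming the log lines use the "yyyy-MM-dd HH:mm:ss,SSS LEVEL message" layout shown in the excerpt above. The class ReplicationGapFinder and its methods are hypothetical and are not part of Artemis. If the longest gap is close to 30,000 ms while the other gaps are only milliseconds, that is consistent with one transfer stalling rather than the total transfer time running over.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper (not part of Artemis). */
public class ReplicationGapFinder {

    // Timestamp layout assumed from the log excerpt, e.g. "2018-01-08 14:23:23,324".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // "<timestamp> <LEVEL> <message>"; stack-trace lines do not match and are skipped.
    private static final Pattern LINE = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+[A-Z]+\\s+(.*)");

    /** The message logged just before the longest silence, and how long that silence lasted. */
    public static final class Gap {
        public final String message;
        public final long millis;

        Gap(String message, long millis) {
            this.message = message;
            this.millis = millis;
        }
    }

    /** Returns the largest gap between consecutive timestamped lines, or null if there are fewer than two. */
    public static Gap largestGap(List<String> lines) {
        LocalDateTime prevTime = null;
        String prevMessage = null;
        Gap best = null;
        for (String raw : lines) {
            Matcher m = LINE.matcher(raw.trim());
            if (!m.matches()) {
                continue; // stack traces and other continuation lines
            }
            LocalDateTime time = LocalDateTime.parse(m.group(1), TS);
            if (prevTime != null) {
                long gapMs = Duration.between(prevTime, time).toMillis();
                if (best == null || gapMs > best.millis) {
                    best = new Gap(prevMessage, gapMs);
                }
            }
            prevTime = time;
            prevMessage = m.group(2);
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReplicationGapFinder <broker.log>");
            return;
        }
        Gap gap = largestGap(Files.readAllLines(Paths.get(args[0])));
        if (gap == null) {
            System.out.println("fewer than two timestamped lines found");
        } else {
            System.out.println("longest silence: " + gap.millis + " ms after: " + gap.message);
        }
    }
}
```

      Applied to the two log lines quoted above, the sketch reports a gap of 30,002 ms following the AMQ221025 entry for activemq-bindings-744.bindings. Running it against each failed attempt in a full log (for example the attached artemisslave.log) would show whether the stall always follows a different file, as described.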

        Attachments:
        1. artemis_stuff.tar (25 kB)
        2. artemisslave.log (165 kB)
        3. dataprint.txt.zip (1.27 MB)
        4. lslrout.txt (31 kB)

            csuconic@redhat.com Clebert Suconic
            rhn-support-kboone Kevin Boone
