  1. AMQ Broker
  2. ENTMQBR-2147

(7.2.z) Backup doesn't activate after shared store is reconnected

Details

    • Release Notes
      Previously, if you had a live-backup broker pair configured for high availability using shared store, activation of the backup broker upon shutdown of the live broker could fail. Specifically, this situation occurred if the shared store had previously been disconnected and reconnected, before shutdown of the live broker. This issue is now resolved.
    • Documented as Resolved Issue
      1. Start live server on node1
        $ sh standalone.sh -c standalone-full-ha-live.xml -DsharedDirectory=$NFS_DIRECTORY -b $IP_1
      2. Start backup on node2
        $ sh standalone.sh -c standalone-full-ha-backup.xml -DsharedDirectory=$NFS_DIRECTORY -b $IP_2
      3. Unmount NFS on node2
      4. Reconnect NFS on node2
      5. Press Ctrl + C to shut down EAP on node1
        Backup doesn't activate
    • AMQ Sprint 3219, AMQ Broker 1119

    Description

      Scenario

      1. Start a live-backup server pair in a dedicated topology with shared store HA, with the journal located on NFS
      2. The NFS mount on the backup server fails
      3. Reconnect NFS on the backup server
      4. Shut down the live EAP server
      5. The backup doesn't activate

      What happens
      The backup waits for the live server to fail by checking its file lock. If the connection to shared storage fails, the backup logs the following error.

      05:50:57,896 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=836c9b1e-f067-11e7-8763-001b21862475) AMQ224000: Failure in initialisation: java.io.IOException: Input/output error
      	at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) [rt.jar:1.8.0_151]
      	at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90) [rt.jar:1.8.0_151]
      	at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1115) [rt.jar:1.8.0_151]
      	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.tryLock(FileLockNodeManager.java:299) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:316) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:127) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      
      

      The exception is caught in SharedStoreBackupActivation.run and terminates the backup activation process.

      If the NFS is reconnected later, the backup server does not resume the activation process and no longer waits for the live server to fail. If the live server then fails, the backup doesn't activate, even though it has a connection to shared storage.

      The backup should keep retrying the live lock check even while the storage is unavailable. It should log warning/error messages that the storage is unavailable, but it should not terminate the activation process. This would allow the backup to resume its duties once the storage is reconnected.
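
      The requested behavior could look roughly like the sketch below. It is only an illustration of the retry idea, not the Artemis FileLockNodeManager code or the actual fix: the class RetryingLiveLockWaiter, the LockAttempt interface, the awaitLock method and the retry interval are all hypothetical names and values chosen for this example. An IOException from the lock attempt (such as the "Input/output error" above) is logged and retried instead of ending the wait.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not the Artemis FileLockNodeManager implementation. */
public class RetryingLiveLockWaiter {

    /** One lock attempt; a null result means another process still holds the lock. */
    @FunctionalInterface
    public interface LockAttempt<T> {
        T tryAcquire() throws IOException;
    }

    /**
     * Polls until the lock is acquired. An IOException (for example, the
     * shared store being unmounted) is logged and the attempt is retried,
     * so the caller keeps waiting instead of terminating.
     */
    public static <T> T awaitLock(LockAttempt<T> attempt, long retryMillis)
            throws InterruptedException {
        while (true) {
            try {
                T lock = attempt.tryAcquire();
                if (lock != null) {
                    return lock; // the live server released the lock
                }
            } catch (IOException e) {
                // Storage is unavailable: report it, but keep waiting.
                System.err.println("WARN: shared store unavailable, will retry: " + e.getMessage());
            }
            Thread.sleep(retryMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // A temporary file stands in for the live lock file on the shared store.
        Path lockFile = Files.createTempFile("live", ".lock");
        try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = awaitLock(() -> channel.tryLock(), 100);
            System.out.println("lock acquired: " + lock.isValid());
            lock.release();
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```

      Passing the lock attempt as a functional interface keeps the retry loop independent of the file system, so a failing store can be simulated without unmounting NFS.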

            People

              thofman Tomas Hofman
              mnovak1@redhat.com Miroslav Novak
              Roman Vais Roman Vais