- Bug
- Resolution: Done
- Critical
- AMQ 7.2.1.GA
- Release Notes
- Documented as Resolved Issue
- AMQ Sprint 3219, AMQ Broker 1119
Scenario
- Start a live/backup server pair in a dedicated topology with shared-store HA, with the journal located on NFS
- The NFS mount on the backup server fails
- Reconnect NFS on the backup server
- Try to shut down the live EAP server
- The backup doesn't activate
What happens
The backup waits for the live server to fail by checking its file lock. If the connection to shared storage fails, the backup logs the following error:
05:50:57,896 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=836c9b1e-f067-11e7-8763-001b21862475) AMQ224000: Failure in initialisation: java.io.IOException: Input/output error
    at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) [rt.jar:1.8.0_151]
    at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90) [rt.jar:1.8.0_151]
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1115) [rt.jar:1.8.0_151]
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.tryLock(FileLockNodeManager.java:299) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:316) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:127) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
The exception is caught in SharedStoreBackupActivation.run and terminates the backup activation process.
If NFS is reconnected later, the backup server does not resume the activation process and no longer waits for the live server to fail. If the live server then fails, the backup does not activate, even though it has a connection to shared storage again.
The backup should keep retrying the live-lock check even while the storage is unavailable. It should log warning/error messages saying that the storage is unavailable, but it should not terminate the activation process. This would allow the backup to resume its duties once the storage is reconnected.
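The expected behaviour could be sketched roughly as below. This is only an illustration of the retry idea, not the actual Artemis fix: the class `LockRetrySketch`, the interface `LockAttempt`, and the methods `awaitLock` and `fileLockAttempt` are hypothetical names invented for this example, and the retry delay is an arbitrary parameter. The only real API used is `java.nio.channels.FileChannel.tryLock()`, which also appears in the stack trace above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

// Hypothetical sketch: keep polling for the live lock and treat an
// IOException from the shared store as a transient condition.
class LockRetrySketch {

    /** Hypothetical abstraction over a single attempt to take the live lock. */
    @FunctionalInterface
    interface LockAttempt {
        /** Returns true if the lock was acquired, false if it is still held elsewhere. */
        boolean tryAcquire() throws IOException;
    }

    /** One possible attempt: a non-blocking lock on a file in the shared store. */
    static LockAttempt fileLockAttempt(FileChannel channel) {
        // tryLock() returns null when another process holds the lock.
        return () -> channel.tryLock() != null;
    }

    /**
     * Polls until the lock is acquired. An IOException (for example an
     * "Input/output error" while NFS is down) is logged and retried
     * instead of ending the loop.
     *
     * @return the number of attempts that failed with an IOException
     */
    static int awaitLock(LockAttempt attempt, long retryDelayMillis)
            throws InterruptedException {
        int ioFailures = 0;
        while (true) {
            try {
                if (attempt.tryAcquire()) {
                    return ioFailures;
                }
            } catch (IOException e) {
                ioFailures++;
                System.err.println("WARN: shared store unavailable, retrying: "
                        + e.getMessage());
            }
            Thread.sleep(retryDelayMillis);
        }
    }
}
```

The loop stops only when the lock is acquired or the thread is interrupted, so a temporary storage outage delays activation rather than cancelling it.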
- clones
- JBEAP-14032 [GSS](7.2.z) ARTEMIS-2069 - Backup doesn't activate after shared store is reconnected
- Closed
- WFLY-10968 Backup doesn't activate after shared store is reconnected
- Closed
- relates to
- ENTMQBR-3275 Regression: Backup doesn't activate after shared store is reconnected
- Closed
- links to