-
Bug
-
Resolution: Done
-
Major
-
Compatibility/Configuration
-
-
Undefined
-
When an HA failover is performed from one NFS server to another (which should be transparent to the application), both AMQ servers become unresponsive. The broker processes are still active, but consumers are not able to connect.
The problem is intermittent; it does not occur on every failover event.
Logs
=========
The logs show that both brokers receive an error at 16:04:24:

Master
-----------
//NOTE: logs only go back to 16:00
$ egrep -r "Shutting|Starting|ERROR|WARN" amq01-logs | egrep -v "AMQ222061|AMQ224016|AMQ222107|AMQ212037"
amq01-logs/artemis.log.2:2020-12-21 16:04:25,224 WARN [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=IO Error while calculating disk usage: java.nio.file.FileSystemException: /amqdata/amq7_broker_data/paging: Input/output error

Slave
-----------
$ egrep -r "Shutting|Starting|ERROR|WARN" amq02-logs | egrep -v "AMQ222061|AMQ224016|AMQ222107|AMQ212037"
amq02-logs/artemis.log.5:2020-12-21 16:04:24,092 WARN [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Failure when accessing a lock file: java.io.IOException: Input/output error

Thread Dumps
===============
The thread dumps, captured about 15 mins later, indicate the following:

Master
----------
"Thread-5 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46074492)" #34 prio=5 os_prio=0 cpu=265.58ms elapsed=774.83s tid=0x00007f476d086800 nid=0x17cb waiting on condition [0x00007f470fbf9000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native Method)
    - parking to wait for <0x000000009dece8c8> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.7/LockSupport.java:234)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.7/AbstractQueuedSynchronizer.java:1079)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.7/AbstractQueuedSynchronizer.java:1369)
    at java.util.concurrent.CountDownLatch.await(java.base@11.0.7/CountDownLatch.java:278)
    at org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback.waitCompletion(SimpleWaitIOCallback.java:61)
    at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendCommitRecord(JournalBase.java:63)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendCommitRecord(JournalImpl.java:91)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.commitBindings(AbstractJournalStorageManager.java:658)
    ...

"Thread-10" #106 prio=5 os_prio=0 cpu=355.87ms elapsed=547.38s tid=0x00007f4734002000 nid=0x1bc0 waiting for monitor entry [0x00007f47033e3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.stop(PostOfficeImpl.java:198)
    - waiting to lock <0x000000008069d340> (a org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stopComponent(ActiveMQServerImpl.java:1356)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1170)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1051)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5.run(ActiveMQServerImpl.java:857)

   Locked ownable synchronizers:
    - None

Slave
----------
"AMQ229000: Activation for server ActiveMQServerImpl::serverUUID=ee4ebeb8-2391-11eb-9967-0021f64befb7" #15 prio=5 os_prio=0 cpu=317.50ms elapsed=685.73s tid=0x00007f39c8f48000 nid=0x4deb waiting on condition [0x00007f399cae5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(java.base@11.0.6/Native Method)
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:183)
    at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3730)

   Locked ownable synchronizers:
    - None
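
For readers less familiar with these thread states, the following is a minimal, self-contained Java sketch of the general pattern the master dump appears to show: one thread in a timed wait on a CountDownLatch for an I/O completion that never arrives, and a second thread BLOCKED on a monitor while trying to stop a component. It is not Artemis code. The class and method names are hypothetical, and the dump above does not show which thread owns the PostOfficeImpl monitor; the sketch simply assumes, for illustration, that the thread waiting on I/O holds it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Artemis.
class StuckShutdownSketch {

    // Stand-in for a component guarded by its own monitor.
    private static final Object componentLock = new Object();

    /** Returns the state observed for the "stopper" thread while the "writer" waits on I/O. */
    static Thread.State observeStopperState() {
        CountDownLatch ioDone = new CountDownLatch(1);          // stays at 1 while storage is unavailable
        CountDownLatch writerHoldsLock = new CountDownLatch(1); // signals that the monitor is taken

        Thread writer = new Thread(() -> {
            synchronized (componentLock) {                      // ASSUMPTION: waiter holds the monitor
                writerHoldsLock.countDown();
                try {
                    // Repeated timed waits show up as TIMED_WAITING (parking) in a thread dump.
                    while (!ioDone.await(50, TimeUnit.MILLISECONDS)) {
                        // I/O callback has not fired yet; keep waiting.
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "writer");

        Thread stopper = new Thread(() -> {
            synchronized (componentLock) {
                // A stop() call would run here once the monitor becomes free.
            }
        }, "stopper");

        try {
            writer.start();
            writerHoldsLock.await();
            stopper.start();

            // Poll until the stopper is parked on the monitor (bounded to 5 seconds).
            Thread.State state = stopper.getState();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = stopper.getState();
            }

            ioDone.countDown();   // simulate the I/O finally completing so both threads can exit
            writer.join();
            stopper.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stopper state while I/O is pending: " + observeStopperState());
    }
}
```

In the sketch the hang clears only because the latch is eventually counted down. If the storage never completes the pending operation, the waiting thread keeps looping and the stopping thread never acquires the monitor, which would be consistent with a process that stays alive but never finishes shutting down.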
- clones
-
ENTMQBR-4412 master/slave brokers both become unresponsive with nfs-side HA failover
- Closed