- Bug
- Resolution: Done
- Major
- JBoss A-MQ 6.1
In a master-slave A-MQ set-up with a shared filesystem, a split-brain (dual master) situation is observed after a network-level failure affecting the connection between the broker host and the storage.

Looking at the attached master.log and slave.log, we can see that the master receives a SyncFailedException when trying to flush the KahaDB file. The master then tries to shut down but seems unable to, owing to a series of I/O-related errors. However, it appears to have dropped its filesystem lock, because the slave reports that it has locked the file and is coming up. The master and slave logs show both A-MQ instances processing the same corrupt KahaDB file. Both are now "masters" in some sense.

I surmise that the SyncFailedException is not handled properly here. Because the filesystem connection is defective at this point, exceptions are thrown whilst trying to close down, and the original master keeps running as a master even though the slave has taken over the master role.
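To make the suspected failure mode concrete, here is a minimal sketch of the behaviour one would expect from a master: re-check the lock periodically and fence itself as soon as the lock or the storage can no longer be trusted, rather than attempting an orderly shutdown that itself depends on the broken filesystem. This is not the A-MQ or ActiveMQ code. The class `LockKeepAliveSketch`, the `Fence` callback and the method `checkOnce` are hypothetical names invented for illustration; only the `java.nio` calls are real APIs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from the broker source.
public class LockKeepAliveSketch implements AutoCloseable {

    /** Hypothetical callback. A real broker might call Runtime.getRuntime().halt(1) here. */
    public interface Fence {
        void fence(String reason);
    }

    private final Path lockFile;
    private final FileChannel channel;
    private final FileLock lock;
    private final Fence fence;

    public LockKeepAliveSketch(Path lockFile, Fence fence) throws IOException {
        this.lockFile = lockFile;
        this.fence = fence;
        this.channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock acquired = channel.tryLock();
        if (acquired == null) {
            // Another process holds the lock: this instance must stay a slave.
            channel.close();
            throw new IOException("lock held by another process: " + lockFile);
        }
        this.lock = acquired;
    }

    /**
     * One keep-alive check. Returns true if this instance may remain master.
     * Any doubt (invalid lock, missing lock file, I/O error) triggers the
     * fence callback instead of an orderly shutdown.
     */
    public boolean checkOnce() {
        try {
            if (!lock.isValid()) {
                fence.fence("lock is no longer valid");
                return false;
            }
            if (!Files.exists(lockFile)) {
                fence.fence("lock file has disappeared");
                return false;
            }
            // Force a sync so that a storage failure surfaces here as an IOException.
            channel.force(true);
            return true;
        } catch (IOException e) {
            fence.fence("I/O error while checking lock: " + e);
            return false;
        }
    }

    @Override
    public void close() throws IOException {
        // Closing the channel also releases the lock.
        channel.close();
    }
}
```

In such a design, `checkOnce` would be driven by a scheduled timer, and the fence action would be something that cannot itself fail on I/O, such as halting the JVM. Whether this matches the eventual fix for this issue is not stated in the ticket.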
- is related to: ENTMQ-977 KeepAlive timer in shared file lock doesn't detect lock deletion in time (Closed)