-
Bug
-
Resolution: Done
-
Major
-
AMQ 7.5.0.GA
-
None
For the following configuration:
Master:
<ha-policy> <replication> <master> <check-for-live-server>true</check-for-live-server> <group-name>prod-pair-1</group-name> <vote-on-replication-failure>true</vote-on-replication-failure> </master> </replication> </ha-policy>
Slave:
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
<group-name>prod-pair-1</group-name>
</slave>
</replication>
</ha-policy>
Failback to master intermittently fails due to unknown conditions - failure rate is approximately .5% to 33%. The master appears to start without checking for the presence of a live server, and the end result is to live servers. The following log sequence is observed*:
Master:
2019-09-24 22:28:12,205 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/largemessages,pagingDirectory=./data/paging)
...
2019-09-24 22:28:15,769 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
All that is logged on the slave is:
2019-09-24 22:28:15,756 WARN [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=50bb72f2-de2c-11e9-94c6-005056be7594
A normal/success sequence for this master broker is:
2019-09-24 22:25:41,756 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/largemessages,pagingDirectory=./data/paging) 2019-09-24 22:25:45,641 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.7.0-PATCH-2930 [null] started, waiting live to fail before it gets active 2019-09-24 22:25:47,134 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=50bb72f2-de2c-11e9-94c6-005056be7594 is synchronized with live-server. 2019-09-24 22:25:47,818 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
And the slave normal/success logs are:
2019-09-24 22:25:45,834 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending AIOSequentialFile:/mq/amq/./data/journal/activemq-data-175801.amq (size=10,485,760) to replica. 2019-09-24 22:25:46,328 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending AIOSequentialFile:/mq/amq/./data/journal/activemq-data-175800.amq (size=10,485,760) to replica. 2019-09-24 22:25:46,574 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /mq/amq/./data/bindings/activemq-bindings-4030.bindings (size=1,048,576) to replica. 2019-09-24 22:25:46,592 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /mq/amq/./data/bindings/activemq-bindings-4031.bindings (size=1,048,576) to replica. 2019-09-24 22:25:46,610 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /mq/amq/./data/bindings/activemq-bindings-4029.bindings (size=1,048,576) to replica. 2019-09-24 22:25:47,162 INFO [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.prod1-cluster.7cb5b06b-de2c-11e9-836f-005056be48f5 2019-09-24 22:25:47,177 INFO [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.prod1-cluster.a8997fb1-de2c-11e9-bd93-0050569a3208 2019-09-24 22:25:47,178 INFO [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.prod1-cluster.d4ac9c32-de2c-11e9-9463-0050569ae521 2019-09-24 22:25:47,178 INFO [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.prod1-cluster.00bd6ef2-de2d-11e9-80cc-005056be2edc 2019-09-24 22:25:47,213 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.7.0-PATCH-2930 [50bb72f2-de2c-11e9-94c6-005056be7594] stopped, uptime 2 minutes 2019-09-24 22:25:47,213 INFO [org.apache.activemq.artemis.core.server] AMQ221039: Restarting as Replicating backup server after live restart 2019-09-24 22:25:47,213 INFO [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/largemessages,pagingDirectory=./data/paging) 2019-09-24 22:25:47,214 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /mq/amq/./data/bindings/oldreplica.337 2019-09-24 22:25:47,217 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /mq/amq/./data/bindings to /mq/amq/./data/bindings/oldreplica.339 2019-09-24 22:25:47,217 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /mq/amq/./data/journal/oldreplica.338 2019-09-24 22:25:47,223 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /mq/amq/./data/journal to /mq/amq/./data/journal/oldreplica.340 2019-09-24 22:25:47,223 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /mq/amq/./data/paging/oldreplica.337 2019-09-24 22:25:47,224 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /mq/amq/./data/paging to /mq/amq/./data/paging/oldreplica.339 2019-09-24 22:25:47,225 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal 2019-09-24 22:25:48,360 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.7.0-PATCH-2930 [null] started, waiting live to fail before it gets active 2019-09-24 22:25:49,912 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=50bb72f2-de2c-11e9-94c6-005056be7594 is synchronized with live-server. 2019-09-24 22:25:51,819 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
- Note logs lines were cherry picked from the full logs, but the full logs are available.
- blocks
-
WFLY-14071 Reenable ReplicatedFailoverTestCase on GitHub Actions
- Resolved
- is blocked by
-
ENTMQBR-2932 Remote JMX server on slave shuts down during failback
- Closed
-
ENTMQBR-3113 Remote JMX server on slave shuts down during failback
- Closed
- is related to
-
ENTMQBR-3131 Topology Fails to Update correctly for Backup Brokers when Master is Killed
- Closed