Scenario:
- We have two servers, Live and Backup, configured in a replicated topology with HTTP connectors
- Shut down or kill the Live server
- Start the Live server
Sometimes Live does not become active after the failback. In the log of Live [1] I can see that the server was synchronized with Backup and announced itself as a (temporary) backup. However, Backup did not receive a response to the SynchronizationDone packet and did not restart, see [2]. The trace logs show that Live sent the response, but Backup never received it.
This issue may already have been hit in JBEAP-3998, see comment
[1]
Live
16:04:17,019 INFO [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-client-netty-threads-2042381447)) AMQ221024: Backup server ActiveMQServerImpl::serverUUID=7289474b-234a-11e6-916a-177a69616978 is synchronized with live-server.
16:04:45,925 INFO [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@167daaf-1147487463)) AMQ221031: backup announced
Live trace log
16:04:17,019 INFO [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-client-netty-threads-2042381447)) AMQ221024: Backup server ActiveMQServerImpl::serverUUID=7289474b-234a-11e6-916a-177a69616978 is synchronized with live-server.
16:04:17,019 TRACE [org.apache.activemq.artemis.api.core.jgroups.JChannelWrapper] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@167daaf-1147487463)) org.apache.activemq.artemis.api.core.jgroups.JChannelWrapper@4a2d3ef6{refCount=3, channel=org.jgroups.fork.ForkChannel@3402946b, channelName='activemq-cluster', connected=true}::RefCount++ = 3 on channel activemq-cluster
16:04:17,019 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl] (Thread-3 (ActiveMQ-client-netty-threads-2042381447)) Sending packet nonblocking PACKET(ReplicationResponseMessageV2)[type=-9, channelID=2, packetObject=ReplicationResponseMessageV2, synchronizationIsFinishedAcknowledgement=true] on channeID=2
[2]
Backup
16:04:45,914 WARN [org.apache.activemq.artemis.core.server] (Thread-127) AMQ222013: Error when trying to start replication: java.lang.IllegalStateException: AMQ119114: Replication synchronization process timed out after waiting 30 000 milliseconds
    at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:596)
    at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:392)
    at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:163)
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_91]
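To make the failure mode in [2] easier to reason about, here is a minimal sketch of a "send, then wait for acknowledgement with a timeout" handshake. It is not the Artemis implementation: the class SyncDoneHandshakeSketch, the method awaitSyncDoneAck and the use of a CountDownLatch are assumptions chosen for illustration only. It shows why the waiting side (here, Backup) times out even though the peer did send the response: from the waiter's point of view a lost response is indistinguishable from one that was never sent.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SyncDoneHandshakeSketch {

    /**
     * Hypothetical stand-in for the side that sends SynchronizationDone
     * and then waits for the peer's acknowledgement.
     */
    static void awaitSyncDoneAck(CountDownLatch ackReceived, long timeoutMillis)
            throws InterruptedException {
        // Block until the response handler counts the latch down,
        // or give up once the timeout elapses.
        if (!ackReceived.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                "synchronization timed out after waiting " + timeoutMillis + " milliseconds");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: the acknowledgement is delivered, so the wait returns normally.
        CountDownLatch delivered = new CountDownLatch(1);
        delivered.countDown(); // simulates the response handler running
        awaitSyncDoneAck(delivered, 100);
        System.out.println("ack received");

        // Case 2: the peer sent the acknowledgement, but it was never delivered,
        // so nothing counts the latch down and the wait times out.
        CountDownLatch lost = new CountDownLatch(1);
        try {
            awaitSyncDoneAck(lost, 100);
        } catch (IllegalStateException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

In the scenario above the timeout is 30 000 ms rather than the 100 ms used in the sketch, which matches the roughly 29 second gap between the synchronization message on Live (16:04:17) and the warning on Backup (16:04:45).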
Is related to:
- JBEAP-7968 (7.1.0) The backup server is not responding promptly introducing latency beyond the limit. (Closed)