Affects Version/s: None
Fix Version/s: AMQ 7.8.0.CR2
We are looking for options to support transparent failover / recovery in the following scenario:
We have multiple datacenters (let's use EAST and WEST as an example)
Messaging clients have an affinity for the "closest" datacenter
Within a single datacenter, we have master/slave pairs with synchronous replication (or shared-file storage)
We would like to stage a second slave in the alternate datacenter, perhaps with asynchronous replication. This second slave would be part of the discovery group (likely JGroups with unicast, since it would span subnets). During normal operations, replication would occur synchronously to the local slave but asynchronously to this second slave in the remote datacenter. In the event of a complete datacenter outage, the second slave in the remote DC would become the live master, and clients could transparently fail over to it.
With async replication, the second slave could be "behind" the locally replicated pair, so some messages could be duplicated or missing after failover.
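To make the "behind" window concrete, here is a minimal sketch that models the proposed topology, assuming each broker's journal is just an ordered list of message IDs. The `Broker` and `DualReplicator` classes and the `ship_remote` step are hypothetical illustrations, not AMQ/Artemis APIs: the local backup receives every record synchronously, while records bound for the remote datacenter sit in a queue until they are shipped.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical broker that keeps an ordered journal of message IDs."""
    name: str
    journal: list = field(default_factory=list)


class DualReplicator:
    """Hypothetical model: synchronous replication to the local backup,
    asynchronous (queued) replication to the remote backup."""

    def __init__(self, master: Broker, local_backup: Broker, remote_backup: Broker):
        self.master = master
        self.local_backup = local_backup
        self.remote_backup = remote_backup
        self.pending_remote = deque()  # records not yet shipped across the WAN

    def write(self, msg_id: str) -> None:
        self.master.journal.append(msg_id)
        # Synchronous leg: in a real broker the client ack would wait for this.
        self.local_backup.journal.append(msg_id)
        # Asynchronous leg: only queue the record for the remote datacenter.
        self.pending_remote.append(msg_id)

    def ship_remote(self, max_records: int) -> None:
        # Ship at most max_records queued records to the remote backup, oldest first.
        for _ in range(min(max_records, len(self.pending_remote))):
            self.remote_backup.journal.append(self.pending_remote.popleft())

    def remote_lag(self) -> int:
        # Number of records the remote backup has not received yet.
        return len(self.master.journal) - len(self.remote_backup.journal)


east_master = Broker("east-master")
east_backup = Broker("east-backup")
west_backup = Broker("west-backup")
rep = DualReplicator(east_master, east_backup, west_backup)

for i in range(1, 6):
    rep.write(f"msg-{i}")
rep.ship_remote(3)

# Simulated EAST outage here: the queued records never reach WEST.
print(rep.remote_lag())     # 2
print(west_backup.journal)  # ['msg-1', 'msg-2', 'msg-3']
```

In this model, msg-4 and msg-5 are safely held by both EAST brokers but would be missing on WEST after failover; how large that window gets in practice would depend on WAN latency and throughput.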
Recovery from the failure: The remote broker would eventually be "ahead" of the primary master/slave pair (it keeps accepting messages while the primary datacenter is down), so the two would need to be resynchronized, possibly introducing a pause in service during recovery.
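One way to reason about the resync is to find where the two journals diverge. The sketch below assumes both sides use the same record IDs in the same order (a simplification; real broker journals are not directly comparable this way), and `plan_resync` is a hypothetical helper. It returns the length of the shared prefix, the records the recovering pair must copy before serving clients again, and any records that only the stale pair holds.

```python
def plan_resync(new_live: list, stale: list):
    """Compare two journals (ordered record IDs) and return
    (common_prefix_len, records_to_copy, conflicting_stale_records)."""
    common = 0
    for a, b in zip(new_live, stale):
        if a != b:
            break
        common += 1
    # Records the recovering pair must receive before it can serve clients again.
    to_copy = new_live[common:]
    # Records only the stale pair holds (never replicated before the outage).
    conflicts = stale[common:]
    return common, to_copy, conflicts


west = ["msg-1", "msg-2", "msg-3", "msg-6", "msg-7"]  # kept running during the outage
east = ["msg-1", "msg-2", "msg-3", "msg-4", "msg-5"]  # restored from disk
common, to_copy, conflicts = plan_resync(west, east)
print(common, to_copy, conflicts)
# 3 ['msg-6', 'msg-7'] ['msg-4', 'msg-5']
```

The length of `to_copy` is roughly what drives the service pause. A non-empty `conflicts` list shows that the recovering pair can be both "behind" and holding records the remote never saw, so some policy for those records would still be needed.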
"Local" network outage: If only the connection between the two datacenters fails, the remote slave could become active while the local pair is still servicing requests (a split-brain situation). This could be addressed by some form of connection prioritization that prevents clients from attaching to the remote broker. Upon recovery, some means of determining which direction to sync state would be needed (e.g. detect that the remote slave is not "ahead" of the local pair and resume replication in the "normal" direction, from the master to the remote slave).
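Both ideas in the partial-outage case can be sketched in a few lines, again assuming journals are ordered lists of comparable record IDs. `Candidate`, `choose_broker`, and `sync_direction` are hypothetical names, and the broker URLs are illustrative (61616 is the default Artemis acceptor port). Clients prefer any active broker in their own datacenter and use the remote one only as a fallback. After the link recovers, the remote is treated as "not ahead" when its journal is a prefix of the local one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    url: str          # hypothetical broker URL
    datacenter: str   # e.g. "EAST" or "WEST"
    active: bool      # currently live and accepting clients


def choose_broker(candidates: list, client_dc: str) -> Optional[Candidate]:
    """Prefer an active broker in the client's own datacenter; fall back
    to an active remote broker only when no local broker is active."""
    local = [c for c in candidates if c.active and c.datacenter == client_dc]
    if local:
        return local[0]
    remote = [c for c in candidates if c.active]
    return remote[0] if remote else None


def sync_direction(local_journal: list, remote_journal: list) -> str:
    """Decide which way to resync once the inter-datacenter link is back."""
    if remote_journal == local_journal[:len(remote_journal)]:
        # Remote holds only records it got from the local master: not ahead.
        return "local->remote"
    if local_journal == remote_journal[:len(local_journal)]:
        # Remote accepted writes the local pair never saw.
        return "remote->local"
    # Both sides accepted different writes during the partition.
    return "diverged"


brokers = [
    Candidate("tcp://west-1:61616", "WEST", True),  # remote slave went live
    Candidate("tcp://east-1:61616", "EAST", True),  # local master still serving
]
print(choose_broker(brokers, "EAST").url)                # tcp://east-1:61616
print(sync_direction(["m1", "m2", "m3"], ["m1", "m2"]))  # local->remote
print(sync_direction(["m1", "m2"], ["m1", "m2", "m3"]))  # remote->local
print(sync_direction(["m1", "m2"], ["m1", "m9"]))        # diverged
```

Prioritization of this kind only keeps clients off the remote broker while they can still reach their local pair. That is what makes the "local->remote" outcome the likely one after a pure inter-datacenter link failure.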