AMQ Broker / ENTMQBR-886

[ENH] Async Replication for Disaster Mitigation / Recovery


    • Type: Story
    • Resolution: Done
    • Priority: Minor
    • AMQ 7.8.0.CR2
    • None
    • None

      We are looking for options to support transparent failover / recovery in the following scenario:

      We have multiple datacenters (using EAST and WEST as an example).

      Messaging clients have an affinity for the "closest" datacenter.

      Within a single datacenter we have master/slave pairs with synchronous replication (or shared-file).

      We would like to stage a second slave in the alternate datacenter, perhaps with asynchronous replication. This alternate slave would be part of the discovery group (likely JGroups / unicast, going across subnets). During normal operations, replication would occur synchronously to the local slave but asynchronously to the second slave in the remote datacenter. In the event of a complete datacenter outage, the second slave in the remote datacenter would become active as a master, and clients could transparently fail over to it.

      Complications:

      With async replication, the second slave could be "behind" the locally replicated pair, so some messages could be duplicated or missing after a failover.

      Recovery from the failure: After taking over, the remote broker would eventually be "ahead" of the primary master/slave pair, so the pair would need to be resynced, possibly introducing a pause in service upon recovery.

      "Local" network outage: If connection fails only between the two datacenters, it would be possible for the remote slave to be active while the local pair is still servicing requests. This could be addressed by some sort of connection prioritization to prevent clients from attaching to the remote broker. Upon recovery, some means of determining which direction to sync state would be needed (e.g. detect that remote slave is not "ahead" of local pair and resume replication in the "normal" direction from the master to the remote slave).

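The sync-direction question raised under "Local" network outage can be sketched roughly as follows. This is only an illustration of the decision, not anything AMQ Broker provides: `SideState`, `writes_since_common_point`, and `choose_sync_direction` are hypothetical names, and the sketch assumes each side can report how many writes it accepted since the last point both sides agreed on.

```python
from dataclasses import dataclass
from enum import Enum


class SyncDirection(Enum):
    MASTER_TO_REMOTE = "master_to_remote"  # the "normal" direction
    REMOTE_TO_MASTER = "remote_to_master"  # remote took over and is "ahead"
    CONFLICT = "conflict"                  # both sides accepted writes


@dataclass(frozen=True)
class SideState:
    # Hypothetical counter: writes accepted since the last point
    # that both sides are known to have agreed on.
    writes_since_common_point: int


def choose_sync_direction(local_pair: SideState, remote_slave: SideState) -> SyncDirection:
    """Pick which way to resync after the inter-datacenter link recovers."""
    if remote_slave.writes_since_common_point == 0:
        # Remote is not "ahead": resume replication in the normal direction.
        return SyncDirection.MASTER_TO_REMOTE
    if local_pair.writes_since_common_point == 0:
        # Only the remote accepted writes: the local pair must catch up.
        return SyncDirection.REMOTE_TO_MASTER
    # Both sides diverged (split brain); this needs a merge policy or an operator.
    return SyncDirection.CONFLICT
```

For example, `choose_sync_direction(SideState(12), SideState(0))` selects the normal master-to-remote direction, while a non-zero count on both sides is reported as a conflict rather than resolved automatically.
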
        1. Test broker connection auto-start.txt (4 kB)
        2. Test broker connection invalid user and password options.txt (1 kB)
        3. Test broker connection retries and retry-interval and reconnect-attempts.txt (4 kB)
        4. Test broker connection with valid user and password options.txt (1 kB)
        5. Test disable broker connection.txt (7 kB)
        6. Test mirror between a main and replica brokers with default options.txt (3 kB)
        7. Test mirror between a main and replica brokers with default options but message-acknowledgements=false.txt (8 kB)
        8. Test mirror between a main and replica brokers with default options but queue-creation=false.txt (5 kB)
        9. Test mirror between a main and replica brokers with default options but queue-creation=false and queue already exists on replica.txt (6 kB)
        10. Test mirror between a main and replica brokers with default options but queue-removal=false.txt - FAILED (9 kB)
        11. Test mirror between a main and replica brokers with default options but source-mirror-address defined and check if messages are not replicated after a broker restart.txt (9 kB)
        12. Test mirror between a main and replica brokers with default options but source-mirror-address defined and queues creation and deletion works on broker restart.txt (11 kB)
        13. Test sender and receiver with address-match.txt (6 kB)
        14. Test sender and receiver with address-match without wildcard.txt.txt (5 kB)
        15. Test sender and receiver with queue-name.txt (4 kB)

              csuconic@redhat.com Clebert Suconic
              rhn-support-dhawkins Duane Hawkins
              Tiago Bueno Tiago Bueno
