AMQ Broker / ENTMQBR-9322

Document delay on delivering messages due to IO asynchronous nature versus retry ACKs on the Mirror

    • Task
    • Resolution: Unresolved
    • AMQ 7.12.1.GA
    • disaster-recovery
    • Important

      The customer has two sites, which I will call 'primary' and 'secondary'. Each site has an active and a passive broker. There are asynchronous mirror connections between the sites, in both directions. Only one site handles clients at any given time.

      We observe an apparent duplication of messages on the primary active broker, while testing failover between sites.

      1. Clients are producing to and consuming from the active broker on the primary site. The total message backlog on this broker is approximately zero. Messages on the application queues are consumed as quickly as they are produced, and the contents of the mirror queue are transferred to the active broker on the secondary site with no delays. There are no clients on the secondary site.

      2. The primary site is shut down in an orderly way, passive first, then active. Clients continue to produce and consume.

      3. Clients transfer to the active broker on the secondary site. They continue to produce and consume. The application queues show approximately zero backlog, because production and consumption are at the same rate. However, the mirror queue now shows an increasing backlog, because the mirror connection to the (currently down) primary site is inactive.

      4. Clear all data from the brokers in the primary site. What's being simulated here is the primary site being restored in a clean state after a catastrophic failure. Note that there are no clients of the primary-site brokers, since there is no reason for them to move from the secondary site.

      5. We expect to see zero messages on the primary site active broker because, although it will receive the huge backlog of messages from the secondary site's mirror queue, there should be the same number of message-add and message-remove operations. The secondary site has no backlog in any application queue, so it must have recorded the same number of message adds as message removes.

      6. Instead, we see some messages in the application queues on the primary site. These messages must have come over the mirror connection from the secondary site, because the primary site started clean. It appears that more message-add operations have been mirrored than message-remove operations.
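
The accounting argument in steps 5 and 6 can be illustrated with a toy model. This is only a sketch under stated assumptions, not the broker's mirror implementation: it assumes that a mirrored message-add takes some time to become visible on the target (standing in for asynchronous IO), and that a mirrored message-remove (ACK) arriving before its message is visible is either dropped or retried later. The function `replay` and its parameters `add_latency` and `retry_acks` are hypothetical names invented for this illustration.

```python
def replay(events, add_latency, retry_acks):
    """Replay mirrored events on a clean target and return the ids left behind.

    events      -- list of ("add", id) / ("ack", id) pairs in source order
    add_latency -- ticks before an add becomes visible (toy stand-in for async IO)
    retry_acks  -- if True, an ack that finds no message is retried later
    """
    visible = set()      # messages currently visible in the target queue
    in_flight = {}       # msg_id -> tick at which its add becomes visible
    waiting_acks = []    # acks that arrived before their message was visible

    def settle(now):
        # Complete every add whose latency has elapsed.
        for msg_id in [m for m, t in in_flight.items() if t <= now]:
            del in_flight[msg_id]
            visible.add(msg_id)
        # Retry acks that were waiting for their message to appear.
        for msg_id in list(waiting_acks):
            if msg_id in visible:
                visible.remove(msg_id)
                waiting_acks.remove(msg_id)

    tick = 0
    for tick, (kind, msg_id) in enumerate(events, start=1):
        settle(tick)
        if kind == "add":
            in_flight[msg_id] = tick + add_latency
        elif msg_id in visible:
            visible.remove(msg_id)
        elif retry_acks:
            waiting_acks.append(msg_id)
        # Otherwise the ack is dropped and the message stays once its add completes.

    settle(tick + add_latency)  # let outstanding adds finish, then a final retry pass
    return sorted(visible)


# Equal numbers of adds and acks, as recorded on the secondary site.
events = [("add", 1), ("ack", 1), ("add", 2), ("ack", 2), ("add", 3), ("ack", 3)]

print(replay(events, add_latency=0, retry_acks=False))
print(replay(events, add_latency=2, retry_acks=False))
print(replay(events, add_latency=2, retry_acks=True))
```

In this model, equal counts of adds and removes leave the target empty only when each remove can find its message, either immediately or on a later retry. When removes are retried, the queue still drains, but only after a delay, so a snapshot taken in between would show leftover messages.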

              Unassigned
              Kevin Boone (rhn-support-kboone, Inactive)