AMQ Broker / ENTMQBR-9597

Race Condition with Cluster Connections when Restarting Broker


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: AMQ 7.12.3.GA
    • Component: clustering
    • Steps to reproduce:

      Reproducer configuration attached.

      1. Configure and start both brokers
      2. After verifying that all 40 cluster connections are up, stop one broker and restart it immediately
      3. Wait for a few minutes
      4. Observe the cluster connection count on each node

      Typically, the cluster connections on the node that was not restarted are not fully restored (note: it may take a few attempts to reproduce).

      Oddly, the problem does not seem to reproduce when all of the cluster connections are defined in the XML configuration, but it is relatively easy to reproduce when they are defined via properties.

    • Important

      When a broker has multiple cluster connections (each tied to a specific port/acceptor), a race condition can occur if one of the brokers is stopped and quickly restarted. If the broker is restarted immediately, the other broker fails to re-establish all or most of its cluster connections. If we wait a few seconds between stopping and restarting the broker, fewer connections are missing. If we wait a minute or so before restarting the stopped broker, all of the connections appear to be restored.

      When the cluster connections are not restored, the log of the broker that remained up shows warnings like these:

      2024-12-18 17:10:34,965 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,965 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,966 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,967 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,968 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,969 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,969 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      2024-12-18 17:10:34,970 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
      
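When observing the cluster connection count (step 4), it can also help to tally the AMQ222100 warnings in the surviving broker's log. The sketch below is a hypothetical helper, not part of AMQ Broker or Apache ActiveMQ Artemis. It assumes the log line layout shown in the excerpt above, and it assumes (this report does not confirm it) that each warning corresponds to one bridge that stopped retrying. It only reads a log file; the actual cluster connection count should still be checked on the broker itself.

```python
import re
import sys
from collections import Counter

# Matches the AMQ222100 warning exactly as it appears in the excerpt above.
# Assumption: the timestamp/level/logger layout matches that excerpt; other
# logging configurations would need a different pattern.
AMQ222100 = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+WARN\s+"
    r"\[org\.apache\.activemq\.artemis\.core\.server\]\s+AMQ222100:"
)


def count_abandoned_bridges(lines):
    """Return a Counter of AMQ222100 warnings keyed by log second.

    Hypothetical helper: the per-second grouping makes bursts such as
    the one at 17:10:34 in the excerpt easy to spot.
    """
    per_second = Counter()
    for line in lines:
        match = AMQ222100.match(line.strip())
        if match:
            # Drop the ",mmm" milliseconds so warnings group by second.
            per_second[match.group("ts")[:19]] += 1
    return per_second


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python count_amq222100.py <path-to-broker-log>
    with open(sys.argv[1], encoding="utf-8") as log:
        for second, n in sorted(count_abandoned_bridges(log).items()):
            print(f"{second}  {n} x AMQ222100")
```

Comparing the total against the number of cluster connections that are still missing is only a rough cross-check, because the report does not establish a one-to-one mapping between warnings and connections.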

    • Assignee: Unassigned
    • Reporter: Duane Hawkins (rhn-support-dhawkins)
    • Votes: 1
    • Watchers: 2