[ENTMQBR-9597] Race Condition with Cluster Connections when Restarting Broker - Red Hat Issue Tracker

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: AMQ 7.12.3.GA
Component/s: clustering
Labels: None

Blocked: False
Blocked Reason: None
Ready: False
GSS Priority:
Steps to Reproduce:

Reproducer configuration attached.

1. Configure and start both brokers
2. After verifying all 40 cluster connections are up, stop one broker and restart immediately
3. Wait for a few minutes
4. Observe the cluster connection count on each node

Typically the connections on the node that was not restarted do not get fully restored (note: it may take a few tries).

Oddly, this doesn't seem to be reproducible when the cluster connections are all defined in the XML, but it is relatively easy to reproduce when they are defined via properties.

Intelligence Requested:
Market:

Severity: Important

When a broker with multiple cluster connections (each cluster connection tied to a specific port / acceptor) is stopped and quickly restarted, a race condition can prevent the opposite broker from re-establishing its cluster connections. If the broker is restarted immediately, all or most of the cluster connections from the opposite broker fail to reconnect. If we wait a few seconds between stopping and restarting the broker, fewer connections are missing. If we wait a minute or so after stopping the broker before restarting it, all of the connections seem to be restored.
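
The dependence on restart delay described above could be measured with a small harness along these lines. This is only a sketch: `stop_broker`, `start_broker`, and `count_connections` are hypothetical callables that the tester must supply (the report does not say how the broker is stopped or how the connection count is read), and the default of 40 expected connections and the settle time are taken loosely from the reproduction steps.

```python
import time
from typing import Callable, Dict, Iterable


def run_restart_trials(
    stop_broker: Callable[[], None],       # hypothetical: stops the broker under test
    start_broker: Callable[[], None],      # hypothetical: starts it again
    count_connections: Callable[[], int],  # hypothetical: active cluster connections on the surviving node
    delays: Iterable[float],
    expected: int = 40,
    settle_seconds: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[float, int]:
    """Return, for each stop-to-start delay, how many connections are missing."""
    missing = {}
    for delay in delays:
        stop_broker()
        sleep(delay)           # gap between stop and restart
        start_broker()
        sleep(settle_seconds)  # "wait for a few minutes" before counting
        missing[delay] = expected - count_connections()
    return missing
```

Running it with delays such as `[0, 5, 60]` would give one missing-connection figure per delay, which makes the "immediate vs. a few seconds vs. a minute" comparison repeatable across tries.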

In the log of the broker that remains up, we see messages like these when the connections are not restored:

2024-12-18 17:10:34,965 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,965 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,966 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,967 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,968 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,969 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,969 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
2024-12-18 17:10:34,970 WARN  [org.apache.activemq.artemis.core.server] AMQ222100: ServerLocator was shutdown, can not retry on opening connection for bridge
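
To compare runs, it may help to count how many of these warnings appear per second in the surviving broker's log. The following is a minimal sketch that assumes the log line layout shown in the excerpt above (timestamp, level, bracketed logger, message code); the function name and regular expression are illustrative, not part of the broker.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines shaped like the excerpt:
# 2024-12-18 17:10:34,965 WARN  [logger.name] AMQ222100: message text
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+\[([^\]]+)\]\s+(AMQ\d+):"
)


def count_code_per_second(lines: Iterable[str], code: str = "AMQ222100") -> Counter:
    """Count occurrences of a message code, grouped by timestamp (to the second)."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(4) == code:
            counts[match.group(1)] += 1
    return counts
```

Applied to the eight lines in the excerpt, every warning falls within the same second (17:10:34), which is consistent with many bridges giving up at once rather than failing gradually.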

Attachments:
20_cluster.tar (80 kB, 2025/03/12 1:03 PM)
reproducer.zip (9 kB, 2024/12/19 2:30 PM)
reproducer2.zip (9 kB, 2025/03/12 5:57 PM)
Screenshot From 2025-03-13 10-56-32.png (193 kB, 2025/03/13 10:57 AM)

relates to

ENTMQBR-9263 Cluster Bridges Fail to Connect on Broker Cluster with Multiple Cluster Connections Per Cluster Pair

Backlog
