Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: AMQ 7.2.2.GA
Affects Version/s: None
Component/s: high-availability
Labels:
- failover
- replication

Affects:

Documentation (Ref Guide, User Guide, etc.), Release Notes
GSS Priority:
Release Note Text:

This issue occurred in past releases when multiple backup brokers, also referred to as slaves, served a single live (master) broker. In this scenario, if the primary backup broker failed, the secondary backup tried to replicate. That operation failed, so the secondary backup could not take over for the primary backup and high availability was lost. This issue is now resolved.

Release Note Status:
Documented as Resolved Issue
Target Release:

AMQ 7.2.2.GA
Upstream Jira:
https://issues.apache.org/jira/browse/ARTEMIS-1285
Steps to Reproduce:

Configurations attached.

1. Start master node
2. Start slave node and wait for it to become replication partner for master
3. Start standby node
4. Produce messages to master node
5. Kill slave node
   (standby does not announce as backup and remains in the backup activation loop)
6. Kill master node
   (standby is still waiting to become backup)


Sprint:
AMQ Broker 1836, AMQ Broker 1839

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

When testing failover in a scenario with 1 master and 2 slaves, the example scenario in which the master is killed first works correctly: the primary backup becomes the master and the secondary backup becomes the replication node.

If, however, the primary backup is killed first, the secondary backup remains stopped and does not announce as the replication slave. Instead, it continues to log:

13:31:44,373 WARN  [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped

When the master is brought down, the secondary slave remains stopped.

In the thread dumps of the secondary backup for this scenario (taken when the primary is killed), the secondary appears to be stuck looping in NamedLiveNodeLocatorForReplication::locateNode(...).

"AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null" #18 prio=5 os_prio=0 tid=0x00007f1920803800 nid=0x642b waiting on condition [0x00007f19028e8000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c04b7170> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.activemq.artemis.core.server.impl.NamedLiveNodeLocatorForReplication.locateNode(NamedLiveNodeLocatorForReplication.java:67)
        at org.apache.activemq.artemis.core.server.impl.NamedLiveNodeLocatorForReplication.locateNode(NamedLiveNodeLocatorForReplication.java:54)
        at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:195)
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2793)

   Locked ownable synchronizers:
        - None
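
The dump shows the activation thread parked in `ConditionObject.await`, reached from `locateNode`. As a minimal, self-contained sketch of that general wait pattern (this is not the Artemis source; `ParkedLocatorSketch`, `announce` and `observeParkedState` are hypothetical names used only for illustration), the following shows a thread that stays in `WAITING` until another thread signals the condition. If no signal ever arrives, the thread stays parked, which is consistent with the symptom reported here.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not the Artemis implementation.
class ParkedLocatorSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition liveFound = lock.newCondition();
    private String liveNodeId; // guarded by lock

    // Blocks until some other thread announces a live node.
    String locateNode() {
        lock.lock();
        try {
            while (liveNodeId == null) {
                liveFound.await(); // the thread parks here: WAITING (parking)
            }
            return liveNodeId;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while locating", e);
        } finally {
            lock.unlock();
        }
    }

    // Assumed hook: called when a live node that needs a backup is discovered.
    void announce(String nodeId) {
        lock.lock();
        try {
            liveNodeId = nodeId;
            liveFound.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Starts a locator thread, waits until it is parked, then releases it.
    static Thread.State observeParkedState() {
        ParkedLocatorSketch locator = new ParkedLocatorSketch();
        Thread activation = new Thread(locator::locateNode, "activation-sketch");
        activation.start();
        try {
            Thread.State state = activation.getState();
            while (state != Thread.State.WAITING) {
                Thread.sleep(10);
                state = activation.getState();
            }
            locator.announce("live-1"); // without this call the thread never wakes
            activation.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the waiting thread only makes progress once `announce` is called; a locator that is never notified after the primary backup disappears would look the same in a thread dump.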

If multiple slaves are configured for a master, the nth slave should become the active slave if the current slave(s) are offline.
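
The expected behavior can be modeled with a small sketch. It assumes the backups have a configured order and that only their online/offline state matters; `BackupSelectionSketch`, `Backup` and `activeBackup` are hypothetical names and do not correspond to any Artemis class.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the expected behavior; names are illustrative only.
class BackupSelectionSketch {
    static final class Backup {
        final String name;
        final boolean online;

        Backup(String name, boolean online) {
            this.name = name;
            this.online = online;
        }
    }

    // Returns the first online backup in configured order, if there is one.
    static Optional<String> activeBackup(List<Backup> backupsInOrder) {
        return backupsInOrder.stream()
                .filter(b -> b.online)
                .map(b -> b.name)
                .findFirst();
    }
}
```

With a first backup that is offline and a second that is online, `activeBackup` selects the second one, which is the outcome the standby fails to reach in this bug.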

This is https://issues.apache.org/jira/browse/ARTEMIS-2075 upstream

reproducer.tar
20 kB
2017/10/21 3:14 PM

is cloned by

ENTMQBR-1954 Create test for Standby slave does not announce replication to master when primary slave is down

Closed

is duplicated by

ENTMQBR-1021 [HA, MS1S2] When backup slave1 is killed, slave2 can't take role of backup, leaving HA on master only

Closed

is related to: ARTEMIS-1285

Assignee: Andy Taylor

Reporter: Duane Hawkins

Tester: Roman Vais (Inactive)

Votes: 0

Watchers: 7

Created: 2017/10/21 3:19 PM

Updated: 2021/10/24 6:33 AM

Resolved: 2018/10/15 6:50 AM

Estimated:

Remaining:

Logged:

Not Specified
