  1. AMQ Broker
  2. ENTMQBR-3131

Topology Fails to Update correctly for Backup Brokers when Master is Killed


    • Bug
    • Resolution: Won't Do
    • Major
    • None
    • AMQ 7.4.0.GA, AMQ 7.5.0.GA, AMQ 7.4.1.GA
    • broker-core, clustering
    • None
      When a live broker fails in a cluster with more than four live-backup pairs, the live brokers, including the newly-elected live broker, all correctly report the updated topology. However, the remaining backup brokers might show the wrong topology in the following ways:

      * If a backup broker has failed over in place of the failed live broker, the remaining backup brokers show this newly-elected live broker twice in the topology.
      * If a backup broker has not yet failed over in place of the failed live broker, the remaining backup brokers still show the failed live broker in the topology.

      To work around this issue, ensure that the first `connector-ref` element in the `cluster-connection > static-connectors` configuration of each backup broker specifies the expected live broker.
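One way to check whether the workaround is in place is to print the first `connector-ref` listed under `static-connectors` in a backup broker's configuration file. The following is a minimal sketch, not a supported tool: the function name `first_static_connector` is hypothetical, and it assumes each XML element sits on its own line, as in a conventionally formatted `broker.xml`.

```shell
#!/bin/sh
# Sketch: print the first <connector-ref> found inside <static-connectors>.
# Assumption: one XML element per line. The function name is hypothetical.
first_static_connector() {
  awk '
    /<static-connectors/ { inside = 1 }
    inside && /<connector-ref>/ {
      # Strip the surrounding tags and whitespace, leaving the connector name.
      sub(/.*<connector-ref>[ \t]*/, "")
      sub(/[ \t]*<\/connector-ref>.*/, "")
      print
      exit
    }
    /<\/static-connectors>/ { inside = 0 }
  ' "$1"
}
```

Running `first_static_connector broker1b.xml` on a backup broker should print the name of the connector that points at its expected live broker. A `connector-ref` that appears directly under `cluster-connection`, outside `static-connectors`, is skipped.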
    • Documented as Known Issue

      More details are forthcoming, but the issue can be reproduced by setting up 3 masters and 2 slaves on host A, and the 3 corresponding slaves and 2 masters on host B, with a static cluster configuration.

      Start all 10 brokers and run the topology query against each:

      curl -u admin:admin http://node1.redhat.com:8161/console/jolokia/read/org.apache.activemq.artemis:broker=%221a%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node1.redhat.com:8171/console/jolokia/read/org.apache.activemq.artemis:broker=%222a%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node1.redhat.com:8181/console/jolokia/read/org.apache.activemq.artemis:broker=%223b%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node1.redhat.com:8191/console/jolokia/read/org.apache.activemq.artemis:broker=%224b%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node1.redhat.com:8201/console/jolokia/read/org.apache.activemq.artemis:broker=%225w%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node2.redhat.com:8161/console/jolokia/read/org.apache.activemq.artemis:broker=%221b%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node2.redhat.com:8171/console/jolokia/read/org.apache.activemq.artemis:broker=%222b%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node2.redhat.com:8181/console/jolokia/read/org.apache.activemq.artemis:broker=%223a%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node2.redhat.com:8191/console/jolokia/read/org.apache.activemq.artemis:broker=%224a%22,component=cluster-connections,name=%22prod1-cluster%22/Topology;
      curl -u admin:admin http://node2.redhat.com:8201/console/jolokia/read/org.apache.activemq.artemis:broker=%225s%22,component=cluster-connections,name=%22prod1-cluster%22/Topology
      

      Verify that each node reports the correct topology (10 nodes / 5 members).

      Now kill one of the live brokers.

      Re-run the queries.

      Observe that the remaining backup brokers still report 10 nodes / 5 members, vs. the correct 9 nodes / 5 members.
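Because the queries are run twice, before and after the kill, it can help to generate them from a list. The sketch below rebuilds the same ten URLs from the hosts, ports, broker names, and cluster-connection name used in the commands above. The function names `topology_url` and `query_all` are hypothetical, and the `admin:admin` credentials are simply those from this report.

```shell
#!/bin/sh
# Sketch: query the Topology attribute of every broker in the reproduction setup.
# Entries are host:port:broker, copied from the curl commands in this report.
CLUSTER="prod1-cluster"
BROKERS="
node1.redhat.com:8161:1a
node1.redhat.com:8171:2a
node1.redhat.com:8181:3b
node1.redhat.com:8191:4b
node1.redhat.com:8201:5w
node2.redhat.com:8161:1b
node2.redhat.com:8171:2b
node2.redhat.com:8181:3a
node2.redhat.com:8191:4a
node2.redhat.com:8201:5s
"

# Build the Jolokia read URL for one broker (arguments: host, port, broker name).
topology_url() {
  printf 'http://%s:%s/console/jolokia/read/org.apache.activemq.artemis:broker=%%22%s%%22,component=cluster-connections,name=%%22%s%%22/Topology\n' \
    "$1" "$2" "$3" "$CLUSTER"
}

# Query each broker in turn, labelling the output with the broker name.
query_all() {
  for entry in $BROKERS; do
    host=${entry%%:*}
    rest=${entry#*:}
    port=${rest%%:*}
    broker=${rest#*:}
    echo "== broker $broker ($host:$port) =="
    curl -s -u admin:admin "$(topology_url "$host" "$port" "$broker")"
    echo
  done
}
```

Calling `query_all > before.txt` prior to the kill and `query_all > after.txt` afterwards leaves two files that can be compared with `diff`.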


      When exercising failovers due to simulated crashes (using kill -9 to kill one of the master brokers) in clustered scenarios with greater than 4 master/slave pairs, the live brokers, including the newly elected live broker, all correctly report the updated topology, while the remaining backups still report the original topology.

      Consider the scenario involving 5 master/slave broker pairs. Upon initial startup all 10 brokers report a topology of 10 nodes and 5 quorum members.

      After killing one of the master brokers, the corresponding backup is correctly elected and becomes the live node, for a topology of 9 nodes and 5 quorum members.

      Each of the live brokers in this scenario correctly reports 9 nodes with 5 members.

      However, each of the remaining backups reports 10 nodes with 5 members.
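The 9-versus-10 comparison can be scripted once the topology text has been fetched. The sketch below assumes the text returned by the Topology attribute ends with a summary of the form `nodes=<N>` and `members=<M>`. That format is an assumption about the broker's output, not something shown in this report, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the node and member counts out of a topology description.
# Assumption: the description contains "nodes=<N>" and "members=<M>".

# Print the number that follows "<key>=" (arguments: key, description text).
topology_count() {
  printf '%s\n' "$2" | sed -n "s/.*$1=\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Succeed only if the description reports the expected counts
# (arguments: description text, expected nodes, expected members).
check_topology() {
  nodes=$(topology_count nodes "$1")
  members=$(topology_count members "$1")
  [ "$nodes" = "$2" ] && [ "$members" = "$3" ]
}
```

After the kill and failover, `check_topology "$response" 9 5` would be expected to succeed for every live broker and, per this report, to fail for the remaining backups.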

        1. broker1a.xml
          13 kB
        2. broker1b.xml
          13 kB

              fnigro Francesco Nigro
              rhn-support-dhawkins Duane Hawkins