AMQ Broker / ENTMQBR-2678

After isolated master is live again it is unable to connect to the cluster

Details

    • Deploy an HA cluster of 3 master/slave pairs
    • Isolate 1 master (firewall rules)
    • Make sure the slave takes control and the master goes down
    • Disable all firewall rules (restore the connection)
    • Observe that the master is unable to join the cluster
    • In a cluster of three or more live-backup groups that is using the replication high availability (HA) policy, the live broker shuts down when its replication connection fails. However, when the replication connection is restored and the original live broker is restarted, the broker is sometimes unable to rejoin the broker cluster. To enable the original live broker to rejoin the cluster, first stop the new live (original backup) broker, restart the original live broker, and then restart the original backup broker.
    • Documented as Known Issue
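
The workaround in the known-issue note comes down to a fixed restart order for one live-backup pair. The sketch below only encodes that order, so it can be reviewed or fed to whatever service-control tooling is in use. The function names, the action labels, and the `runner` callback are illustrative assumptions, not part of AMQ Broker.

```python
from typing import Callable, List, Tuple

# One step is (action, broker name). The action labels are illustrative only.
Step = Tuple[str, str]


def rejoin_plan(original_live: str, original_backup: str) -> List[Step]:
    """Return the restart order described in the known-issue note."""
    return [
        ("stop", original_backup),    # the original backup is currently acting as live
        ("restart", original_live),   # the original live broker comes back first
        ("start", original_backup),   # the original backup is brought back last
    ]


def run_plan(plan: List[Step], runner: Callable[[str, str], None]) -> None:
    """Apply each step in order; `runner` is a caller-supplied hook (assumption)."""
    for action, broker in plan:
        runner(action, broker)
```

For a dry run, `run_plan(rejoin_plan("masterB", "slaveB"), lambda action, broker: print(action, broker))` prints the three steps in order. The name `slaveB` is hypothetical, and a real `runner` would call the environment's own start and stop mechanism.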

    Description

      Once the isolated broker is resurrected, it cannot rejoin the original cluster, so its going down and coming back has no overall effect. It seems that the whole cluster of brokers needs to be restarted.

      All brokers log the usual message, for example:

      2019-07-12 13:19:35,135 WARN  [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=fae12f12-a493-11e9-89d6-fa163ec19b2d
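
The warning text itself distinguishes a single occurrence per node (safe to ignore after a restart) from continuous logging (a real node id clash). A minimal sketch for telling the two apart in a broker log follows. It assumes the line format shown above; the function names and the `threshold` of one occurrence are illustrative choices, not broker features.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches the AMQ212034 warning and captures the node id at the end of the line.
DUPLICATE_NODE_ID = re.compile(r"AMQ212034:.*?nodeID=([0-9a-fA-F-]+)")


def count_duplicate_node_warnings(lines: Iterable[str]) -> Counter:
    """Count AMQ212034 warnings per node id."""
    counts: Counter = Counter()
    for line in lines:
        match = DUPLICATE_NODE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def persistent_duplicates(counts: Counter, threshold: int = 1) -> List[str]:
    """Return node ids warned about more than `threshold` times (assumed cutoff)."""
    return sorted(node for node, n in counts.items() if n > threshold)
```

Passing an open log file to `count_duplicate_node_warnings` and the result to `persistent_duplicates` lists the node ids whose warning repeats, which is the case the message describes as a live node and its backup being active at the same time.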
      

      Attachments

        1. alive_masters_topology_view.png
          23 kB
          Michal Toth
        2. masterB_topology_view.png
          27 kB
          Michal Toth

            People

              ataylor@redhat.com Andy Taylor
              mtoth@redhat.com Michal Toth