  1. AMQ Broker
  2. ENTMQBR-2377

Slave, which was activated by split brain, does not deactivate even after network recovery to master

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • None
    • None
    • None
      • Restart master explicitly after the split brain.
      1. Configure HA using replication.
      2. Start the master.
      3. Start the slave.
      4. Disconnect the network between master and slave for 60 seconds (using the `brctl` command).
      5. => The slave becomes active as the backup server, so both master and slave are active because of the split brain. This in itself is not a problem.
      6. Reconnect the network between master and slave.
      7. => The slave continues to be active. This is the problem.
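
The disconnect and reconnect steps can be scripted so that the partition length is repeatable. The following is only a minimal sketch, assuming the slave's virtual interface is attached to a Linux bridge on the test host. The bridge name `virbr0` and interface name `vnet1` are hypothetical placeholders that do not come from this report, and the real `brctl` calls need root privileges. By default the script only prints what it would run.

```python
import subprocess
import time

# Hypothetical names: adjust to the bridge and interface of your test host.
BRIDGE = "virbr0"
SLAVE_IFACE = "vnet1"


def partition_commands(bridge, iface):
    """Return the brctl commands that detach and re-attach an interface."""
    detach = ["brctl", "delif", bridge, iface]
    attach = ["brctl", "addif", bridge, iface]
    return detach, attach


def simulate_partition(bridge, iface, seconds=60, dry_run=True):
    """Cut the interface off the bridge for `seconds`, then reconnect it."""
    detach, attach = partition_commands(bridge, iface)
    if dry_run:
        # Only show what would be executed; nothing changes on the host.
        print(" ".join(detach))
        print(f"sleep {seconds}")
        print(" ".join(attach))
        return
    subprocess.run(detach, check=True)      # disconnect the network
    try:
        time.sleep(seconds)                 # keep the partition open
    finally:
        subprocess.run(attach, check=True)  # reconnect even if interrupted


if __name__ == "__main__":
    simulate_partition(BRIDGE, SLAVE_IFACE, seconds=60, dry_run=True)
```

Pass `dry_run=False` to actually modify the bridge. The `finally` clause is there so that an interrupted run does not leave the hosts partitioned.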

    Description

      • In HA using replication, a slave that was activated by the split brain does not deactivate even after the network connection to the master is restored.
      • If the master is restarted after the split brain, the slave is deactivated.
      • master log
        ```
        2019-03-22 11:47:24,925 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ229014: Did not receive data from /192.168.122.109:35604 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
        2019-03-22 11:47:28,921 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ229014: Did not receive data from /192.168.122.109:35606 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
        2019-03-22 11:47:28,923 WARN  [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ229014: Did not receive data from /XXX.XXX.XXX.XXX:XXXXX within the 60,000ms connection TTL. The connection will now be closed.]
        	at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread$2.run(RemotingServiceImpl.java:735) [artemis-server-2.6.3.redhat-00020.jar:2.6.3.redhat-00020]
        	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.6.3.redhat-00020.jar:2.6.3.redhat-00020]
        	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.6.3.redhat-00020.jar:2.6.3.redhat-00020]
        	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.6.3.redhat-00020.jar:2.6.3.redhat-00020]
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
        	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.6.3.redhat-00020.jar:2.6.3.redhat-00020]
        
        2019-03-22 11:47:28,930 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ229014: Did not receive data from /192.168.122.109:35612 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
        2019-03-22 11:48:22,340 FINE  [org.jgroups.protocols.UNICAST3] node2-29598: closing expired connection for node1-20633 (120165 ms old) in send_table
        2019-03-22 11:48:22,342 FINE  [org.jgroups.protocols.UNICAST3] node2-29598: closing expired connection for node1-20633 (120165 ms old) in recv_table
        2019-03-22 11:48:32,350 FINE  [org.jgroups.protocols.UNICAST3] node2-29598: removing expired connection for node1-20633 (10006 ms old) from send_table
        ```
        
      • slave log
        ```
        2019-03-22 11:47:56,099 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
        2019-03-22 11:47:56,099 INFO  [org.apache.activemq.artemis.core.server] AMQ221067: Waiting 30 seconds for quorum vote results.
        2019-03-22 11:47:56,100 INFO  [org.apache.activemq.artemis.core.server] AMQ221068: Received all quorum votes.
        2019-03-22 11:47:56,100 INFO  [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
        2019-03-22 11:47:56,110 INFO  [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=ed893f0a-4bfb-11e9-a14a-5254005af083 to become 'live'
        2019-03-22 11:47:56,278 INFO  [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address DLQ supporting [ANYCAST]
        2019-03-22 11:47:56,278 INFO  [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue DLQ on address DLQ
        2019-03-22 11:47:56,279 INFO  [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address ExpiryQueue supporting [ANYCAST]
        2019-03-22 11:47:56,279 INFO  [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue ExpiryQueue on address ExpiryQueue
        2019-03-22 11:47:56,339 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
        2019-03-22 11:47:56,360 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 192.168.122.109:61716 for protocols [CORE]
        2019-03-22 11:48:22,190 FINE  [org.jgroups.protocols.UNICAST3] node1-20633: closing expired connection for node2-29598 (120275 ms old) in send_table
        2019-03-22 11:48:22,191 FINE  [org.jgroups.protocols.UNICAST3] node1-20633: closing expired connection for node2-29598 (120275 ms old) in recv_table
        2019-03-22 11:48:32,211 FINE  [org.jgroups.protocols.UNICAST3] node1-20633: removing expired connection for node2-29598 (10013 ms old) from send_table
        2019-03-22 11:48:32,212 FINE  [org.jgroups.protocols.UNICAST3] node1-20633: removing expired connection for node2-29598 (10013 ms old) from recv_table
        ```
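
To read the two logs side by side, it can help to pull out the timestamps of the key events. The sketch below is an illustration only: it takes the master's `AMQ222092` line (replication removed) and the slave's `AMQ221007` line (server is now live) from the excerpts above and computes the gap between them. The helper `first_event` and the regular expression are assumptions made for this example, the master line is abbreviated, and the result is only meaningful if the clocks on both hosts are synchronized.

```python
import re
from datetime import datetime

# Matches "<timestamp> <LEVEL>  [<logger>] AMQnnnnnn: ..." as seen in the logs.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[[^\]]+\]\s+(AMQ\d+):"
)


def first_event(lines, code):
    """Return the timestamp of the first line carrying the given AMQ code."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(3) == code:
            return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return None


# Lines copied from the excerpts above (the master line is abbreviated).
master_log = [
    "2019-03-22 11:47:28,923 WARN  [org.apache.activemq.artemis.core.server] "
    "AMQ222092: Connection to the backup node failed, removing replication now",
]
slave_log = [
    "2019-03-22 11:47:56,339 INFO  [org.apache.activemq.artemis.core.server] "
    "AMQ221007: Server is now live",
]

replication_removed = first_event(master_log, "AMQ222092")
slave_live = first_event(slave_log, "AMQ221007")
gap = (slave_live - replication_removed).total_seconds()
print(f"slave went live {gap:.3f} s after the master removed replication")
```

For the lines shown, the computed gap is 27.416 seconds. Lines without an AMQ code, such as the JGroups `FINE` entries, are skipped by the pattern.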
        
        

            People

              rh-ee-ataylor Andy Taylor
              rhn-support-tyamashi Tomonari Yamashita