  1. WildFly
  2. WFLY-5979

Colocated Artemis backup does not failback


    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 10.0.0.CR5
    • JMS
    • None

      Use case:

      • Start 2 WildFly servers with the standalone-full-ha.xml configuration and an additional ha-policy for the messaging-activemq server:
                      <replication-colocated request-backup="true">
                          <master check-for-live-server="true"/>
                      </replication-colocated>
      

      The default configuration for the colocated slave is allow-failback=true, restart-backup=true.

      Scenario:
      1. Start the 2 servers
      2. Kill server #1
      => server #2 must activate its backup
      3. Restart server #1
      => server #1 checks for a live server
      => server #2 must fail back and restart server #1's backup
      => server #1 is the live server

      Currently, at step (3), the activated backup on server #2 does not fail back. Server #1 starts as a live server, and both servers use the same nodeID.

      * Start server #1
      
      14:54:12,151 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221007: Server is now live
      14:54:12,151 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221001: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-010 [nodeID=cd605092-b933-11e5-ba21-cb5e13c1ea67]
      
      * Server #2
      
      14:56:27,926 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221007: Server is now live
      14:56:27,927 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221001: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-010 [nodeID=198586c5-b933-11e5-a199-4dbe14260a82]
      ...
      14:56:32,245 INFO  [org.apache.activemq.artemis.core.server] (default I/O-5) AMQ221049: Activating Replica for node: cd605092-b933-11e5-ba21-cb5e13c1ea67
      
      * Server #1 also creates a replica for server #2
      
      14:56:32,969 INFO  [org.apache.activemq.artemis.core.server] (default I/O-13) AMQ221049: Activating Replica for node: 198586c5-b933-11e5-a199-4dbe14260a82
      ...
      14:56:32,738 INFO  [org.apache.activemq.artemis.core.server] (Thread-7 (ActiveMQ-client-netty-threads-1157501840)) AMQ221024: Backup server ActiveMQServerImpl::serverUUID=cd605092-b933-11e5-ba21-cb5e13c1ea67 is synchronized with live-server.
      
      * Kill server #1 -> colocated backup on server #2 becomes live
      
      14:57:21,718 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221037: ActiveMQServerImpl::serverUUID=cd605092-b933-11e5-ba21-cb5e13c1ea67 to become 'live'
      
      * Restart server #1
      
      15:12:05,755 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221001: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-010 [nodeID=cd605092-b933-11e5-ba21-cb5e13c1ea67]
      
      * At this stage both server #1 and #2 behave like live servers for cd605092
      
      4:58:09,893 WARN  [org.apache.activemq.artemis.core.client] (activemq-discovery-group-thread-dg-group1) AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=cd605092-b933-11e5-ba21-cb5e13c1ea67
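The duplicate-nodeID condition shown in the logs above can also be spotted by scanning the server logs of both instances. The following is a minimal sketch, not part of WildFly or Artemis: it assumes that an AMQ221001 line (broker started, with its nodeID) or an AMQ221037 line (backup becoming live) means the server is live for that node ID, and it ignores later shutdowns or failbacks. The function names and the `server1`/`server2` keys are hypothetical; the sample lines are copied from this report.

```python
import re
from collections import defaultdict

# Assumption: these two messages indicate that a broker is live for a node ID.
# AMQ221001 is logged when a broker starts; AMQ221037 when a backup becomes live.
LIVE_PATTERNS = (
    re.compile(r"AMQ221001: .*\[nodeID=([0-9a-f-]+)\]"),
    re.compile(r"AMQ221037: .*serverUUID=([0-9a-f-]+) to become 'live'"),
)


def live_node_ids(log_lines):
    """Return the set of node IDs for which these log lines report a live broker."""
    ids = set()
    for line in log_lines:
        for pattern in LIVE_PATTERNS:
            match = pattern.search(line)
            if match:
                ids.add(match.group(1))
    return ids


def duplicated_live_ids(logs_by_server):
    """Map each node ID reported live by more than one server to those servers."""
    servers_by_id = defaultdict(set)
    for server, lines in logs_by_server.items():
        for node_id in live_node_ids(lines):
            servers_by_id[node_id].add(server)
    return {
        node_id: sorted(servers)
        for node_id, servers in servers_by_id.items()
        if len(servers) > 1
    }


# Sample lines taken from the log excerpts in this report.
SAMPLE_LOGS = {
    "server1": [
        "15:12:05,755 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221001: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-010 [nodeID=cd605092-b933-11e5-ba21-cb5e13c1ea67]",
    ],
    "server2": [
        "14:56:27,927 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 71) AMQ221001: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-010 [nodeID=198586c5-b933-11e5-a199-4dbe14260a82]",
        "14:57:21,718 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221037: ActiveMQServerImpl::serverUUID=cd605092-b933-11e5-ba21-cb5e13c1ea67 to become 'live'",
    ],
}

if __name__ == "__main__":
    for node_id, servers in duplicated_live_ids(SAMPLE_LOGS).items():
        print(f"nodeID {node_id} is reported live on: {', '.join(servers)}")
```

With the sample lines, only cd605092-b933-11e5-ba21-cb5e13c1ea67 is flagged, because server #1 announces it on restart while the colocated backup on server #2 has already become live for it. A real check would also need to account for message ordering and for servers that have since stopped.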
      

              jmesnil1@redhat.com Jeff Mesnil