WildFly WIP / WFWIP-17

[Artemis 2.x Upgrade] Replicated HA: Live and Backup do not form cluster and initial synchronization is not triggered

Details

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/265/artifact/jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=ReplicatedDedicatedFailoverTestCase#testFailbackWithDivertsTransAckQueueKill -DfailIfNoTests=false -Deap=7x -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1521544306-SNAPSHOT | tee log
      

    Description

      Across all replicated HA tests, I often see that the initial synchronization between Live and Backup is not triggered.

      Scenario

      1. Two WildFly servers are configured as a replicated Live-Backup pair.
      2. The Live server is killed or shut down.

      Expectation: The Backup server becomes active.
      Reality: The Backup server does not become active, because the initial synchronization with the Live server was never triggered.

      User impact: The replicated HA feature does not work properly.

      Blocker priority was set because this is a regression against previous WildFly releases.

      Detailed description of the issue

      In the trace log, the last message from SharedNothingBackupActivation says that it is waiting on the cluster connection. It is interesting that the JGroups subsystem boots only after this message is logged. However, I am not sure whether this could cause the issue.

      13:42:06,489 DEBUG [org.apache.activemq.artemis.core.client.impl.Topology] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) Topology@6a834ff0[owner=ServerLocatorImpl [initialConnectors=[], discoveryGroupConfiguration=DiscoveryGroupConfiguration{name='dg-group1', refreshTimeout=10000, discoveryInitialWaitTimeout=10000}]] is sending topology to org.apache.activemq.artemis.core.server.impl.NamedLiveNodeLocatorForReplication@38150570
      13:42:06,489 TRACE [org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) Waiting on cluster connection
      13:42:06,502 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
      13:42:06,503 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-7) ISPN000078: Starting JGroups channel ejb
      13:42:06,504 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [node-1|1] (2) [node-1, node-2]
      13:42:06,506 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000094: Received new cluster view for channel ejb: [node-1|1] (2) [node-1, node-2]
      13:42:06,506 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-8) ISPN000078: Starting JGroups channel ejb
      13:42:06,506 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000094: Received new cluster view for channel ejb: [node-1|1] (2) [node-1, node-2]
      13:42:06,510 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
      13:42:06,510 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [node-1|1] (2) [node-1, node-2]
      13:42:06,516 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-8) ISPN000079: Channel ejb local address is node-2, physical addresses are [127.0.0.1:56200]
      13:42:06,527 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000079: Channel ejb local address is node-2, physical addresses are [127.0.0.1:56200]
      13:42:06,527 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-7) ISPN000079: Channel ejb local address is node-2, physical addresses are [127.0.0.1:56200]
      13:42:06,529 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-6) ISPN000079: Channel ejb local address is node-2, physical addresses are [127.0.0.1:56200]
      13:42:06,551 INFO  [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread 1-7) ISPN000128: Infinispan version: Infinispan 'Gaina' 9.2.0.Final
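
      The ordering described above can be checked mechanically across several test runs. The following is only a minimal sketch: it compares the position of the last "Waiting on cluster connection" line with the position of the first "Starting JGroups channel" line in a log file. The function names and the three result strings are hypothetical and made up for this illustration; only the two search strings come from the excerpt above. A result of waiting-before-jgroups only confirms the ordering seen here, it does not prove that the ordering is the cause.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and result strings are hypothetical.

# Print the line number of the LAST line in FILE that contains TEXT.
line_of_last() {   # usage: line_of_last TEXT FILE
  grep -n -F -- "$1" "$2" | tail -n 1 | cut -d: -f1
}

# Print the line number of the FIRST line in FILE that contains TEXT.
line_of_first() {  # usage: line_of_first TEXT FILE
  grep -n -F -- "$1" "$2" | head -n 1 | cut -d: -f1
}

# Report whether the backup activation logged its last "waiting" message
# before the first JGroups channel start was logged.
check_startup_order() {  # usage: check_startup_order LOGFILE
  local log=$1 waiting jgroups
  waiting=$(line_of_last 'Waiting on cluster connection' "$log")
  jgroups=$(line_of_first 'Starting JGroups channel' "$log")
  if [ -z "$waiting" ] || [ -z "$jgroups" ]; then
    echo "inconclusive"            # one of the two messages is missing
  elif [ "$waiting" -lt "$jgroups" ]; then
    echo "waiting-before-jgroups"  # same ordering as in the excerpt above
  else
    echo "jgroups-before-waiting"
  fi
}
```

      Usage, assuming the trace log of the Backup server is passed as the argument (which attached file belongs to the Backup is an assumption to verify): check_startup_order server2-trace.log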
      

      Attachments

        1. server1.log (102 kB)
        2. server1-trace.log (14.46 MB)
        3. server2.log (87 kB)
        4. server2-trace.log (1.49 MB)

            People

              jmesnil1@redhat.com Jeff Mesnil
              eduda_jira Erich Duda (Inactive)