  1. JBoss Enterprise Application Platform
  2. JBEAP-4736

(7.0.z) Live does not become active after failback in replicated topology with http connectors


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Critical
    • None
    • 7.0.1.CR1, 7.0.9.CR2
    • ActiveMQ
    • None
    • Workaround Exists
    • Workaround: Use Netty connectors/acceptors.
    • Steps to reproduce:
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout refactoring_modules
      groovy -DEAP_ZIP_URL=http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap7-artemis-prepare/lastSuccessfulBuild/artifact/jboss-eap-7.x.patched.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=ReplicatedDedicatedFailoverTestCase#testFailbackTransAckTopic -DfailIfNoTests=false -Deap=7x -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.x-SNAPSHOT | tee log
      

      Scenario:

      • Two servers, Live and Backup, are configured in a replicated topology with http connectors
      • Shut down or kill the Live server
      • Start the Live server again

      Sometimes the Live server does not become active after the failback. The Live log [1] shows that the server was synchronized with Backup and announced itself as a (temporary) backup. However, Backup did not receive a response to its SynchronizationDone packet and did not restart, see [2]. The trace logs show that Live sent the response, but Backup never received it.

      The same issue may already have been hit in JBEAP-3998, see the comment there.
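The failure described above leaves distinct message IDs in the two server logs: AMQ221024 on Live (synchronization finished) and AMQ119114 on Backup (timeout while waiting for the SynchronizationDone response), as quoted in excerpts [1] and [2]. A minimal sketch of a check for that combination follows. The function name, its output labels, and the log paths are illustrative assumptions, not part of the test suite.

```shell
#!/bin/sh
# Sketch: classify a Live/Backup log pair against the failure signature
# described in this report. Message IDs are taken from the log excerpts;
# the function name and output labels are hypothetical.

check_failback_logs() {
    live_log=$1
    backup_log=$2

    # Live must have reported that synchronization with Backup finished.
    grep -q 'AMQ221024' "$live_log" || { echo "live-not-synchronized"; return 0; }

    # Backup timed out waiting for the SynchronizationDone response.
    if grep -q 'AMQ119114' "$backup_log"; then
        echo "failback-stuck"
    else
        echo "no-timeout-seen"
    fi
}
```

Assuming server1 is Live, server2 is Backup, and both use the default standalone log location, it could be called as `check_failback_logs "$JBOSS_HOME_1/standalone/log/server.log" "$JBOSS_HOME_2/standalone/log/server.log"`.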

      [1]

      Live
      16:04:17,019 INFO  [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-client-netty-threads-2042381447)) AMQ221024: Backup server ActiveMQServerImpl::serverUUID=7289474b-234a-11e6-916a-177a69616978 is synchronized with live-server.
      16:04:45,925 INFO  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@167daaf-1147487463)) AMQ221031: backup announced
      
      Live trace log
      16:04:17,019 INFO  [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-client-netty-threads-2042381447)) AMQ221024: Backup server ActiveMQServerImpl::serverUUID=7289474b-234a-11e6-916a-177a69616978 is synchronized with live-server.
      16:04:17,019 TRACE [org.apache.activemq.artemis.api.core.jgroups.JChannelWrapper] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@167daaf-1147487463)) org.apache.activemq.artemis.api.core.jgroups.JChannelWrapper@4a2d3ef6{refCount=3, channel=org.jgroups.fork.ForkChannel@3402946b, channelName='activemq-cluster', connected=true}::RefCount++ = 3 on channel activemq-cluster
      16:04:17,019 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl] (Thread-3 (ActiveMQ-client-netty-threads-2042381447)) Sending packet nonblocking PACKET(ReplicationResponseMessageV2)[type=-9, channelID=2, packetObject=ReplicationResponseMessageV2, synchronizationIsFinishedAcknowledgement=true] on channeID=2
      

      [2]

      Backup
      16:04:45,914 WARN  [org.apache.activemq.artemis.core.server] (Thread-127) AMQ222013: Error when trying to start replication: java.lang.IllegalStateException: AMQ119114: Replication synchronization process timed out after waiting 30 000 milliseconds
              at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:596)
              at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:392)
              at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:163)
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_91]
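The workaround recorded for this issue is to use Netty connectors/acceptors instead of http connectors. As a rough sketch only, the script below generates a management CLI batch that adds a remote (Netty) acceptor and connector and points the cluster connection at the new connector. The socket binding name `messaging`, port 5445, the connector/acceptor name `netty`, the server name `default`, and the cluster connection name `my-cluster` are all assumptions. Other resources that reference the http connector (broadcast groups, connection factories) would need a similar change and are not covered here.

```shell
#!/bin/sh
# Sketch: write a JBoss CLI batch file that switches messaging to Netty
# (remote) connectors. All resource names and the port are assumptions;
# adjust them to the actual server configuration.

write_netty_cli() {
    out=$1
    cat > "$out" <<'EOF'
batch
/socket-binding-group=standard-sockets/socket-binding=messaging:add(port=5445)
/subsystem=messaging-activemq/server=default/remote-acceptor=netty:add(socket-binding=messaging)
/subsystem=messaging-activemq/server=default/remote-connector=netty:add(socket-binding=messaging)
/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:write-attribute(name=connector-name,value=netty)
run-batch
reload
EOF
}
```

The generated file could then be applied to each running server, for example with `write_netty_cli netty.cli && "$JBOSS_HOME_1/bin/jboss-cli.sh" --connect --file=netty.cli`. If the servers use different management ports, the `--controller` option would also be needed.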
      

        1. live.zip
          529 kB
        2. backup.zip
          5.10 MB

              rhn-cservice-bbaranow Bartosz Baranowski
              eduda_jira Erich Duda (Inactive)
              Votes: 0
              Watchers: 3
