Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: 7.1.0.CR1
Affects Version/s: 7.1.0.ER3
Component/s: ActiveMQ
Labels:
None

Affects Testing:

Regression, Blocks Testing
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release:

7.1.0.GA
Steps to Reproduce:
git clone https://github.com/rh-messaging/jboss-activemq-artemis
cd jboss-activemq-artemis
git checkout 1.5.5.jbossorg-006
mvn install -Ptests -Dtest=ReplicatedFailoverTest#testTimeoutOnFailover -Drat.ignoreErrors=true -DfailIfNoTests=false | tee log

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Scenario: The issue occurs in all replication scenarios during initial synchronization.
Customer impact: The initial synchronization between live and backup may fail, in which case replication does not work.

We see this issue only in the upstream Artemis test suite; we haven't seen it in EAP tests.
Although the EAP failover tests didn't hit this issue, there is still a risk that it may arise in production, so the priority was set to Blocker.

This is a regression against 7.0.z.

Detailed description of the issue
The following NullPointerException occurs in almost all replication tests in the upstream Artemis test suite.

*** [Thread-1 (org.apache.activemq.artemis.utils.ActiveMQThreadFactory)] ***
08:11:01,702 WARN  [org.apache.activemq.artemis.core.replication.ReplicationEndpoint] null: java.lang.NullPointerException
	at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handleReplicationSynchronization(ReplicationEndpoint.java:444) [artemis-server-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handlePacket(ReplicationEndpoint.java:196) [artemis-server-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:633) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:379) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:362) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1143) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection$1.run(InVMConnection.java:196) [artemis-server-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:118) [artemis-commons-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) [rt.jar:1.8.0]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [rt.jar:1.8.0]
	at java.lang.Thread.run(Thread.java:785) [vm.jar:2.6 (05-16-2017)]

I found out that the issue is caused by incorrect ordering of replication packets. The NPE arises when ReplicationSyncFileMessage packets are sent before ReplicationStartSyncMessage packets.

The incorrect ordering can happen because of the useExecutor parameter of the sendReplicatePacket method. ReplicationStartSyncMessage packets are submitted first, but with useExecutor=true. ReplicationSyncFileMessage packets are submitted after them, but with useExecutor=false. Sending a ReplicationStartSyncMessage packet is therefore only scheduled on the executor, with no guarantee of when the task will run, whereas ReplicationSyncFileMessage packets are sent immediately on the calling thread and can reach the backup first.

private OperationContext sendReplicatePacket(final Packet packet, boolean lineUp, boolean useExecutor) {
      if (!enabled)
         return null;
      boolean runItNow = false;

      final OperationContext repliToken = OperationContextImpl.getContext(executorFactory);
      if (lineUp) {
         repliToken.replicationLineUp();
      }

      if (enabled) {
         if (useExecutor) {
            replicationStream.execute(() -> {
               if (enabled) {
                  pendingTokens.add(repliToken);
                  flowControl(packet.expectedEncodeSize());
                  replicatingChannel.send(packet);
               }
            });
         } else {
            pendingTokens.add(repliToken);
            flowControl(packet.expectedEncodeSize());
            replicatingChannel.send(packet);
         }
      } else {
         // Already replicating channel failed, so just play the action now
         runItNow = true;
      }

      // Execute outside lock

      if (runItNow) {
         repliToken.replicationDone();
      }

      return repliToken;
   }
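
The reordering described above can be reproduced in isolation. The following is a minimal sketch, not Artemis code: the class ReplicationOrderingSketch, its send and sendOrder methods, and the list standing in for the replicating channel are all hypothetical names introduced for illustration. It also assumes a single-threaded executor that is kept artificially busy with a latch, so that the delay, which in a real broker depends on timing, becomes deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the send path; not part of Artemis.
final class ReplicationOrderingSketch {

   static final String START = "ReplicationStartSyncMessage";
   static final String FILE = "ReplicationSyncFileMessage";

   // Mirrors the two branches of sendReplicatePacket: queue on the executor or send right away.
   static void send(List<String> channel, ExecutorService stream, String packet, boolean useExecutor) {
      if (useExecutor) {
         stream.execute(() -> channel.add(packet));
      } else {
         channel.add(packet);
      }
   }

   // Returns the order in which the two packets reach the simulated channel.
   static List<String> sendOrder(boolean startUsesExecutor, boolean fileUsesExecutor) {
      List<String> channel = new CopyOnWriteArrayList<>();
      ExecutorService stream = Executors.newSingleThreadExecutor();
      CountDownLatch release = new CountDownLatch(1);

      // Assumption: keep the executor busy so that every queued task is delayed.
      stream.execute(() -> {
         try {
            release.await();
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      });

      try {
         send(channel, stream, START, startUsesExecutor); // submitted first
         send(channel, stream, FILE, fileUsesExecutor);   // submitted second
      } finally {
         release.countDown();
         stream.shutdown();
      }

      try {
         if (!stream.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("executor did not terminate");
         }
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new IllegalStateException(e);
      }
      return new ArrayList<>(channel);
   }

   public static void main(String[] args) {
      System.out.println("mixed paths: " + sendOrder(true, false));
      System.out.println("same path:   " + sendOrder(true, true));
   }
}
```

With mixed paths (start packet queued, file packet sent directly) the file packet arrives first, which is the ordering that triggers the NPE on the backup. When both packets go through the same path, the submission order is preserved. This only illustrates the ordering property; it is not necessarily how the issue was fixed upstream.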

is incorporated by

JBEAP-12695 Upgrade Artemis 1.5.5.jbossorg-007

Closed

is related to: ARTEMIS-1353
