Scenario: The issue occurs in all replication scenarios during initial synchronization.
Customer impact: Initial replication between live and backup may fail, in which case replication will not work.
We see this issue only in the Artemis upstream test suite. We haven't seen it in EAP tests.
Although the EAP failover tests didn't hit this issue, there is still a risk that it may arise in production, so the priority was set to Blocker.
This is a regression against 7.0.z.
Detailed description of the issue
The following NullPointerException arises in almost all replication tests in the upstream Artemis test suite.
*** [Thread-1 (org.apache.activemq.artemis.utils.ActiveMQThreadFactory)] *** 08:11:01,702 WARN [org.apache.activemq.artemis.core.replication.ReplicationEndpoint] null: java.lang.NullPointerException
    at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handleReplicationSynchronization(ReplicationEndpoint.java:444) [artemis-server-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handlePacket(ReplicationEndpoint.java:196) [artemis-server-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:633) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:379) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:362) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1143) [artemis-core-client-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection$1.run(InVMConnection.java:196) [artemis-server-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:118) [artemis-commons-1.5.5.jbossorg-006.jar:1.5.5.jbossorg-006]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) [rt.jar:1.8.0]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [rt.jar:1.8.0]
    at java.lang.Thread.run(Thread.java:785) [vm.jar:2.6 (05-16-2017)]
I found out that the issue is caused by incorrect ordering of replication packets. The NPE arises when ReplicationSyncFileMessage packets are sent before ReplicationStartSyncMessage packets.
Incorrect ordering of replication packets may happen because of the useExecutor parameter of the sendReplicatePacket method. ReplicationStartSyncMessage packets are sent first, but with useExecutor=true. ReplicationSyncFileMessage packets are sent after them, but with useExecutor=false. So the sending of ReplicationStartSyncMessage packets is scheduled on the executor, with no guarantee of when the task will run, whereas ReplicationSyncFileMessage packets are sent immediately.
    private OperationContext sendReplicatePacket(final Packet packet, boolean lineUp, boolean useExecutor) {
        if (!enabled)
            return null;
        boolean runItNow = false;

        final OperationContext repliToken = OperationContextImpl.getContext(executorFactory);
        if (lineUp) {
            repliToken.replicationLineUp();
        }

        if (enabled) {
            if (useExecutor) {
                replicationStream.execute(() -> {
                    if (enabled) {
                        pendingTokens.add(repliToken);
                        flowControl(packet.expectedEncodeSize());
                        replicatingChannel.send(packet);
                    }
                });
            } else {
                pendingTokens.add(repliToken);
                flowControl(packet.expectedEncodeSize());
                replicatingChannel.send(packet);
            }
        } else {
            // Already replicating channel failed, so just play the action now
            runItNow = true;
        }

        // Execute outside lock
        if (runItNow) {
            repliToken.replicationDone();
        }

        return repliToken;
    }
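The reordering described above can be reproduced outside Artemis with a small, self-contained sketch. Everything in it is hypothetical and for illustration only: `RecordingChannel` stands in for the replicating channel, packet names are plain strings, and a `CountDownLatch` deliberately delays the executor task to force the bad interleaving that in practice happens only occasionally. The second method shows one possible way to preserve ordering, by routing both packets through the same single-threaded executor; it is not claimed to be the fix applied upstream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReplicationOrderingSketch {

    /** Hypothetical stand-in for the replicating channel: records packets in arrival order. */
    static final class RecordingChannel {
        private final List<String> received = Collections.synchronizedList(new ArrayList<>());

        void send(String packet) {
            received.add(packet);
        }

        List<String> received() {
            return new ArrayList<>(received);
        }
    }

    /** First packet goes through the executor, second is sent directly (the problematic mix). */
    static List<String> sendMixed() throws InterruptedException {
        RecordingChannel channel = new RecordingChannel();
        ExecutorService stream = Executors.newSingleThreadExecutor();
        CountDownLatch directSendDone = new CountDownLatch(1);
        try {
            // Analogous to useExecutor=true: the send is only scheduled, not performed yet.
            stream.execute(() -> {
                try {
                    // Assumption: simulate a late-running task by waiting for the direct send.
                    directSendDone.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                channel.send("ReplicationStartSyncMessage");
            });
            // Analogous to useExecutor=false: the send happens immediately on the caller thread.
            channel.send("ReplicationSyncFileMessage");
            directSendDone.countDown();
        } finally {
            stream.shutdown();
            stream.awaitTermination(5, TimeUnit.SECONDS);
        }
        return channel.received();
    }

    /** Both packets go through the same single-threaded executor, which runs tasks in FIFO order. */
    static List<String> sendAllViaExecutor() throws InterruptedException {
        RecordingChannel channel = new RecordingChannel();
        ExecutorService stream = Executors.newSingleThreadExecutor();
        try {
            stream.execute(() -> channel.send("ReplicationStartSyncMessage"));
            stream.execute(() -> channel.send("ReplicationSyncFileMessage"));
        } finally {
            stream.shutdown();
            stream.awaitTermination(5, TimeUnit.SECONDS);
        }
        return channel.received();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mixed:    " + sendMixed());
        System.out.println("executor: " + sendAllViaExecutor());
    }
}
```

In the mixed case the channel receives ReplicationSyncFileMessage before ReplicationStartSyncMessage, which corresponds to the ordering that triggers the NPE on the backup. When both sends share the executor, submission order is preserved.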
- is incorporated by: JBEAP-12695 Upgrade Artemis 1.5.5.jbossorg-007 (Closed)
- is related to: ARTEMIS-1353