-
Bug
-
Resolution: Done
-
Major
-
None
-
14.0.8.Final
-
None
The operator graceful shutdown procedure used for an upgrade is as follows:
1. Disable rebalance
2. Call /container/?action=shutdown on each cluster member
3. Scale cluster down to 0 pods
4. Scale cluster back to original number of pods
5. Wait for /cache-managers/default/health to report all members
6. Enable rebalance
7. Upgrade complete
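For reference, the sequence above can be sketched as a small orchestration routine. This is only an illustration, not the operator's actual code: the `Cluster` interface, its method names, the `graceful_upgrade` function, the timeout values, and the `RecordingCluster` stand-in are all hypothetical, and the real REST and Kubernetes calls are left behind that interface.

```python
import time
from typing import Callable, List, Protocol


class Cluster(Protocol):
    """Hypothetical facade over the REST and pod-scaling calls the steps imply."""

    def members(self) -> List[str]: ...
    def set_rebalancing(self, enabled: bool) -> None: ...
    def shutdown_member(self, member: str) -> None: ...
    def scale(self, replicas: int) -> None: ...
    def healthy_member_count(self) -> int: ...


def graceful_upgrade(cluster: Cluster, replicas: int,
                     timeout_s: float = 300.0, poll_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    cluster.set_rebalancing(False)            # step 1: disable rebalance
    for member in cluster.members():          # step 2: shutdown on each member
        cluster.shutdown_member(member)
    cluster.scale(0)                          # step 3: scale down to 0 pods
    cluster.scale(replicas)                   # step 4: scale back up
    waited = 0.0
    # step 5: poll health until all members are reported (assumed polling loop)
    while cluster.healthy_member_count() < replicas:
        if waited >= timeout_s:
            raise TimeoutError("cluster did not report all members in time")
        sleep(poll_s)
        waited += poll_s
    cluster.set_rebalancing(True)             # step 6: the call that fails in this report


class RecordingCluster:
    """In-memory stand-in used only to exercise the sketch; not a real client."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)
        self.calls: list = []
        self.replicas = len(self.names)
        self.polls = 0

    def members(self) -> List[str]:
        return list(self.names)

    def set_rebalancing(self, enabled: bool) -> None:
        self.calls.append(("rebalancing", enabled))

    def shutdown_member(self, member: str) -> None:
        self.calls.append(("shutdown", member))

    def scale(self, replicas: int) -> None:
        self.calls.append(("scale", replicas))
        self.replicas = replicas
        self.polls = 0

    def healthy_member_count(self) -> int:
        # Pretend one more pod becomes healthy on every poll.
        self.polls += 1
        return min(self.polls, self.replicas)
```

Calling `graceful_upgrade(RecordingCluster(["a", "b"]), 2, sleep=lambda s: None)` runs the steps in order against the stand-in. In this sketch rebalancing is re-enabled only after the health check reports the full member count.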
However, step 6 sometimes fails with the following server-side exception, and the upgrade cannot proceed:
12:45:14,469 ERROR (blocking-thread--p3-t1) [org.infinispan.rest.RestRequestHandler] ISPN012005: An error occurred while responding to the client
java.util.concurrent.CompletionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from test-upgrade-1-11326, see cause for remote stack trace
    at org.infinispan.util.concurrent.CompletionStages.join(CompletionStages.java:87)
    at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:707)
    at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:659)
    at org.infinispan.rest.resources.ContainerResource.lambda$setRebalancing$2(ContainerResource.java:188)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from test-upgrade-1-11326, see cause for remote stack trace
    at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:25)
    at org.infinispan.remoting.transport.impl.VoidResponseCollector.addException(VoidResponseCollector.java:47)
    at org.infinispan.remoting.transport.impl.VoidResponseCollector.addException(VoidResponseCollector.java:19)
    at org.infinispan.remoting.transport.ValidResponseCollector.addResponse(ValidResponseCollector.java:29)
    at org.infinispan.topology.TopologyManagementHelper.lambda$addLocalResult$2(TopologyManagementHelper.java:129)
    at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
    at java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:950)
    at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2340)
    at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:144)
    at org.infinispan.topology.TopologyManagementHelper.lambda$addLocalResult$3(TopologyManagementHelper.java:125)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
    at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:68)
    at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:106)
    at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:51)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1578)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1480)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1685)
    at org.jgroups.JChannel.up(JChannel.java:733)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:921)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:138)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:245)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:245)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:845)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:226)
    at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1052)
    at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:794)
    at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:776)
    at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:425)
    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:658)
    at org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:105)
    at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:180)
    at org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:188)
    at org.jgroups.protocols.MERGE3.up(MERGE3.java:274)
    at org.jgroups.protocols.Discovery.up(Discovery.java:294)
    at org.jgroups.stack.Protocol.up(Protocol.java:314)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1178)
    at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:100)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    ... 1 more
Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from test-upgrade-1-11326, see cause for remote stack trace
    at org.infinispan.topology.TopologyManagementHelper.makeResponse(TopologyManagementHelper.java:138)
    at org.infinispan.topology.TopologyManagementHelper.lambda$addLocalResult$2(TopologyManagementHelper.java:126)
    ... 37 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.infinispan.topology.CacheTopology.getTopologyId()" because "cacheTopology" is null
    at org.infinispan.commands.topology.TopologyUpdateStableCommand.<init>(TopologyUpdateStableCommand.java:44)
    at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastStableTopologyUpdate(ClusterTopologyManagerImpl.java:643)
    at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:835)
    at java.base/java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4780)
    at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:689)
    at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:665)
    at org.infinispan.commands.topology.RebalancePolicyUpdateCommand.invokeAsync(RebalancePolicyUpdateCommand.java:37)
    at org.infinispan.topology.TopologyManagementHelper.invokeAsync(TopologyManagementHelper.java:151)
    at org.infinispan.topology.TopologyManagementHelper.executeOnClusterSync(TopologyManagementHelper.java:51)
    at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:707)
    at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:659)
    at org.infinispan.rest.resources.ContainerResource.lambda$setRebalancing$2(ContainerResource.java:188)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
    ... 1 more
- causes
-
JDG-6129 [Operator] Unable to enable rebalancing after bringing cluster back online
- Verified