Uploaded image for project: 'Red Hat Data Grid'
  1. Red Hat Data Grid
  2. JDG-6129

[Operator] Unable to enable rebalancing after bringing cluster back online

XMLWordPrintable

      It's pretty common to see errors bellow after bringing clusters back online after shutdown suggesting something didn't go as it was supposed to. Nevertheless cluster is fully functional without any observable issues and re-balancing is enabled (Probably Operator re-try?). Possible race condition between the request and topologyId assignment?

      Operator log

      ERROR	controller-runtime.manager.controller.infinispan	Reconciler error	{"reconciler group": "infinispan.org", "reconciler kind": "Infinispan", "name": "infinispan", "namespace": "datagrid", "error": "unable to enable rebalancing after cluster scale up: unexpected HTTP status code (500): unexpected error during enable-rebalancing, response: Cannot invoke \"org.infinispan.topology.CacheTopology.getTopologyId()\" because \"cacheTopology\" is null"}
      github.com/go-logr/zapr.(*zapLogger).Error
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/github.com/go-logr/zapr/zapr.go:132
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:267
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198
      k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
      k8s.io/apimachinery/pkg/util/wait.JitterUntil
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
      k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
      k8s.io/apimachinery/pkg/util/wait.UntilWithContext
      	/remote-source/deps/gomod/src/github.com/infinispan/infinispan-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99
      

      Server log

      15:17:18,741 INFO  (jgroups-6,infinispan-0-22109) [org.infinispan.CLUSTER] ISPN000094: Received new cluster view for channel infinispan: [infinispan-0-22109|1] (2) [infinispan-0-22109, infinispan-1-42889]
      15:17:18,744 INFO  (jgroups-6,infinispan-0-22109) [org.infinispan.CLUSTER] [Scope=infinispan-0-22109]ISPN100000: Node infinispan-1-42889 joined the cluster
      15:17:18,747 INFO  (jgroups-6,infinispan-0-22109) [org.infinispan.CLUSTER] [Scope=infinispan-0-22109]ISPN100000: Node infinispan-1-42889 joined the cluster
      15:17:19,021 ERROR (blocking-thread--p3-t1) [org.infinispan.rest.RestRequestHandler] ISPN012005: An error occurred while responding to the client java.util.concurrent.CompletionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from infinispan-0-22109, see cause for remote stack trace
      	at org.infinispan.util.concurrent.CompletionStages.join(CompletionStages.java:87)
      	at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:707)
      	at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:659)
      	at org.infinispan.rest.resources.ContainerResource.lambda$setRebalancing$2(ContainerResource.java:188)
      	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
      	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
      	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
      	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
      	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
      	at java.base/java.lang.Thread.run(Thread.java:833)
      Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from infinispan-0-22109, see cause for remote stack trace
      	at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:25)
      	at org.infinispan.remoting.transport.impl.VoidResponseCollector.addException(VoidResponseCollector.java:47)
      	at org.infinispan.remoting.transport.impl.VoidResponseCollector.addException(VoidResponseCollector.java:19)
      	at org.infinispan.remoting.transport.ValidResponseCollector.addResponse(ValidResponseCollector.java:29)
      	at org.infinispan.topology.TopologyManagementHelper.lambda$addLocalResult$2(TopologyManagementHelper.java:129)
      	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
      	at java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:950)
      	at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2340)
      	at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:144)
      	at org.infinispan.topology.TopologyManagementHelper.lambda$addLocalResult$3(TopologyManagementHelper.java:125)
      	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
      	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
      	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
      	at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:68)
      	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:106)
      	at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:51)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1578)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1480)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1685)
      	at org.jgroups.JChannel.up(JChannel.java:733)
      	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:921)
      	at org.jgroups.protocols.FRAG2.up(FRAG2.java:138)
      	at org.jgroups.protocols.FlowControl.up(FlowControl.java:245)
      	at org.jgroups.protocols.FlowControl.up(FlowControl.java:245)
      	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:845)
      	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:226)
      	at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1052)
      	at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:794)
      	at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:776)
      	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:425)
      	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:658)
      	at org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:105)
      	at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:180)
      	at org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:188)
      	at org.jgroups.protocols.MERGE3.up(MERGE3.java:274)
      	at org.jgroups.protocols.Discovery.up(Discovery.java:294)
      	at org.jgroups.stack.Protocol.up(Protocol.java:314)
      	at org.jgroups.protocols.TP.passMessageUp(TP.java:1178)
      	at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:100)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	... 1 more
      Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from infinispan-0-22109, see cause for remote stack trace
      	at org.infinispan.topology.TopologyManagementHelper.makeResponse(TopologyManagementHelper.java:138)
      	at org.infinispan.topology.TopologyManagementHelper.lambda$addLocalResult$2(TopologyManagementHelper.java:126)
      	... 37 more
      Caused by: java.lang.NullPointerException: Cannot invoke "org.infinispan.topology.CacheTopology.getTopologyId()" because "cacheTopology" is null
      	at org.infinispan.commands.topology.TopologyUpdateStableCommand.<init>(TopologyUpdateStableCommand.java:44)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastStableTopologyUpdate(ClusterTopologyManagerImpl.java:643)
      	at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:835)
      	at java.base/java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4780)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:689)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:665)
      	at org.infinispan.commands.topology.RebalancePolicyUpdateCommand.invokeAsync(RebalancePolicyUpdateCommand.java:37)
      	at org.infinispan.topology.TopologyManagementHelper.invokeAsync(TopologyManagementHelper.java:151)
      	at org.infinispan.topology.TopologyManagementHelper.executeOnClusterSync(TopologyManagementHelper.java:51)
      	at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:707)
      	at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:659)
      	at org.infinispan.rest.resources.ContainerResource.lambda$setRebalancing$2(ContainerResource.java:188)
      	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
      	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
      	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
      	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
      	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
      	... 1 more
      
      15:17:19,203 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=org.infinispan.CLIENT_SERVER_TX_TABLE]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,268 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=___protobuf_metadata]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,288 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=___script_cache]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,302 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=org.infinispan.CONFIG]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,329 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=org.infinispan.COUNTER]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,354 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=org.infinispan.LOCKS]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,515 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=___hotRodTopologyCache_hotrod-default]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      15:17:19,662 INFO  (jgroups-5,infinispan-0-22109) [org.infinispan.CLUSTER] [Context=___hotRodTopologyCache_hotrod-admin]ISPN100008: Updating cache members list [infinispan-0-22109, infinispan-1-42889], topology id 1
      

            remerson@redhat.com Ryan Emerson
            pdrobek@redhat.com Pavel Drobek
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: