Uploaded image for project: 'AMQ Streams'
  1. AMQ Streams
  2. ENTMQST-3763

Errors when scaling-down ZooKeeper nodes

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 2.1.0.GA
    • None
    • None
    • None

      When scaling down ZooKeeper cluster with PodSets enabled, it seems to sometimes happen that the pods still exist when the Zoo RU is called as the next step in the reconciliation and they cause error in the RU:

      2022-02-02 23:31:23 INFO  OperatorWatcher:38 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Kafka my-cluster in namespace myproject was MODIFIED
      2022-02-02 23:31:23 INFO  AbstractOperator:226 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification
      2022-02-02 23:31:23 INFO  KafkaAssemblyOperator:1574 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Scaling Zookeeper down from 5 to 3 replicas
      2022-02-02 23:31:24 INFO  StrimziPodSetController:228 - Reconciliation #144(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:24 INFO  StrimziPodSetController:262 - Reconciliation #144(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:24 INFO  StrimziPodSetController:228 - Reconciliation #145(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:24 INFO  StrimziPodSetController:262 - Reconciliation #145(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:24 INFO  StrimziPodSetController:228 - Reconciliation #146(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:24 INFO  StrimziPodSetController:262 - Reconciliation #146(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:25 INFO  StrimziPodSetController:228 - Reconciliation #147(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:25 INFO  StrimziPodSetController:262 - Reconciliation #147(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:25 INFO  StrimziPodSetController:228 - Reconciliation #148(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:25 ERROR AbstractOperator:247 - Reconciliation #143(watch) Kafka(myproject/my-cluster): createOrUpdate failed
      java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet
          at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at java.lang.Thread.run(Thread.java:829) [?:?]
      2022-02-02 23:31:25 INFO  StrimziPodSetController:262 - Reconciliation #148(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:25 INFO  StrimziPodSetController:228 - Reconciliation #149(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:25 INFO  StrimziPodSetController:262 - Reconciliation #149(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:25 INFO  CrdOperator:113 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Status of Kafka my-cluster in namespace myproject has been updated
      2022-02-02 23:31:26 INFO  OperatorWatcher:38 - Reconciliation #150(watch) Kafka(myproject/my-cluster): Kafka my-cluster in namespace myproject was MODIFIED
      2022-02-02 23:31:26 WARN  AbstractOperator:532 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Failed to reconcile
      java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet
          at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at java.lang.Thread.run(Thread.java:829) [?:?]
      2022-02-02 23:31:26 INFO  AbstractOperator:226 - Reconciliation #150(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification
      2022-02-02 23:31:26 INFO  StrimziPodSetController:228 - Reconciliation #151(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:26 ERROR AbstractOperator:247 - Reconciliation #150(watch) Kafka(myproject/my-cluster): createOrUpdate failed
      java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet
          at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at java.lang.Thread.run(Thread.java:829) [?:?]
      2022-02-02 23:31:26 INFO  StrimziPodSetController:262 - Reconciliation #151(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:26 INFO  StrimziPodSetController:228 - Reconciliation #152(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:26 WARN  AbstractOperator:532 - Reconciliation #150(watch) Kafka(myproject/my-cluster): Failed to reconcile
      java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet
          at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
          at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
          at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
          at java.lang.Thread.run(Thread.java:829) [?:?]
      2022-02-02 23:31:26 INFO  StrimziPodSetController:262 - Reconciliation #152(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:26 INFO  StrimziPodSetController:228 - Reconciliation #153(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:26 INFO  StrimziPodSetController:262 - Reconciliation #153(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:26 INFO  StrimziPodSetController:228 - Reconciliation #154(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:26 INFO  StrimziPodSetController:262 - Reconciliation #154(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:27 INFO  StrimziPodSetController:228 - Reconciliation #155(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:27 INFO  StrimziPodSetController:262 - Reconciliation #155(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:27 INFO  StrimziPodSetController:228 - Reconciliation #156(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:27 INFO  StrimziPodSetController:262 - Reconciliation #156(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:27 INFO  StrimziPodSetController:228 - Reconciliation #157(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled
      2022-02-02 23:31:27 INFO  StrimziPodSetController:262 - Reconciliation #157(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled
      2022-02-02 23:31:40 INFO  ClusterOperator:123 - Triggering periodic reconciliation for namespace myproject
      2022-02-02 23:31:40 INFO  AbstractOperator:226 - Reconciliation #158(timer) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification
      2022-02-02 23:31:42 INFO  CrdOperator:113 - Reconciliation #158(timer) Kafka(myproject/my-cluster): Status of Kafka my-cluster in namespace myproject has been updated
      2022-02-02 23:31:42 INFO  OperatorWatcher:38 - Reconciliation #159(watch) Kafka(myproject/my-cluster): Kafka my-cluster in namespace myproject was MODIFIED
      2022-02-02 23:31:42 INFO  AbstractOperator:517 - Reconciliation #158(timer) Kafka(myproject/my-cluster): reconciled
      2022-02-02 23:31:42 INFO  AbstractOperator:226 - Reconciliation #159(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification
      2022-02-02 23:31:44 INFO  AbstractOperator:517 - Reconciliation #159(watch) Kafka(myproject/my-cluster): reconciled 

      The issue seems to eventually resolve it self when the pods are actually deleted. But this is not exactly nice behaviour and it should be investigated.

      It looks like the issue here is that the scale-down is not actually waiting for the pods to be deleted (the deletion happens asynchronously by the StrimziPodSetController). Waiting for their deletion should make sure they do not leak into the Zoo RU and these errors do not happen again.

       

              Unassigned Unassigned
              scholzj JAkub Scholz
              Lukas Kral Lukas Kral
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: