-
Bug
-
Resolution: Done
-
Major
-
None
-
None
-
None
-
False
-
False
-
When scaling down ZooKeeper cluster with PodSets enabled, it seems to sometimes happen that the pods still exist when the Zoo RU is called as the next step in the reconciliation and they cause error in the RU:
2022-02-02 23:31:23 INFO OperatorWatcher:38 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Kafka my-cluster in namespace myproject was MODIFIED 2022-02-02 23:31:23 INFO AbstractOperator:226 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification 2022-02-02 23:31:23 INFO KafkaAssemblyOperator:1574 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Scaling Zookeeper down from 5 to 3 replicas 2022-02-02 23:31:24 INFO StrimziPodSetController:228 - Reconciliation #144(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:24 INFO StrimziPodSetController:262 - Reconciliation #144(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:24 INFO StrimziPodSetController:228 - Reconciliation #145(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:24 INFO StrimziPodSetController:262 - Reconciliation #145(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:24 INFO StrimziPodSetController:228 - Reconciliation #146(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:24 INFO StrimziPodSetController:262 - Reconciliation #146(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:25 INFO StrimziPodSetController:228 - Reconciliation #147(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:25 INFO StrimziPodSetController:262 - Reconciliation #147(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:25 INFO StrimziPodSetController:228 - Reconciliation #148(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:25 ERROR AbstractOperator:247 - Reconciliation #143(watch) Kafka(myproject/my-cluster): createOrUpdate failed java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at java.lang.Thread.run(Thread.java:829) [?:?] 2022-02-02 23:31:25 INFO StrimziPodSetController:262 - Reconciliation #148(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:25 INFO StrimziPodSetController:228 - Reconciliation #149(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:25 INFO StrimziPodSetController:262 - Reconciliation #149(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:25 INFO CrdOperator:113 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Status of Kafka my-cluster in namespace myproject has been updated 2022-02-02 23:31:26 INFO OperatorWatcher:38 - Reconciliation #150(watch) Kafka(myproject/my-cluster): Kafka my-cluster in namespace myproject was MODIFIED 2022-02-02 23:31:26 WARN AbstractOperator:532 - Reconciliation #143(watch) Kafka(myproject/my-cluster): Failed to reconcile java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at java.lang.Thread.run(Thread.java:829) [?:?] 2022-02-02 23:31:26 INFO AbstractOperator:226 - Reconciliation #150(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification 2022-02-02 23:31:26 INFO StrimziPodSetController:228 - Reconciliation #151(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:26 ERROR AbstractOperator:247 - Reconciliation #150(watch) Kafka(myproject/my-cluster): createOrUpdate failed java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at java.lang.Thread.run(Thread.java:829) [?:?] 2022-02-02 23:31:26 INFO StrimziPodSetController:262 - Reconciliation #151(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:26 INFO StrimziPodSetController:228 - Reconciliation #152(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:26 WARN AbstractOperator:532 - Reconciliation #150(watch) Kafka(myproject/my-cluster): Failed to reconcile java.lang.RuntimeException: Pod my-cluster-zookeeper-3 in namespace myproject not found in desired StrimziPodSet at io.strimzi.operator.cluster.operator.resource.PodRevision.hasChanged(PodRevision.java:66) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.getReasonsToRestartPod(KafkaAssemblyOperator.java:4005) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.assembly.KafkaAssemblyOperator$ReconciliationState.lambda$zkRollingUpdate$46(KafkaAssemblyOperator.java:1461) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.cluster.operator.resource.ZooKeeperRoller.lambda$maybeRollingUpdate$0(ZooKeeperRoller.java:60) ~[io.strimzi.cluster-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:211) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.tryComplete(PromiseImpl.java:23) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.PromiseImpl.onSuccess(PromiseImpl.java:49) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at java.lang.Thread.run(Thread.java:829) [?:?] 2022-02-02 23:31:26 INFO StrimziPodSetController:262 - Reconciliation #152(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:26 INFO StrimziPodSetController:228 - Reconciliation #153(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:26 INFO StrimziPodSetController:262 - Reconciliation #153(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:26 INFO StrimziPodSetController:228 - Reconciliation #154(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:26 INFO StrimziPodSetController:262 - Reconciliation #154(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:27 INFO StrimziPodSetController:228 - Reconciliation #155(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:27 INFO StrimziPodSetController:262 - Reconciliation #155(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:27 INFO StrimziPodSetController:228 - Reconciliation #156(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:27 INFO StrimziPodSetController:262 - Reconciliation #156(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:27 INFO StrimziPodSetController:228 - Reconciliation #157(watch) StrimziPodSet(myproject/my-cluster-zookeeper): StrimziPodSet will be reconciled 2022-02-02 23:31:27 INFO StrimziPodSetController:262 - Reconciliation #157(watch) StrimziPodSet(myproject/my-cluster-zookeeper): reconciled 2022-02-02 23:31:40 INFO ClusterOperator:123 - Triggering periodic reconciliation for namespace myproject 2022-02-02 23:31:40 INFO AbstractOperator:226 - Reconciliation #158(timer) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification 2022-02-02 23:31:42 INFO CrdOperator:113 - Reconciliation #158(timer) Kafka(myproject/my-cluster): Status of Kafka my-cluster in namespace myproject has been updated 2022-02-02 23:31:42 INFO OperatorWatcher:38 - Reconciliation #159(watch) Kafka(myproject/my-cluster): Kafka my-cluster in namespace myproject was MODIFIED 2022-02-02 23:31:42 INFO AbstractOperator:517 - Reconciliation #158(timer) Kafka(myproject/my-cluster): reconciled 2022-02-02 23:31:42 INFO AbstractOperator:226 - Reconciliation #159(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification 2022-02-02 23:31:44 INFO AbstractOperator:517 - Reconciliation #159(watch) Kafka(myproject/my-cluster): reconciled
The issue seems to eventually resolve it self when the pods are actually deleted. But this is not exactly nice behaviour and it should be investigated.
It looks like the issue here is that the scale-down is not actually waiting for the pods to be deleted (the deletion happens asynchronously by the StrimziPodSetController). Waiting for their deletion should make sure they do not leak into the Zoo RU and these errors do not happen again.