-
Bug
-
Resolution: Done
-
2.3.0.GA
Tracked upstream in [7].
Some of the operators do not recreate watches after they are closed with an error:
- AbstractConnectOperator
- KafkaRebalanceAssemblyOperator
This can lead to custom resources not being watched and updated after the watch on that resource expires [1]. When this happens, errors like the following show up in the Cluster Operator log:
2022-12-31T20:41:34.080724387Z 2022-12-31 20:41:34 ERROR AbstractWatchManager:315 - Unhandled exception encountered in watcher event handler
2022-12-31T20:41:34.080724387Z io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 119220794 (119221972)
2022-12-31T20:41:34.080724387Z at io.strimzi.operator.cluster.operator.assembly.KafkaRebalanceAssemblyOperator$1.onClose(KafkaRebalanceAssemblyOperator.java:233) ~[io.strimzi.cluster-operator-0.29.0.redhat-00014.jar:0.29.0.redhat-00014]
2022-12-31T20:41:34.080724387Z at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) ~[io.fabric8.kubernetes-client-5.12.2.redhat-00002.jar:?]
We had the same problem with the TopicOperator in the past and fixed it by updating the K8sTopicWatcher onClose() method [2]. We also already recreate watches for other operators in the same situation [5] [6]. We should look into doing the same for:
- AbstractConnectOperator [3]
- KafkaRebalanceAssemblyOperator [4]
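The recreate-on-close pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the fabric8 or Strimzi APIs: `Watcher`, `WatchSource`, `RecreatingWatcher` and `FakeSource` are all hypothetical names invented for illustration, and the actual change in the two operators may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: every type here is hypothetical and merely mimics the shape of a watch API. */
final class WatchRecreationSketch {

    /** Hypothetical callback interface, loosely modelled on a Kubernetes watcher. */
    interface Watcher {
        void eventReceived(String resourceName);

        /** Called when the watch ends; cause is null on a graceful close. */
        void onClose(Exception cause);
    }

    /** Hypothetical stand-in for the client that opens watches. */
    interface WatchSource {
        void watch(Watcher watcher);
    }

    /** Watcher that opens a fresh watch whenever the current one is closed with an error. */
    static final class RecreatingWatcher implements Watcher {
        private final WatchSource source;
        private final Consumer<String> handler;

        RecreatingWatcher(WatchSource source, Consumer<String> handler) {
            this.source = source;
            this.handler = handler;
        }

        @Override
        public void eventReceived(String resourceName) {
            handler.accept(resourceName);
        }

        @Override
        public void onClose(Exception cause) {
            if (cause != null) {
                // Error close (for example "too old resource version"):
                // register again so that later events are still delivered.
                source.watch(this);
            }
            // A graceful close (cause == null) is left alone.
        }
    }

    /** In-memory fake that records every watch that was opened. */
    static final class FakeSource implements WatchSource {
        final List<Watcher> opened = new ArrayList<>();

        @Override
        public void watch(Watcher watcher) {
            opened.add(watcher);
        }

        Watcher current() {
            return opened.get(opened.size() - 1);
        }
    }

    public static void main(String[] args) {
        FakeSource source = new FakeSource();
        List<String> seen = new ArrayList<>();
        source.watch(new RecreatingWatcher(source, seen::add));

        // Simulate the watch expiring with an error, then a later event.
        source.current().onClose(new RuntimeException("too old resource version"));
        source.current().eventReceived("my-rebalance");

        System.out.println("watches opened: " + source.opened.size() + ", events: " + seen);
    }
}
```

In this sketch the simulated expiry leads to a second watch being opened, so the later event still reaches the handler. A real implementation would probably also log the close cause and might trigger a reconciliation to pick up changes missed while no watch was active.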
Expected behavior
From what I understand from [1], it is the client's responsibility (in this case the operator's) to recreate a watch after it expires, to avoid fabric8 errors like the following:
2022-12-31T20:41:34.080724387Z io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 119220794 (119221972)
Additional context
[1] https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version
[2] #3771
[3] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java#L287-L291
[4] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaRebalanceAssemblyOperator.java#L232-L236
[5] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/operator-common/src/main/java/io/strimzi/operator/common/OperatorWatcher.java#L52-L54
[6] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/operator-common/src/main/java/io/strimzi/operator/common/AbstractOperator.java#L490-L497
[7] https://github.com/strimzi/strimzi-kafka-operator/issues/8060