AMQ Streams / ENTMQST-4708

Recreate watcher when closed with an exception


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • 2.6.0.GA
    • 2.3.0.GA
    • cluster-operator

      Tracked upstream in [7].

      Some of the operators do not recreate watches after they are closed with an error:

      • AbstractConnectOperator
      • KafkaRebalanceAssemblyOperator

      This can lead to custom resources not being watched and updated after the watch on that resource expires [1]. When this happens, errors like the following appear in the Cluster Operator log:

      2022-12-31T20:41:34.080724387Z 2022-12-31 20:41:34 ERROR AbstractWatchManager:315 - Unhandled exception encountered in watcher event handler
      2022-12-31T20:41:34.080724387Z io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 119220794 (119221972)
      2022-12-31T20:41:34.080724387Z at io.strimzi.operator.cluster.operator.assembly.KafkaRebalanceAssemblyOperator$1.onClose(KafkaRebalanceAssemblyOperator.java:233) ~[io.strimzi.cluster-operator-0.29.0.redhat-00014.jar:0.29.0.redhat-00014]
      2022-12-31T20:41:34.080724387Z at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) ~[io.fabric8.kubernetes-client-5.12.2.redhat-00002.jar:?]

      We had the same problem with the TopicOperator in the past and fixed it by updating the K8sTopicWatcher onClose() method [2]. We also already recreate watches for other operators in the same situation [5] [6]. We should look into doing the same for:

      • AbstractConnectOperator [3]
      • KafkaRebalanceAssemblyOperator [4]
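
      The recreate-on-close pattern described above can be sketched as follows. This is a minimal, self-contained model and not the Strimzi or fabric8 code: ResourceWatcher, FakeWatchSource, and SelfHealingWatcher are hypothetical names invented for illustration, and the fake source only imitates a watch that is closed with an error. It shows the difference between a watcher that gives up when it is closed with an exception and one that registers a new watch.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained model of "recreate the watch when it closes with an error". */
public class RecreateWatchSketch {

    /** Hypothetical stand-in for a watcher callback; NOT the fabric8 interface. */
    interface ResourceWatcher {
        void eventReceived(String event);
        void onClose(Exception cause); // cause is null on a graceful close
    }

    /** Hypothetical stand-in for the server side of a watch. */
    static class FakeWatchSource {
        private final List<ResourceWatcher> active = new ArrayList<>();

        void watch(ResourceWatcher watcher) {
            active.add(watcher);
        }

        /** Delivers an event to every watcher that still has an open watch. */
        void emit(String event) {
            for (ResourceWatcher watcher : new ArrayList<>(active)) {
                watcher.eventReceived(event);
            }
        }

        /** Closes every open watch with an error, as an expired watch would be. */
        void expireAll() {
            List<ResourceWatcher> closing = new ArrayList<>(active);
            active.clear();
            for (ResourceWatcher watcher : closing) {
                watcher.onClose(new RuntimeException("too old resource version"));
            }
        }
    }

    /** Watcher that can re-register itself instead of only reporting the error. */
    static class SelfHealingWatcher implements ResourceWatcher {
        private final FakeWatchSource source;
        private final boolean recreateOnError;
        final List<String> seen = new ArrayList<>();

        SelfHealingWatcher(FakeWatchSource source, boolean recreateOnError) {
            this.source = source;
            this.recreateOnError = recreateOnError;
        }

        @Override
        public void eventReceived(String event) {
            seen.add(event);
        }

        @Override
        public void onClose(Exception cause) {
            // A graceful close (cause == null) is left alone; an error triggers a new watch.
            if (cause != null && recreateOnError) {
                source.watch(this);
            }
        }
    }

    /** Sends one event, expires the watch, sends another, and returns what was seen. */
    public static List<String> runScenario(boolean recreateOnError) {
        FakeWatchSource source = new FakeWatchSource();
        SelfHealingWatcher watcher = new SelfHealingWatcher(source, recreateOnError);
        source.watch(watcher);
        source.emit("ADDED my-rebalance");
        source.expireAll();
        source.emit("MODIFIED my-rebalance");
        return watcher.seen;
    }

    public static void main(String[] args) {
        System.out.println("without recreation: " + runScenario(false));
        System.out.println("with recreation:    " + runScenario(true));
    }
}
```

      In this model the watcher that does not recreate its watch misses the MODIFIED event, which mirrors the symptom in this issue: the custom resource stops being watched after the watch is closed with an error.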

      Expected behavior
      From what I understand from [1], it is the client's responsibility (in this case the operator's) to recreate a watch after it expires, to avoid fabric8 errors like the following:

      2022-12-31T20:41:34.080724387Z io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 119220794 (119221972)
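
      To illustrate why a watch can expire and what "recreating" it involves, here is a toy model under stated assumptions. FakeServer and watchWithRecovery are hypothetical names, the server keeps only a bounded history of versions, and the error text merely imitates the message above (in this model the number in parentheses is the oldest retained version). It is a sketch of the general idea from [1], not of how the Kubernetes API server or the operators are implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a watch that expires because its resource version is too old. */
public class ExpiredWatchSketch {

    /** Hypothetical in-memory server that remembers only the newest few versions. */
    static class FakeServer {
        private final int historySize;
        private final Deque<Long> history = new ArrayDeque<>();
        private long currentVersion = 0;

        FakeServer(int historySize) {
            this.historySize = historySize;
        }

        /** Records a change and drops the oldest version once the history is full. */
        void change() {
            history.addLast(++currentVersion);
            if (history.size() > historySize) {
                history.removeFirst();
            }
        }

        /** A fresh list returns the current version. */
        long list() {
            return currentVersion;
        }

        /** Starts a watch after the given version; fails if later changes were dropped. */
        void watchFrom(long version) {
            long oldest = history.isEmpty() ? currentVersion : history.peekFirst();
            if (version < oldest - 1) {
                throw new IllegalStateException(
                        "too old resource version: " + version + " (" + oldest + ")");
            }
        }
    }

    /** Tries to resume from the last seen version; on expiry, re-lists and watches again. */
    public static long watchWithRecovery(FakeServer server, long lastSeenVersion) {
        try {
            server.watchFrom(lastSeenVersion);
            return lastSeenVersion;
        } catch (IllegalStateException expired) {
            long fresh = server.list();
            server.watchFrom(fresh);
            return fresh;
        }
    }

    /** Runs the model: the client last saw version 2, then more changes happen. */
    public static long demo(int changesWhileDisconnected) {
        FakeServer server = new FakeServer(3);
        server.change();
        server.change();
        long lastSeen = server.list();
        for (int i = 0; i < changesWhileDisconnected; i++) {
            server.change();
        }
        return watchWithRecovery(server, lastSeen);
    }

    public static void main(String[] args) {
        System.out.println("resumed from version " + demo(1));
        System.out.println("recovered at version " + demo(10));
    }
}
```

      With one missed change the client can resume from the version it last saw; with ten missed changes that version has left the history, so the only way forward is to obtain a current version and start a new watch.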

      Additional context
      [1] https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version
      [2] #3771
      [3] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/AbstractConnectOperator.java#L287-L291
      [4] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaRebalanceAssemblyOperator.java#L232-L236
      [5] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/operator-common/src/main/java/io/strimzi/operator/common/OperatorWatcher.java#L52-L54
      [6] https://github.com/strimzi/strimzi-kafka-operator/blob/0.33.0/operator-common/src/main/java/io/strimzi/operator/common/AbstractOperator.java#L490-L497

      [7] https://github.com/strimzi/strimzi-kafka-operator/issues/8060

              morsak Maros Orsak
              kliberti Kyle Liberti
              Maros Orsak Maros Orsak