-
Bug
-
Resolution: Done
-
Major
Describe the bug
In Operate First we run multiple `KfDef`s on the same cluster, and the operator applies them by looping through them. When one of the `KfDef`s is faulty and fails to apply (for example, a patch that is not allowed, such as changing the storage class on a PVC), the operator gets stuck on it and keeps retrying. It never moves on to the other `KfDef`s, so they wait to be processed indefinitely.
time="2021-06-03T15:23:43Z" level=warning msg="Encountered error applying application jupyterhub: (kubeflow.error): Code 500 with message: Apply.Run : error when applying patch:\n\{\"metadata\":{\"annotations\":{\"kubectl.kubernetes.io/last-applied-configuration\":\"{\\\"apiVersion\\\":\\\"v1\\\",\\\"kind\\\":\\\"PersistentVolumeClaim\\\",\\\"metadata\\\":{\\\"annotations\\\":{\\\"kfctl.kubeflow.io/kfdef-instance\\\":\\\"opendatahub.opf-jupyterhub\\\",\\\"volume.beta.kubernetes.io/storage-class\\\":\\\"moc-nfs-csi\\\"},\\\"labels\\\":\{\\\"component.opendatahub.io/name\\\":\\\"jupyterhub\\\",\\\"opendatahub.io/component\\\":\\\"true\\\"},\\\"name\\\":\\\"jupyterhub-db\\\",\\\"namespace\\\":\\\"opf-jupyterhub\\\"},\\\"spec\\\":\{\\\"accessModes\\\":[\\\"ReadWriteOnce\\\"],\\\"resources\\\":{\\\"requests\\\":{\\\"storage\\\":\\\"1Gi\\\"}}}}\\n\",\"volume.beta.kubernetes.io/storage-class\":\"moc-nfs-csi\"}}}\nto:\nResource: \"/v1, Resource=persistentvolumeclaims\", GroupVersionKind: \"/v1, Kind=PersistentVolume...
time="2021-06-03T15:23:43Z" level=warning msg="Will retry in 25 seconds."
Expected behavior
If the operator hits an error while applying a faulty `KfDef`, it should stop retrying after a few attempts instead of retrying indefinitely.
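To illustrate the requested behavior, here is a minimal sketch in Python (not the operator's actual code, which is not shown in this report). It assumes a hypothetical `apply` callable that raises a hypothetical `ApplyError` on failure, and a made-up limit `MAX_ATTEMPTS`. The names `reconcile_all`, `ApplyError`, and `MAX_ATTEMPTS` are all illustrative assumptions.

```python
import time
from typing import Callable, Dict, Iterable

MAX_ATTEMPTS = 3  # hypothetical limit; the report only asks for "a few attempts"


class ApplyError(Exception):
    """Hypothetical error raised when a KfDef fails to apply."""


def reconcile_all(
    kfdefs: Iterable[str],
    apply: Callable[[str], None],
    max_attempts: int = MAX_ATTEMPTS,
    delay_seconds: float = 0.0,
) -> Dict[str, str]:
    """Apply each KfDef with a bounded number of attempts.

    Returns a mapping of KfDef name to the last error message for every
    KfDef that still failed after max_attempts. A failing KfDef does not
    block the ones that come after it.
    """
    failed: Dict[str, str] = {}
    for name in kfdefs:
        last_error = ""
        for attempt in range(1, max_attempts + 1):
            try:
                apply(name)
                break  # applied successfully, move on to the next KfDef
            except ApplyError as err:
                last_error = str(err)
                if attempt < max_attempts and delay_seconds > 0:
                    time.sleep(delay_seconds)  # wait before the next attempt
        else:
            # The loop ended without a break: every attempt failed.
            failed[name] = last_error
    return failed
```

With this shape, a `KfDef` that cannot be patched is recorded as failed after `max_attempts` tries, and the remaining `KfDef`s are still processed. Whether a failed `KfDef` should be retried later (for example, on its next change) is a separate design decision that this sketch leaves open.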
Screenshots
https://github.com/opendatahub-io/opendatahub-operator/files/6592755/log.6.txt
Additional context
Related to: https://github.com/operate-first/support/issues/260
Imported from: https://github.com/opendatahub-io/opendatahub-operator/issues/135