Investigate and address CrashLoopBackOff scenario in the OLM labeller logic when CRDs are missing the required labels.
Goal:
Analyse the scenario(s) where required CRD labels are missing and the code fails at this point:
https://github.com/operator-framework/operator-lifecycle-manager/blob/3775a4d31f6625cce96c7f3e80c96e74038c4a6e/pkg/controller/operators/labeller/filters.go#L177-L185
We should determine whether there is a fix that does not let the pod enter a CrashLoopBackOff.
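For orientation, here is a minimal sketch of the shape of the start-up decision we understand that code to gate: either every relevant object already carries the label, so the informers can filter on it, or the process starts in labelling mode. All identifiers below are hypothetical stand-ins, not the actual OLM code.

```go
// Minimal sketch of the start-up decision we understand filters.go to gate,
// under our assumptions; every identifier here is a hypothetical stand-in.
package main

import "fmt"

// checkAllLabelled stands in for the linked check: are all relevant objects
// (CRDs among them) already carrying the label the informers would filter on?
func checkAllLabelled() bool {
	return false // pretend an orphaned CRD is missing the required label
}

func runWithFilteredInformers() { fmt.Println("running with label-filtered informers") }

func runLabellers() { fmt.Println("starting up in the \"let's do the labelling\" mode") }

func main() {
	if checkAllLabelled() {
		runWithFilteredInformers()
		return
	}
	runLabellers()
}
```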
Context
This issue originally stems from OCPBUGS-53161, where we changed a log level per a request from support.
Further analysis based on https://access.redhat.com/solutions/7112019.
P.S.: According to jlanford@redhat.com:
Looking at https://access.redhat.com/solutions/7112019 it seems like:
1. We detect that there are CRDs that need to be labelled, thus causing us to start up in the "let's do the labelling" mode.
2. We don't actually label those CRDs?
3. We say we've labelled everything, so we exit, which causes the replica set to spin up a new pod, which starts again at (1).
We're going into CrashLoopBackOff, but that shouldn't be happening. In theory we should have a single restart and then be good to go.
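Below is a minimal runnable sketch of that suspected loop, assuming the start-up gate and the labeller's completion check use different predicates (e.g. the labeller skips CRDs that are no longer owned by an installed operator). Every identifier in it is a hypothetical stand-in, not the actual OLM code.

```go
// Hypothetical reproduction of the suspected restart loop: the start-up gate
// counts EVERY unlabelled CRD, while the labeller only labels (and re-checks)
// the CRDs it considers its own, so an orphaned CRD loops the pod forever.
package main

import (
	"fmt"
	"os"
)

type crd struct {
	name     string
	labelled bool
	owned    bool // still associated with an installed operator
}

// Gate used at start-up: any unlabelled CRD forces labelling mode.
func allLabelled(crds []crd) bool {
	for _, c := range crds {
		if !c.labelled {
			return false
		}
	}
	return true
}

// Labelling pass: skips CRDs it does not consider its own, e.g. a CRD
// left behind by an operator that was uninstalled before the upgrade.
func labelOwned(crds []crd) {
	for i := range crds {
		if crds[i].owned {
			crds[i].labelled = true
		}
	}
}

// Completion check after labelling: only inspects owned CRDs, so it
// passes even though the orphaned CRD is still unlabelled.
func labellerDone(crds []crd) bool {
	for _, c := range crds {
		if c.owned && !c.labelled {
			return false
		}
	}
	return true
}

func main() {
	crds := []crd{{name: "orphaned.example.com", owned: false}}

	if allLabelled(crds) { // (1) gate fails: the orphaned CRD is unlabelled
		fmt.Println("running with label-filtered informers")
		return
	}
	labelOwned(crds) // (2) the orphaned CRD is skipped, not labelled

	if labellerDone(crds) { // (3) completion check passes anyway
		fmt.Println("detected that every object is labelled, exiting to re-start the process")
		os.Exit(0) // the replica set spins up a new pod, which starts again at (1)
	}
}
```

If the mismatch holds, one fix direction in the spirit of the Goal above would be to make the exit conditional on the same predicate the start-up gate uses, logging and staying up when stragglers remain, so a skipped CRD can never produce an exit-restart loop.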
We might have other cases, but it seems that to reproduce the issue we need the following:
- The user faced the issue when upgrading from 4.14.44 to 4.15.44.
- The scenario appears to be caused by a solution that was uninstalled before the upgrade; we do not remove its CRD, and we face the missing label on it afterwards.
We have another bug to fix the monitoring so that we are able to catch those cases: https://issues.redhat.com/browse/OCPBUGS-53442
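Until that monitoring lands, a sketch along the following lines can be used to spot affected CRDs on a cluster. The label key "olm.managed" is our assumption about what the labeller checks; verify it against the labeller package before relying on this.

```go
// Hedged triage sketch: list CRDs missing the label the labeller expects.
// The label key "olm.managed" is an assumption; confirm it against the
// labeller package in operator-lifecycle-manager before relying on this.
package main

import (
	"context"
	"fmt"

	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := apiextensionsclient.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	crds, err := client.ApiextensionsV1().CustomResourceDefinitions().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	// Print every CRD that does not carry the expected label.
	for _, crd := range crds.Items {
		if _, ok := crd.Labels["olm.managed"]; !ok {
			fmt.Println("CRD missing label:", crd.Name)
		}
	}
}
```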
Clones: OCPBUGS-53161 - olm-operator pod going to CLBO with a message "detected that every object is labelled, exiting to re-start the process" (Verified)