  1. OpenShift Bugs
  2. OCPBUGS-57356

olm-operator pod CrashLoopBackOff when olm.managed=true label (introduced in 4.15) is missing

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.20.0
    • 4.15.z
    • OLM
    • Quality / Stability / Reliability
    • False
    • Important
    • Rejected
    • In Progress
    • Bug Fix

      Investigate and address CrashLoopBackOff scenario in the OLM labeller logic when CRDs are missing the required labels.

      Goal:
      Analyse the scenario(s) where required CRD labels are missing and the code fails at this point:
      https://github.com/operator-framework/operator-lifecycle-manager/blob/3775a4d31f6625cce96c7f3e80c96e74038c4a6e/pkg/controller/operators/labeller/filters.go#L177-L185

      We should determine whether there is a fix that prevents the pod from entering CrashLoopBackOff.

      Context

      This issue originated from OCPBUGS-53161, where we changed the level of a log message (originally info) at the request of support.

      Further analysis is based on https://access.redhat.com/solutions/7112019

      PS.: According to jlanford@redhat.com:

      Looking at https://access.redhat.com/solutions/7112019, it seems like:

      1. We detect that there are CRDs that need to be labeled, thus causing us to start up in the "let's do the labelling" mode.
      2. We don't actually label those CRDs?
      3. We say we've labelled everything, so we exit, which causes the replica set to spin a new pod, which starts again at (1).

      We're going into CrashLoopBackoff, but that shouldn't be happening. In theory we should have a single restart and then be good to go.
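
      To make the suspected loop concrete, here is a minimal Go sketch of the behaviour described above. It is not the OLM labeller code: the crd type, the Skipped flag, and the functions startOnce and restartsUntilStable are hypothetical names invented for illustration. Skipped stands in for whatever unknown condition causes a CRD to be detected as unlabelled but never actually labelled.

```go
package main

import "fmt"

// managedLabel is the label key named in the issue title.
const managedLabel = "olm.managed"

// crd is a hypothetical, simplified stand-in for a CustomResourceDefinition.
type crd struct {
	Name   string
	Labels map[string]string
	// Skipped is an assumption for illustration: it marks a CRD that the
	// labelling pass detects as unlabelled but never actually updates.
	Skipped bool
}

// needsLabel reports whether the CRD lacks olm.managed=true.
func needsLabel(c crd) bool {
	return c.Labels[managedLabel] != "true"
}

// startOnce models a single pod start. It returns true when the process
// entered "labelling mode" and therefore exits, expecting to be restarted.
func startOnce(crds []crd) bool {
	enteredLabellingMode := false
	for i := range crds {
		if !needsLabel(crds[i]) {
			continue
		}
		enteredLabellingMode = true
		if crds[i].Skipped {
			continue // detected as unlabelled, but never labelled
		}
		if crds[i].Labels == nil {
			crds[i].Labels = map[string]string{}
		}
		crds[i].Labels[managedLabel] = "true"
	}
	return enteredLabellingMode
}

// restartsUntilStable counts how many restarts happen before a start finds
// nothing left to label. It returns limit if the loop never settles.
func restartsUntilStable(crds []crd, limit int) int {
	restarts := 0
	for restarts < limit && startOnce(crds) {
		restarts++
	}
	return restarts
}

func main() {
	healthy := []crd{{Name: "widgets.example.com"}}
	stuck := []crd{{Name: "leftovers.example.com", Skipped: true}}

	fmt.Println("healthy restarts:", restartsUntilStable(healthy, 5))
	fmt.Println("stuck restarts:", restartsUntilStable(stuck, 5))
}
```

      In this model the healthy case settles after a single restart, which matches the expectation stated above, while the stuck case keeps restarting until the limit is reached, which is what a CrashLoopBackOff would look like.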

      There may be other cases, but to reproduce the issue it seems we need the following:

      • The user hit the issue when upgrading from 4.14.44 to 4.15.44.
      • The scenario appears to be caused by a solution that was uninstalled before the upgrade: its CRD was not removed, and after the upgrade that CRD was missing the label.
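
      As a rough illustration of how such leftover CRDs could be identified, the Go sketch below filters a set of CRDs for those without olm.managed=true. It is an assumption-laden example, not OLM code: the function missingManagedLabel and the map-based input (CRD name to labels) are hypothetical, and the CRD names are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// missingManagedLabel returns the sorted names of CRDs whose labels do not
// include olm.managed=true. The input maps a CRD name to its labels; this
// representation is a simplification chosen for the example.
func missingManagedLabel(crds map[string]map[string]string) []string {
	var names []string
	for name, labels := range crds {
		if labels["olm.managed"] != "true" {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	crds := map[string]map[string]string{
		// Labelled as expected.
		"widgets.example.com": {"olm.managed": "true"},
		// Left behind by an uninstalled solution, with no labels at all.
		"leftovers.example.com": nil,
	}
	fmt.Println(missingManagedLabel(crds))
}
```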

      We have another bug to fix the monitor so that we are able to catch those cases: https://issues.redhat.com/browse/OCPBUGS-53442

              rh-ee-cchantse Catherine Chan-Tse
              rhn-support-amuhamme MUHAMMED ASLAM V K
              None
              None
              Jian Zhang Jian Zhang
              None