OCPBUGS-25802

olm-operator pod restarts continuously due to "detected that every object is labelled, exiting to re-start the process..." when upgrading OCP from 4.14.6 to 4.15


    • Critical
    • No
    • Luigi 246
    • 1
    • Approved
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-25448. The following is the description of the original issue:

      Description of problem:

      When upgrading OCP from 4.14.6 to 4.15.0-0.nightly-2023-12-13-032512, the olm-operator pod restarts continuously, which blocks the cluster upgrade.

      MacBook-Pro:~ jianzhang$ omg get clusterversion 
      2023-12-15 16:24:34.977 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
      NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
      version  4.14.6   True       True         4h47m  Working towards 4.15.0-0.nightly-2023-12-13-032512: 701 of 873 done (80% complete), waiting on operator-lifecycle-manager
      
      MacBook-Pro:~ jianzhang$ omg get pods 
      2023-12-15 16:47:36.383 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
      NAME                                     READY  STATUS     RESTARTS  AGE
      catalog-operator-564b666f96-6nmq8        1/1    Running    1         1h59m
      collect-profiles-28375140-n9f2p          0/1    Succeeded  0         42m
      collect-profiles-28375155-sf2qj          0/1    Succeeded  0         27m
      collect-profiles-28375170-xkbxf          0/1    Succeeded  0         12m
      olm-operator-6bfd5f76bc-xb5lk            0/1    Running    27        1h59m
      package-server-manager-5b7969559f-68nn7  2/2    Running    0         1h59m
      packageserver-5ffcb95bff-fvvpx           1/1    Running    0         1h58m
      packageserver-5ffcb95bff-hgvxt           1/1    Running    0         1h58m
      
      MacBook-Pro:~ jianzhang$ omg logs olm-operator-6bfd5f76bc-xb5lk --previous
      2023-12-15 16:23:02.300 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
      2023-12-13T23:38:05.452697228Z time="2023-12-13T23:38:05Z" level=info msg="log level info"
      2023-12-13T23:38:05.452950096Z time="2023-12-13T23:38:05Z" level=info msg="TLS keys set, using https for metrics"
      2023-12-13T23:38:05.515929950Z time="2023-12-13T23:38:05Z" level=info msg="found nonconforming items" gvr="rbac.authorization.k8s.io/v1, Resource=rolebindings" nonconforming=1
      2023-12-13T23:38:05.588194624Z time="2023-12-13T23:38:05Z" level=info msg="found nonconforming items" gvr="/v1, Resource=services" nonconforming=1
      2023-12-13T23:38:06.116654658Z time="2023-12-13T23:38:06Z" level=info msg="detected ability to filter informers" canFilter=false
      2023-12-13T23:38:06.118496116Z time="2023-12-13T23:38:06Z" level=info msg="registering labeller" gvr="apps/v1, Resource=deployments" index=0
      ...
      ...
      2023-12-13T23:38:06.381370939Z time="2023-12-13T23:38:06Z" level=info msg="labeller complete" gvr="rbac.authorization.k8s.io/v1, Resource=clusterrolebindings" index=0
      2023-12-13T23:38:06.381424190Z time="2023-12-13T23:38:06Z" level=info msg="starting clusteroperator monitor loop" monitor=clusteroperator
      2023-12-13T23:38:06.381467749Z time="2023-12-13T23:38:06Z" level=info msg="detected that every object is labelled, exiting to re-start the process..."    
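
For triage, the log excerpt above can be scanned mechanically for the exit message. The following is a minimal sketch, assuming the `key=value` / `key="quoted value"` line layout seen in this report. The names `parse_fields` and `find_restart_exit` are hypothetical helpers introduced for illustration; they are not part of OLM or omg.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# or a run of non-space characters (the layout seen in the log excerpt).
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Message copied verbatim from the olm-operator log in this report.
EXIT_MSG = "detected that every object is labelled, exiting to re-start the process..."


def parse_fields(line):
    """Return a dict of the key/value fields found in one log line."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def find_restart_exit(lines):
    """Return the parsed fields of every line whose msg is the exit message."""
    hits = []
    for line in lines:
        fields = parse_fields(line)
        if fields.get("msg") == EXIT_MSG:
            hits.append(fields)
    return hits
```

Feeding the output of `omg logs olm-operator-6bfd5f76bc-xb5lk --previous` line by line into `find_restart_exit` would return one entry per occurrence, each with its `time` and `level` fields, which makes it easy to confirm that the process exits right after the labellers complete.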

      Version-Release number of selected component (if applicable):

      MacBook-Pro:~ jianzhang$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2023-12-13-032512 |grep olm 
        operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         b4d2b70c34e9654afe30cf724f1dc85a1ce5c683
        operator-registry                              https://github.com/openshift/operator-framework-olm                         b4d2b70c34e9654afe30cf724f1dc85a1ce5c683    
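
The commit listing above can be turned into a lookup table to confirm which commit each OLM component was built from. This is a minimal sketch that assumes the three-column, whitespace-separated layout (component, repository URL, 40-character commit) shown in this report; `parse_commit_lines` is a hypothetical helper, not an `oc` feature.

```python
def parse_commit_lines(text):
    """Map component name -> (repository URL, commit SHA) from three-column rows."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows shaped like: <component> <https URL> <40-char commit>
        if len(parts) == 3 and parts[1].startswith("https://") and len(parts[2]) == 40:
            result[parts[0]] = (parts[1], parts[2])
    return result
```

Applied to the two lines above, both `operator-lifecycle-manager` and `operator-registry` map to the same repository and commit, which is the build to inspect when bisecting the regression.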

      How reproducible:

       always   

      Steps to Reproduce:

      1. Rerun this Prow job: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-ibmcloud-ipi-f28/ 

      Actual results:

          The cluster failed to upgrade because the olm-operator pod keeps restarting.

      Expected results:

          The cluster upgrades successfully.

      Additional info:

      Must-gather logs are available at https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-ibmcloud-ipi-f28/1734995337258471424/artifacts/ibmcloud-ipi-f28/gather-must-gather/artifacts/ 

            skuznets@redhat.com Steve Kuznetsov
            openshift-crt-jira-prow OpenShift Prow Bot
            Jian Zhang Jian Zhang