OCPBUGS-24009 (OpenShift Bugs)

OLM Operator packageserver Reporting Unavailable on InstallComponentFailed

Details

    • Yes
    • 2
    • ETCD Sprint 247, Nyan Cat, Orion, Phlogiston 250
    • 4
    • Rejected
    • False
    • Marking as a 4.15 regression and flagging proposed release blocker as a result.
    • Fixes an issue where OLM cannot find an existing ClusterRoleBinding or Service and creates them a second time, causing errors.
    • Bug Fix
    • In Progress
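
The fix summary in the details above describes a create call failing because the object is already present. A minimal Python sketch of the general remedy follows: treat "already exists" as a cue to fetch and reconcile the existing object rather than as a fatal error. `FakeCluster`, `AlreadyExists`, and `ensure` are hypothetical names used only for illustration, and the spec contents are made up; this is not OLM's actual code, which is written in Go.

```python
class AlreadyExists(Exception):
    """Raised when creating an object whose (kind, name) key is taken."""


class FakeCluster:
    """Hypothetical in-memory stand-in for an API server."""

    def __init__(self):
        self._objects = {}

    def create(self, kind, name, spec):
        key = (kind, name)
        if key in self._objects:
            raise AlreadyExists(f'{kind} "{name}" already exists')
        self._objects[key] = dict(spec)
        return self._objects[key]

    def get(self, kind, name):
        return self._objects[(kind, name)]

    def update(self, kind, name, spec):
        self._objects[(kind, name)] = dict(spec)
        return self._objects[(kind, name)]


def ensure(cluster, kind, name, spec):
    """Create the object, or reconcile the existing one instead of failing."""
    try:
        return cluster.create(kind, name, spec)
    except AlreadyExists:
        existing = cluster.get(kind, name)
        if existing != spec:
            # The object is present but differs: bring it to the desired state.
            return cluster.update(kind, name, spec)
        return existing


cluster = FakeCluster()
name = "packageserver-service-system:auth-delegator"
spec = {"roleRef": "system:auth-delegator"}  # illustrative spec, not the real manifest

first = ensure(cluster, "ClusterRoleBinding", name, spec)
second = ensure(cluster, "ClusterRoleBinding", name, spec)  # no error on the repeat
```

Calling `ensure` twice leaves exactly one object in the store, whereas a second bare `create` would raise the same kind of "already exists" error seen in this bug.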

    Description

      TRT has picked up a new, somewhat rare failure coming from the packageserver operator; it surfaces in this test. It appears to affect only Azure 4.14 -> 4.15 (minor) upgrades, in roughly 5% of runs.

      Examining the job runs in Sippy where this test failed, the error output is typically:

       operator conditions operator-lifecycle-manager-packageserver (0s)
      {Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: clusterrolebindings.rbac.authorization.k8s.io "packageserver-service-system:auth-delegator" already exists Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: clusterrolebindings.rbac.authorization.k8s.io "packageserver-service-system:auth-delegator" already exists}
      

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1729053846077968384

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1728600812575264768

      or

      {Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: could not create service packageserver-service: services "packageserver-service" already exists Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: could not create service packageserver-service: services "packageserver-service" already exists}
      

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1727785446827626496

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1727513681316548608

      The failed job runs also indicate that this problem started, or began occurring far more frequently, somewhere around Nov 14 - Nov 18. It has been very common since the 18th, happening multiple times a day.
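
      Both failure variants above name the conflicting object in the same `<resource> "<name>" already exists` form. As a rough triage aid, the sketch below pulls those pairs out of a failure message so runs can be grouped by which object collided. The function name `conflicting_objects` and the regular expression are assumptions made for illustration; they are not part of Sippy or OLM.

```python
import re

# Matches e.g.: services "packageserver-service" already exists
ALREADY_EXISTS = re.compile(r'(\S+) "([^"]+)" already exists')


def conflicting_objects(message):
    """Return the distinct (resource, name) pairs named in 'already exists' errors."""
    # The test output repeats the message, so de-duplicate with a set.
    return sorted(set(ALREADY_EXISTS.findall(message)))


crb_msg = (
    "install strategy failed: clusterrolebindings.rbac.authorization.k8s.io "
    '"packageserver-service-system:auth-delegator" already exists'
)
svc_msg = (
    "install strategy failed: could not create service packageserver-service: "
    'services "packageserver-service" already exists'
)

for msg in (crb_msg, svc_msg):
    print(conflicting_objects(msg))
```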

            People

              rh-ee-dfranz Daniel Franz
              rhn-engineering-dgoodwin Devan Goodwin
              Kui Wang Kui Wang
              Votes: 1
              Watchers: 36
