
Installed Operators in "Failed" status after upgrading to 4.15.3

    • Critical
    • No
    • Quality OLM Sprint 251, Rasputin OLM Sprint 252
    • 2
    • Rejected
    • False
    • Allows OLM to update or create the auth-reader RoleBinding without needing to check for presence. This allows OLM to bypass issues found with the labeler introduced in 4.15 for performance improvements while we investigate root cause.
    • Bug Fix
    • In Progress

      Description of problem:

      We upgraded our OpenShift cluster from 4.14.16 to 4.15.3, and multiple operators are now in "Failed" status with CSV conditions such as:
      - NeedsReinstall installing: deployment changed old hash=5f6b8fc6f7, new hash=5hFv6Gemy1Zri3J9ulXfjG9qOzoFL8FMsLNcLR
      - InstallComponentFailed install strategy failed: rolebindings.rbac.authorization.k8s.io "openshift-gitops-operator-controller-manager-service-auth-reader" already exists
      
      All other failures refer to a similar "auth-reader" rolebinding that already exists.
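
To see which operators are affected, the CSV phases can be listed and filtered. The following is a minimal sketch, not part of the original report: the function name failed_csvs is hypothetical, and it assumes the CSV phase is exposed at .status.phase and that a logged-in oc session is available for the commented usage line.

```shell
# Keep only rows whose third column (the CSV phase) is "Failed" and
# print them as "namespace/name".
# Expects whitespace-separated "namespace name phase" rows on stdin.
failed_csvs() {
  awk '$3 == "Failed" { print $1 "/" $2 }'
}

# Assumed usage (needs cluster access, so it is left commented out):
# oc get csv -A --no-headers \
#   -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
#   | failed_csvs
```

The filter is kept separate from the oc call so it can be checked against saved output before it is run on a live cluster.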
       
          

      Version-Release number of selected component (if applicable):

      OpenShift 4.15.3
          

      How reproducible:

      Happened on several installed operators, but only on the one cluster we upgraded (our staging cluster).
          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

      All operators should be up-to-date
          

      Additional info:

      This may be related to https://github.com/operator-framework/operator-lifecycle-manager/pull/3159 
          

            [OCPBUGS-31479] Installed Operators in "Failed" status after upgrading to 4.15.3

            Errata Tool added a comment - Since the problem described in this issue should be resolved in a recent advisory, it has been closed. For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:0041

            Scott Dodson added a comment - This bug is fixed in all versions which offer upgrades from 4.15 to 4.16 so therefore closing the 4.16 bug.

            Daniel Franz added a comment - patmarti@redhat.com based on the releases page the fix was included in 4.15.11.

            Patrick Martin added a comment - Which 4.15 release do we target for this fix?

            Nikitha Dokala (Inactive) added a comment - Hello, Is there any update on the workaround for this?

            OpenShift Jira Automation Bot added a comment - Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which had Priority = "Blocker" and information already set in the Release Blocker field) is being updated to Priority = Critical. The Release Blocker field was not changed.

            Claudio Busse added a comment - Hi rh-ee-dfranz, I don't see a backport for this yet. Is this in progress?

            Sean Haselden added a comment - Much appreciated jhopper@redhat.com! Will see if those clear things up for the customer case and will document in a KCS, at least specific to CNV.

            Jenifer Abrams added a comment - Hi shaselde@redhat.com I also needed:

            oc delete rolebinding -n kube-system hostpath-provisioner-operator-service-auth-reader
            oc delete rolebinding -n kube-system ssp-operator-service-auth-reader

            If there are other operators w/ issues you can check all "OLM owned" roles in that kube-system namespace with:

            oc get rolebinding -n kube-system -l=olm.owner.kind=ClusterServiceVersion
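
The workaround in the comment above could be generalized to other affected operators roughly as follows. This is only a sketch under assumptions: the helper names filter_auth_readers and print_delete_commands are hypothetical, and it assumes every conflicting rolebinding lives in kube-system and has a name ending in "-auth-reader", as in the examples in this thread. It prints the delete commands rather than running them, so the list can be reviewed first.

```shell
# Keep only names ending in "-auth-reader" from a list of
# rolebinding names read on stdin (one per line).
filter_auth_readers() {
  grep -E -- '-auth-reader$' || true
}

# Dry run: print one "oc delete" command per name instead of executing it.
# Names are expected in the "rolebinding.rbac.authorization.k8s.io/<name>"
# form produced by "oc get ... -o name".
print_delete_commands() {
  while IFS= read -r name; do
    if [ -n "$name" ]; then
      printf 'oc delete -n kube-system %s\n' "$name"
    fi
  done
}

# Assumed usage (needs cluster access, so it is left commented out):
# oc get rolebinding -n kube-system -l olm.owner.kind=ClusterServiceVersion -o name \
#   | filter_auth_readers | print_delete_commands
```

Only rolebindings belonging to operators that actually report the "already exists" failure should be deleted; the printed commands are meant to be checked by hand before any of them is run.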

            Sean Haselden added a comment - mkletz@redhat.com do you happen to have what those three were?

            Customer fixed the hco-webhook-service-auth-reader and now gets an error for
            "hostpath-provisioner-operator-service-auth-reader" already exists

            % oc get rolebinding -A | grep hostpath
            kube-system     hostpath-provisioner-operator-service-auth-reader   Role/extension-apiserver-authentication-reader   83d
            openshift-cnv   hostpath-provisioner-operator-service-cert          Role/hostpath-provisioner-operator-service-cert   3m55s

            What would the correct syntax be to remove the offending rbac here?

              rh-ee-dfranz Daniel Franz
              xcoulon@redhat.com Xavier Coulon
              Kui Wang Kui Wang
              Votes: 3
              Watchers: 25
