Operator Runtime / OPRUN-3267

Impact statement request for OCPBUGS-31080 Installed Operators in "Failed" status after upgrading to 4.15.3


    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      This is an impact statement for the OCPBUGS-31080, OCPBUGS-24009, and OCPBUGS-31073 series:

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      • Any release up to 4.15.{current-z}

      Which types of clusters?

      • Any non-MicroShift cluster that had an operator installed via OLM before upgrading to 4.15. Re-installing a previously uninstalled operator after upgrading to 4.15 may also trigger this issue.

      What is the impact? Is it serious enough to warrant removing update recommendations?

      • OLM Operators can't be upgraded and may incorrectly report failed status.

      How involved is remediation?

      • Delete the resources from the affected OLM installation that are named in the failure message reported by the olm-operator.

      A failure message similar to this may appear on the CSV:

      InstallComponentFailed install strategy failed: rolebindings.rbac.authorization.k8s.io "openshift-gitops-operator-controller-manager-service-auth-reader" already exists
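      As a minimal sketch of how the conflicting resource could be pulled out of such a message, the Python below parses the "already exists" portion. The message layout is inferred only from the single example above; the names Conflict and parse_conflict are hypothetical and are not part of OLM.

```python
import re
from typing import NamedTuple, Optional


class Conflict(NamedTuple):
    resource: str  # plural resource name, e.g. "rolebindings"
    group: str     # API group; empty when the message carries none
    name: str      # name of the object that already exists


# Assumed layout, based on the one example message in this issue:
#   <resource>[.<api group>] "<name>" already exists
_PATTERN = re.compile(
    r'(?P<resource>[a-z]+)(?:\.(?P<group>[a-z0-9.\-]+))? '
    r'"(?P<name>[^"]+)" already exists'
)


def parse_conflict(message: str) -> Optional[Conflict]:
    """Return the resource named in an 'already exists' failure, if any."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return Conflict(match["resource"], match["group"] or "", match["name"])
```

      Calling parse_conflict on the message above would yield the resource "rolebindings", the group "rbac.authorization.k8s.io", and the RoleBinding name, which identifies the object to inspect before deciding how to remediate.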

      The following resource types have been observed to encounter this issue and should be safe to delete:

        • ClusterRoleBinding suffixed with "-system:auth-delegator"
        • Service
        • RoleBinding suffixed with "-auth-reader"

      Under no circumstances should a user delete a CustomResourceDefinition (CRD), even if the same error occurs and names one, because data loss may occur. Note that we have not seen this type of resource named in the error from any of our users so far.
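      The guidance above can be sketched as a small check, assuming the kind and name have already been read from the failure message. The function name deletion_advice and its three return values are hypothetical; the kinds and suffixes are only those listed in this statement, so anything else falls through to manual review.

```python
# Kinds and name suffixes listed in this statement as observed and safe to
# delete. An empty suffix means the statement gives no name pattern for
# that kind.
SAFE_SUFFIXES = {
    "ClusterRoleBinding": "-system:auth-delegator",
    "Service": "",
    "RoleBinding": "-auth-reader",
}


def deletion_advice(kind: str, name: str) -> str:
    """Classify a conflicting resource as never-delete, observed-safe, or review."""
    if kind == "CustomResourceDefinition":
        # Deleting a CRD may cause data loss, so it is never suggested.
        return "never-delete"
    suffix = SAFE_SUFFIXES.get(kind)
    if suffix is not None and name.endswith(suffix):
        return "observed-safe"
    # Not covered by the list in this statement: needs a human decision.
    return "review"
```

      A result of "review" is not a judgment that deletion is unsafe, only that the resource is outside the set this statement describes.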

      If a resource appears risky to delete, labeling it with olm.managed: "true" and then restarting the olm-operator pod in the openshift-operator-lifecycle-manager namespace may also resolve the issue.
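      A sketch of the labeling alternative, written as Python helpers that only build the oc command lines and do not run them. Two assumptions are not stated in this issue: that olm-operator runs as a Deployment named olm-operator, and that restarting it with oc rollout restart is an acceptable stand-in for restarting the pod. The helper names are hypothetical.

```python
import shlex
from typing import List, Optional

OLM_NAMESPACE = "openshift-operator-lifecycle-manager"


def label_command(kind: str, name: str, namespace: Optional[str] = None) -> List[str]:
    """Build the oc invocation that sets olm.managed=true on one resource."""
    cmd = ["oc", "label", kind, name, "olm.managed=true"]
    if namespace:
        # Cluster-scoped kinds such as ClusterRoleBinding take no namespace.
        cmd += ["-n", namespace]
    return cmd


def restart_command() -> List[str]:
    """Build the oc invocation that restarts olm-operator.

    Assumption: olm-operator is a Deployment named "olm-operator".
    """
    return ["oc", "rollout", "restart", "deployment/olm-operator", "-n", OLM_NAMESPACE]


def render(cmd: List[str]) -> str:
    """Render a command for review before anyone runs it by hand."""
    return shlex.join(cmd)
```

      Printing render(label_command(...)) followed by render(restart_command()) gives two lines that can be reviewed and then run manually, which keeps the decision to modify the cluster with the operator of the cluster.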

      Is this a regression?

      This is a new issue related to performance improvements added to OLM in 4.15.

      https://issues.redhat.com/browse/OCPBUGS-24009

      https://issues.redhat.com/browse/OCPBUGS-31080

      https://issues.redhat.com/browse/OCPBUGS-28845

       

              rh-ee-dfranz Daniel Franz
              afri@afri.cz Petr Muller