OpenShift Bugs / OCPBUGS-41549

[4.17] Install plan is unable to move forward and is stuck in Pending state when the amount of CRs is too high.


    • Bug
    • Resolution: Done-Errata
    • Major
    • None
    • 4.13, 4.12, 4.14, 4.15, 4.16, 4.17
    • OLM
    • Important
    • No
    • YellowJacket OLM Sprint 259
    • 1
    • Rejected
    • False
    • Before this update, clusters with many custom resources (CRs) experienced timeouts from the API server and stranded updates where the only workaround was to uninstall and then reinstall the stranded Operators. This occurred because OLM evaluated potential updates by using a dynamic client lister. With this fix, OLM uses a paging lister for custom resource definitions (CRDs) to avoid timeouts and stranded updates. (link:https://issues.redhat.com/browse/OCPBUGS-41549[*OCPBUGS-41549*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-35358. The following is the description of the original issue:

      I'm working with the GitOps operator (1.7). When there is a large number of CRs (38,000 Application objects in this case), the related install plan gets stuck with the following error:

       

      - lastTransitionTime: "2024-06-11T14:28:40Z"
        lastUpdateTime: "2024-06-11T14:29:42Z"
        message: 'error validating existing CRs against new CRD''s schema for "applications.argoproj.io":
          error listing resources in GroupVersionResource schema.GroupVersionResource{Group:"argoproj.io",
          Version:"v1alpha1", Resource:"applications"}: the server was unable to return
          a response in the time allotted, but may still be processing the request'

      Even after waiting a long time, the operator is unable to move forward, neither removing nor reinstalling its components.

       

      In a lab, the issue did not appear until we started adding load to the cluster (applications.argoproj.io objects). Once we hit 26,000 applications, we could no longer upgrade or reinstall the operator.
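
      The release note on this issue attributes the timeout to listing CRs through a dynamic client lister and describes the fix as switching to a paging lister. The actual change lives in OLM's Go code and is not reproduced here. The following is only a minimal, self-contained Python sketch of the general idea: instead of one unbounded list request, the caller asks for bounded pages and follows a continue token, in the spirit of the Kubernetes API's `limit` and `continue` list parameters. The names `make_fake_server`, `list_all_paged`, and the page size of 500 are hypothetical and chosen for illustration; the "server" is an in-memory stand-in, not a real API client.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical stand-in for one list call against an API server.
# It takes (limit, continue_token) and returns (items, next_token);
# next_token is None once the last page has been returned.
ListPage = Callable[[int, Optional[str]], Tuple[List[str], Optional[str]]]


def make_fake_server(total: int) -> ListPage:
    """Build an in-memory 'server' holding `total` fake Application names."""
    items = [f"app-{i}" for i in range(total)]

    def list_page(limit: int, continue_token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        # The token here is just the next start offset encoded as a string.
        start = int(continue_token) if continue_token else 0
        end = min(start + limit, len(items))
        next_token = str(end) if end < len(items) else None
        return items[start:end], next_token

    return list_page


def list_all_paged(list_page: ListPage, page_size: int = 500) -> Iterator[str]:
    """Yield every item by following continue tokens, one bounded page at a time."""
    token: Optional[str] = None
    while True:
        page, token = list_page(page_size, token)
        yield from page
        if token is None:
            break


if __name__ == "__main__":
    # 38,000 objects, as in the report, fetched in pages of 500.
    server = make_fake_server(38_000)
    print(sum(1 for _ in list_all_paged(server)))
```

      Each request in this sketch returns at most `page_size` objects, so no single response has to carry all 38,000 items. Whether a given page size avoids the timeout on a real cluster depends on the cluster and is not something this sketch demonstrates.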

       

            rh-ee-jkeister Jordan Keister
            openshift-crt-jira-prow OpenShift Prow Bot
            Jian Zhang
            Michael Peter