
[4.14] Install plan is unable to move forward and is stuck in Pending state when the amount of CRs is too high.

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • None
    • 4.13, 4.12, 4.14, 4.15, 4.16, 4.17
    • OLM
    • Important
    • No
    • Sprint: YellowJacket OLM Sprint 259
    • 1
    • Release Blocker: Rejected
    • False
    • Release Note Text: * Previously, when the Operator Lifecycle Manager (OLM) evaluated a potential upgrade, it used the dynamic client list for all custom resource (CR) instances in the cluster. Clusters with a large number of CRs could experience timeouts from the apiserver and stranded upgrades. With this release, the issue is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-42017[*OCPBUGS-42017*])
    • Release Note Type: Bug Fix
    • Release Note Status: In Progress

      This is a clone of issue OCPBUGS-41819. The following is the description of the original issue:
      ----
      This is a clone of issue OCPBUGS-41677. The following is the description of the original issue:
      ----
      This is a clone of issue OCPBUGS-41549. The following is the description of the original issue:
      ----
      This is a clone of issue OCPBUGS-35358. The following is the description of the original issue:
      ----
      I'm working with the GitOps operator (1.7). When there is a large number of CRs (38,000 application objects in this case), the related install plan gets stuck with the following error:

       

      - lastTransitionTime: "2024-06-11T14:28:40Z"
        lastUpdateTime: "2024-06-11T14:29:42Z"
        message: 'error validating existing CRs against new CRD''s schema for "applications.argoproj.io":
          error listing resources in GroupVersionResource schema.GroupVersionResource{Group:"argoproj.io",
          Version:"v1alpha1", Resource:"applications"}: the server was unable to return
          a response in the time allotted, but may still be processing the request'

      Even after waiting a long time, the operator cannot move forward: it neither removes nor reinstalls its components.

       

      In a lab, the issue did not appear until we started adding load to the cluster (applications.argoproj.io objects). Once we hit 26,000 applications, we could no longer upgrade or reinstall the operator.
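
      For illustration only: the linked change (openshift/operator-framework-olm#869) is described as adding a "paginating lister". Kubernetes list requests accept a limit and return a continue token, so a client can fetch a large collection in bounded pages rather than in one request that may time out. The sketch below shows that general pattern in Python against an in-memory stand-in. It is not the OLM code; every name in it (make_fake_fetcher, list_paginated, validate_all) and the page size of 500 are assumptions made for the example.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page fetcher takes (limit, continue_token) and returns (items, next_token).
# next_token is None once the last page has been returned.
PageFetcher = Callable[[int, Optional[str]], Tuple[List[dict], Optional[str]]]


def make_fake_fetcher(total: int) -> PageFetcher:
    """Hypothetical stand-in for an API server that holds `total` CRs."""
    items = [{"name": f"app-{i}"} for i in range(total)]

    def fetch(limit: int, token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        # The token here is just the next start offset encoded as a string.
        start = int(token) if token else 0
        end = min(start + limit, len(items))
        next_token = str(end) if end < len(items) else None
        return items[start:end], next_token

    return fetch


def list_paginated(fetch: PageFetcher, page_size: int = 500) -> Iterator[dict]:
    """Yield every item while requesting at most `page_size` items per call."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    token: Optional[str] = None
    while True:
        page, token = fetch(page_size, token)
        yield from page
        if token is None:
            break


def validate_all(
    fetch: PageFetcher,
    is_valid: Callable[[dict], bool],
    page_size: int = 500,
) -> List[str]:
    """Return the names of items that fail `is_valid`, checked page by page."""
    return [cr["name"] for cr in list_paginated(fetch, page_size) if not is_valid(cr)]
```

      With this shape, each request returns a bounded number of objects, so the time per request no longer grows with the total number of CRs, although the total number of requests does.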

       

            [OCPBUGS-42017] [4.14] Install plan is unable to move forward and is stuck in Pending state when the amount of CRs is too high.

            OpenShift Prow Bot created issue -
            OpenShift Prow Bot made changes -
            QA Contact New: Jian Zhang [ jiazha ]
            OpenShift Prow Bot made changes -
            Link New: This issue clones OCPBUGS-41819 [ OCPBUGS-41819 ]
            OpenShift Prow Bot made changes -
            Link New: This issue is blocked by OCPBUGS-41819 [ OCPBUGS-41819 ]
            OpenShift Prow Bot made changes -
            Release Note Text New: When OLM evaluates a potential upgrade, it used the dynamic client lister for all CR instances in the cluster. For clusters with a large number of CRs that could result in timeouts from the apiserver and stranded upgrades where the only workaround was to uninstall/reinstall impacted operators.
            Release Note Type New: Bug Fix [ 30950 ]
            Sprint New: YellowJacket OLM Sprint 259 [ 63531 ]
            Target Version Original: 4.15.z [ 12417803 ] New: 4.14.z [ 12402535 ]
            Assignee Original: Lalatendu Mohanty [ lmohanty@redhat.com ] New: Jordan Keister [ rh-ee-jkeister ]
            OpenShift Jira Bot made changes -
            Release Note Status New: In Progress [ 30960 ]
            OpenShift Jira Bot made changes -
            Release Blocker New: Proposed [ 25756 ]
            OpenShift Prow Bot made changes -
            Status Original: New [ 10016 ] New: POST [ 15726 ]
            OpenShift Jira Bot made changes -
            Assignee Original: Jordan Keister [ rh-ee-jkeister ] New: Lalatendu Mohanty [ lmohanty@redhat.com ]
            OpenShift Prow Bot made changes -
            Remote Link New: This issue links to "openshift/operator-framework-olm#869: OCPBUGS-42017: adds paginating lister for evaluating CRs' upgrade fitness versus new CRDs. (Web Link)" [ 1759654 ]
            Jordan Keister made changes -
            Assignee Original: Lalatendu Mohanty [ lmohanty@redhat.com ] New: Jordan Keister [ rh-ee-jkeister ]
            Wallace Lewis made changes -
            Release Blocker Original: Proposed [ 25756 ] New: Rejected [ 25757 ]
            Jian Zhang made changes -
            Summary Original: [4.15] Install plan is unable to move forward and is stuck in Pending state when the amount of CRs is too high. New: [4.14] Install plan is unable to move forward and is stuck in Pending state when the amount of CRs is too high.
            OpenShift Prow Bot made changes -
            Status Original: POST [ 15726 ] New: MODIFIED [ 14454 ]
            OpenShift Prow Bot made changes -
            Link New: This issue is cloned by OCPBUGS-42146 [ OCPBUGS-42146 ]
            OpenShift Prow Bot made changes -
            Link New: This issue blocks OCPBUGS-42146 [ OCPBUGS-42146 ]
            ART Bot made changes -
            Status Original: MODIFIED [ 14454 ] New: ON_QA [ 15723 ]
            Per Goncalves da Silva made changes -
            Status Original: ON_QA [ 15723 ] New: Verified [ 10015 ]
            Errata Tool made changes -
            Remote Link New: This issue links to "RHBA-2024:7184 (Web Link)" [ 1770523 ]
            Olivia Brown made changes -
            Release Note Text Original: When OLM evaluates a potential upgrade, it used the dynamic client lister for all CR instances in the cluster. For clusters with a large number of CRs that could result in timeouts from the apiserver and stranded upgrades where the only workaround was to uninstall/reinstall impacted operators. New: * Previously, when the Operator Lifecycle Manager (OLM) evaluated a potential upgrade, it used the dynamic client list for all custom resource (CR) instances in the cluster. Clusters with a large number of CRs could experience timeouts from the apiserver and stranded upgrades. With this release, the issue is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-42017[*OCPBUGS-42017*])
            Errata Tool made changes -
            Resolution New: Done-Errata [ 10803 ]
            Status Original: Verified [ 10015 ] New: Closed [ 6 ]

              Assignee: Jordan Keister (rh-ee-jkeister)
              Reporter: OpenShift Prow Bot (openshift-crt-jira-prow)
              QA Contact: Jian Zhang