
[4.14] Install plan is unable to move forward and is stuck in Pending state when the amount of CRs is too high.

    • Bug
    • Resolution: Done-Errata
    • Major
    • None
    • 4.13, 4.12, 4.14, 4.15, 4.16, 4.17
    • OLM
    • Important
    • No
    • YellowJacket OLM Sprint 259
    • 1
    • Rejected
    • False
    • Previously, when the Operator Lifecycle Manager (OLM) evaluated a potential upgrade, it used the dynamic client to list all custom resource (CR) instances in the cluster. Clusters with a large number of CRs could experience timeouts from the API server and stranded upgrades. With this release, the issue is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-42017[*OCPBUGS-42017*])
    • Bug Fix
    • In Progress
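
      The release note describes a list of every CR instance timing out at the API server. As a rough illustration of the general idea of paginated listing (not a description of the actual OLM fix, which this issue does not detail), the sketch below counts CR instances with a chunked list. `count_crs` and `OC_BIN` are hypothetical names introduced here; `--chunk-size` is the same `oc get` flag used later in the verification comment.

```shell
#!/bin/bash
# Sketch only: count CR instances with a paginated (chunked) list.
# count_crs is a hypothetical helper; OC_BIN (also hypothetical) lets a
# different client binary be substituted for `oc`.
count_crs() {
  local resource="$1"
  local chunk_size="${2:-500}"
  # --chunk-size makes the client fetch the list in pages instead of
  # requesting every object in one large response.
  "${OC_BIN:-oc}" get "$resource" --all-namespaces \
    --chunk-size="$chunk_size" -o name | wc -l | tr -d ' '
}

# Usage (requires a logged-in cluster session):
#   count_crs applications.argoproj.io 500
```

      Whether a chunked list succeeds where an unchunked one times out depends on the cluster; treat this as a diagnostic aid under those assumptions.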

      This is a clone of issue OCPBUGS-41819. The following is the description of the original issue:
      ---
      This is a clone of issue OCPBUGS-41677. The following is the description of the original issue:
      ---
      This is a clone of issue OCPBUGS-41549. The following is the description of the original issue:
      ---
      This is a clone of issue OCPBUGS-35358. The following is the description of the original issue:
      ---
      I'm working with the GitOps operator (1.7). When there is a large number of CRs (38,000 Application objects in this case), the related install plan gets stuck with the following error:

       

      - lastTransitionTime: "2024-06-11T14:28:40Z"
        lastUpdateTime: "2024-06-11T14:29:42Z"
        message: 'error validating existing CRs against new CRD''s schema for "applications.argoproj.io":
          error listing resources in GroupVersionResource schema.GroupVersionResource{Group:"argoproj.io",
          Version:"v1alpha1", Resource:"applications"}: the server was unable to return
          a response in the time allotted, but may still be processing the request'

      Even after waiting a long time, the operator cannot move forward: it neither removes nor reinstalls its components.

       

      In a lab, the issue did not appear until we started adding load to the cluster (applications.argoproj.io objects). Once we reached 26,000 applications, we could no longer upgrade or reinstall the operator.
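
      To tell this failure apart from other reasons an install plan can stay pending, one option is to match the condition message against the two phrases seen in the error above. This is a minimal sketch: `is_cr_list_timeout` is a hypothetical helper name, and the install plan name and namespace in the usage comment are placeholders.

```shell
#!/bin/bash
# Sketch only: detect the "CR listing timed out" failure in an install plan
# condition message. is_cr_list_timeout is a hypothetical helper name.
is_cr_list_timeout() {
  local message="$1"
  # Collapse line breaks and repeated spaces, since YAML output wraps the message.
  message="$(printf '%s' "$message" | tr '\n' ' ' | tr -s ' ')"
  [[ "$message" == *"error validating existing CRs against new CRD"* ]] &&
    [[ "$message" == *"unable to return a response in the time allotted"* ]]
}

# Usage (<name> and <namespace> are placeholders):
#   msg="$(oc get installplan <name> -n <namespace> \
#     -o jsonpath='{.status.conditions[*].message}')"
#   if is_cr_list_timeout "$msg"; then
#     echo "install plan is blocked by the CR list timeout"
#   fi
```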

       


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: OpenShift Container Platform 4.14.38 security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:7184

            Jian Zhang added a comment -

            ok, thanks!


            Per Goncalves da Silva added a comment -

            rhn-support-jiazha I've moved this to verified (since it merged and it looks like you verified it =D)

            Jian Zhang added a comment -

            Test passed; verified.

            1. Install an OCP 4.14 cluster (AWS, SNO, 8 CPUs, 32 GB memory) and GitOps operator 1.7.4, as follows:

             

            launch 4.14,openshift/operator-framework-olm#869 aws,single-node
            
            jiazha-mac:~ jiazha$ oc get clusterversion 
            NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.14.0-0.test-2024-09-18-045200-ci-ln-ci835gb-latest   True        False         34m     Cluster version is 4.14.0-0.test-2024-09-18-045200-ci-ln-ci835gb-latest
            
            jiazha-mac:~ jiazha$ oc get node
            NAME                                      STATUS   ROLES                         AGE   VERSION
            ip-10-0-80-0.us-west-2.compute.internal   Ready    control-plane,master,worker   58m   v1.27.16+03a907c
            
            jiazha-mac:~ jiazha$ oc get sub -A
            NAMESPACE             NAME                        PACKAGE                     SOURCE             CHANNEL
            openshift-operators   openshift-gitops-operator   openshift-gitops-operator   redhat-operators   gitops-1.7
            jiazha-mac:~ jiazha$ oc get csv -n openshift-operators
            NAME                                              DISPLAY                    VERSION                REPLACES                           PHASE
            openshift-gitops-operator.v1.7.4-0.1690486082.p   Red Hat OpenShift GitOps   1.7.4+0.1690486082.p   openshift-gitops-operator.v1.7.3   Succeeded
            jiazha-mac:~ jiazha$ oc get pods -n openshift-operators
            NAME                                                  READY   STATUS    RESTARTS   AGE
            gitops-operator-controller-manager-65847b9f7b-r7zwt   1/1     Running   0          47s
            

            2. Create 500,000 Application CRs by using the script below.

            jiazha-mac:~ jiazha$ cat create-app3.sh 
            #!/bin/bash

            # Namespace where the Application resources will be created
            NAMESPACE="openshift-operators"
            DEST_NAMESPACE="jian"
            PROJECT_NAME="jian"
            DEST_NAME="test"

            # Number of Applications to create
            TOTAL_APPLICATIONS=200000

            # Number of parallel jobs
            PARALLEL_JOBS=1000

            # Function to generate a unique application YAML and apply it
            create_application() {
              local app_name=$1
              cat <<EOF | oc apply -f -
            apiVersion: argoproj.io/v1alpha1
            kind: Application
            metadata:
              name: ${app_name}
              namespace: ${NAMESPACE}
            spec:
              destination:
                name: ${DEST_NAME}
                namespace: ${DEST_NAMESPACE}
              project: ${PROJECT_NAME}
              source:
                repoURL: https://github.com/jianzhangbjz/learn-operator/tree/master/manifests
            EOF
            }

            # Main loop to create the specified number of applications in parallel
            count=0

            for i in $(seq 1 $TOTAL_APPLICATIONS); do
              app_name="example2-${i}"
              echo "Creating Application: ${app_name}"
              create_application "${app_name}" &

              count=$((count + 1))

              # If we've reached the parallel jobs limit, wait for all jobs to complete
              if [[ $count -ge $PARALLEL_JOBS ]]; then
                wait
                count=0
              fi
            done

            # Wait for any remaining background jobs to complete
            wait

            echo "Successfully created $TOTAL_APPLICATIONS Applications."
            
            jiazha-mac:~ jiazha$ oc adm new-project jian 
            Created project jian 

            Continue until listing the CRs times out at the API server:

            jiazha-mac:~ jiazha$ oc get applications -A -o json --chunk-size=0 > log1
            Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get applications.argoproj.io)
            jiazha-mac:~ jiazha$ cat log1
            {
                "apiVersion": "v1",
                "items": [],
                "kind": "List",
                "metadata": {
                    "resourceVersion": ""
                }
            }  
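
            The creation script in step 2 starts one `oc apply` process per Application. A possible variation, sketched here as an assumption rather than what was actually run, renders a whole batch as one multi-document YAML stream so a single `oc apply -f -` call can create many objects. `render_batch` and its arguments are hypothetical; the field values mirror the script above.

```shell
#!/bin/bash
# Sketch only: render Application manifests first..last as one multi-document
# YAML stream. render_batch is a hypothetical helper; the hard-coded field
# values are copied from the creation script in step 2.
render_batch() {
  local first="$1" last="$2" namespace="${3:-openshift-operators}"
  local i
  for ((i = first; i <= last; i++)); do
    cat <<EOF
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example2-${i}
  namespace: ${namespace}
spec:
  destination:
    name: test
    namespace: jian
  project: jian
  source:
    repoURL: https://github.com/jianzhangbjz/learn-operator/tree/master/manifests
EOF
  done
}

# Usage (requires a logged-in cluster session): create Applications 1..1000
# with one client process instead of 1000.
#   render_batch 1 1000 | oc apply -f -
```

            Batching trades per-object parallelism for fewer client processes; which is faster on a given cluster is not something this issue measures.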

             

            3. Update the channel to `latest` to upgrade GitOps v1.7.4 to the latest version. The upgrade worked; LGTM, verified.

            jiazha-mac:~ jiazha$ oc edit sub openshift-gitops-operator 
            subscription.operators.coreos.com/openshift-gitops-operator edited
            
            jiazha-mac:~ jiazha$ oc get sub 
            NAME                        PACKAGE                     SOURCE             CHANNEL
            openshift-gitops-operator   openshift-gitops-operator   redhat-operators   latest
            
            jiazha-mac:~ jiazha$ oc get ip
            NAME            CSV                                               APPROVAL    APPROVED
            install-6sv2h   openshift-gitops-operator.v1.7.4-0.1690486082.p   Automatic   true
            install-bt4qv   openshift-gitops-operator.v1.13.1                 Automatic   true
            jiazha-mac:~ jiazha$ oc get csv 
            NAME                                DISPLAY                    VERSION   REPLACES                                          PHASE
            openshift-gitops-operator.v1.13.1   Red Hat OpenShift GitOps   1.13.1    openshift-gitops-operator.v1.7.4-0.1690486082.p   Succeeded
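
            The manual check in step 3 could also be scripted. The sketch below assumes install plans report `Complete` in `.status.phase` once finished, and reads "<name> <phase>" pairs from standard input. `all_installplans_complete` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: succeed when every install plan read from stdin is in the
# Complete phase. Input format, one per line: "<name> <phase>".
# all_installplans_complete is a hypothetical helper name.
all_installplans_complete() {
  local name phase seen=0
  while read -r name phase; do
    [[ -z "$name" ]] && continue
    seen=1
    if [[ "$phase" != "Complete" ]]; then
      echo "install plan ${name} is in phase ${phase:-unknown}" >&2
      return 1
    fi
  done
  # Treat empty input as "not complete" so a failed query is not mistaken for success.
  [[ "$seen" -eq 1 ]]
}

# Usage (requires a logged-in cluster session):
#   oc get installplan -n openshift-operators \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
#     | all_installplans_complete
```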


              rh-ee-jkeister Jordan Keister
              openshift-crt-jira-prow OpenShift Prow Bot
              Jian Zhang Jian Zhang