OpenShift Bugs / OCPBUGS-51357

[OLMv1] ClusterOperator OLM Degraded: Deployment was progressing too long


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: 4.19.0
    • Component: OLM
    • Severity: Important
    • Sprint: Glaceon OLM Sprint 267
      Description of problem:

      The olm ClusterOperator (CO) is Degraded.

      jiazha-mac:~ jiazha$ omg get clusterversion
      NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
      version           False      True         1h7m   Unable to apply 4.19.0-0.nightly-multi-2025-02-26-050012: the cluster operator olm is not available
      
      jiazha-mac:~ jiazha$ omg get co olm -o yaml
      apiVersion: config.openshift.io/v1
      kind: ClusterOperator
      metadata:
      ...
      spec: {}
      status:
        conditions:
        - lastTransitionTime: '2025-02-26T16:25:34Z'
          message: 'CatalogdDeploymentCatalogdControllerManagerDegraded: Deployment was
            progressing too long
      
      
            OperatorcontrollerDeploymentOperatorControllerControllerManagerDegraded: Deployment
            was progressing too long'
          reason: CatalogdDeploymentCatalogdControllerManager_SyncError::OperatorcontrollerDeploymentOperatorControllerControllerManager_SyncError
          status: 'True'
          type: Degraded
        - lastTransitionTime: '2025-02-26T16:08:34Z'
          message: 'CatalogdDeploymentCatalogdControllerManagerProgressing: Waiting for
            Deployment to deploy pods
      
      
            OperatorcontrollerDeploymentOperatorControllerControllerManagerProgressing:
            Waiting for Deployment to deploy pods'
          reason: CatalogdDeploymentCatalogdControllerManager_Deploying::OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying
          status: 'True'
          type: Progressing
        - lastTransitionTime: '2025-02-26T16:08:34Z'
          message: 'CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
      
      
            OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting
            for Deployment'
          reason: CatalogdDeploymentCatalogdControllerManager_Deploying::OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying
          status: 'False'
          type: Available
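
      On a live cluster (rather than a must-gather inspected with omg), a jsonpath one-liner along these lines should print just the condition summary for the olm ClusterOperator; the expression is illustrative, not taken from the job artifacts:

      oc get clusteroperator olm -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{" ("}{.reason}{" @ "}{.lastTransitionTime}{")"}{"\n"}{end}'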

      However, the `catalogd` and `operator-controller` Deployments were working fine at that time (see the cross-check sketch after their status output below).

      jiazha-mac:~ jiazha$ omg get deploy 
      NAME                         READY  UP-TO-DATE  AVAILABLE  AGE
      catalogd-controller-manager  1/1    1           1          1h1m
      jiazha-mac:~ jiazha$ omg get deploy -n openshift-operator-controller 
      NAME                                    READY  UP-TO-DATE  AVAILABLE  AGE
      operator-controller-controller-manager  1/1    1           1          1h1m
      
      jiazha-mac:~ jiazha$ omg get deploy catalogd-controller-manager -o yaml
      apiVersion: apps/v1
      kind: Deployment
      ...
      status:
        availableReplicas: '1'
        conditions:
        - lastTransitionTime: '2025-02-26T16:24:35Z'
          lastUpdateTime: '2025-02-26T16:24:35Z'
          message: Deployment has minimum availability.
          reason: MinimumReplicasAvailable
          status: 'True'
          type: Available
        - lastTransitionTime: '2025-02-26T16:22:42Z'
          lastUpdateTime: '2025-02-26T16:24:35Z'
          message: ReplicaSet "catalogd-controller-manager-7f855d8d48" has successfully
            progressed.
          reason: NewReplicaSetAvailable
          status: 'True'
          type: Progressing
        observedGeneration: '1'
        readyReplicas: '1'
        replicas: '1'
        updatedReplicas: '1'
      
      jiazha-mac:~ jiazha$ omg get deploy -n openshift-operator-controller  operator-controller-controller-manager -o yaml
      apiVersion: apps/v1
      kind: Deployment
      ...
      status:
        availableReplicas: '1'
        conditions:
        - lastTransitionTime: '2025-02-26T16:23:49Z'
          lastUpdateTime: '2025-02-26T16:23:49Z'
          message: Deployment has minimum availability.
          reason: MinimumReplicasAvailable
          status: 'True'
          type: Available
        - lastTransitionTime: '2025-02-26T16:22:54Z'
          lastUpdateTime: '2025-02-26T16:23:49Z'
          message: ReplicaSet "operator-controller-controller-manager-57f648fb64" has successfully
            progressed.
          reason: NewReplicaSetAvailable
          status: 'True'
          type: Progressing
        observedGeneration: '1'
        readyReplicas: '1'
        replicas: '1'
        updatedReplicas: '1'
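
      To cross-check both Deployments in one step on a live cluster, a small loop like the one below should work; the catalogd namespace is assumed to be openshift-catalogd (only openshift-operator-controller appears explicitly above):

      for d in openshift-catalogd/catalogd-controller-manager \
               openshift-operator-controller/operator-controller-controller-manager; do
          # print the Deployment name followed by each condition as Type=Status
          oc -n "${d%/*}" get deploy "${d#*/}" \
              -o jsonpath='{.metadata.name}{": "}{range .status.conditions[*]}{.type}{"="}{.status}{" "}{end}{"\n"}'
      done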

      Version-Release number of selected component (if applicable):

          4.19.0-0.nightly-multi-2025-02-26-050012

      How reproducible:

          Not always

      Steps to Reproduce:

      Encountered this issue twice, in the following CI runs:

      1. https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-arm-mixarch-f14/1894774434611335168

      2. https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-amd-mixarch-f28-destructive/1894774064451424256


      Actual results:

          The olm CO is Degraded.

      Expected results:

          The olm CO should be Available.

      Additional info:

      jiazha-mac:~ jiazha$ omg project openshift-cluster-olm-operator
      Now using project openshift-cluster-olm-operator
      jiazha-mac:~ jiazha$ omg get pods 
      NAME                                   READY  STATUS   RESTARTS  AGE
      cluster-olm-operator-5c6b8c4959-swxtt  0/1    Running  0         38m
      jiazha-mac:~ jiazha$ omg logs cluster-olm-operator-5c6b8c4959-swxtt -c cluster-olm-operator
      2025-02-26T16:31:52.648371813Z I0226 16:31:52.643085       1 cmd.go:253] Using service-serving-cert provided certificates
      2025-02-26T16:31:52.648662533Z I0226 16:31:52.648619       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
      ...
      2025-02-26T16:32:05.467351366Z E0226 16:32:05.467298       1 base_controller.go:279] "Unhandled Error" err="CatalogdDeploymentCatalogdControllerManager reconciliation failed: Deployment was progressing too long"
      2025-02-26T16:32:06.059681614Z I0226 16:32:06.059629       1 builder.go:224] "ProxyHook updating environment" logger="builder" deployment="operator-controller-controller-manager"
      2025-02-26T16:32:06.059769494Z I0226 16:32:06.059758       1 featuregates_hook.go:33] "updating environment" logger="feature_gates_hook" deployment="operator-controller-controller-manager"
      2025-02-26T16:32:06.066149493Z E0226 16:32:06.066095       1 base_controller.go:279] "Unhandled Error" err="OperatorcontrollerDeploymentOperatorControllerControllerManager reconciliation failed: Deployment was progressing too long"
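
      The "Deployment was progressing too long" errors are still being logged at 16:32, several minutes after both Deployments report NewReplicaSetAvailable (16:23-16:24 in the status above). A rough way to line the two up on a live cluster is sketched below; the namespaces are the same assumed defaults as above, and the Deployment name cluster-olm-operator is inferred from the pod name:

      # operator-side errors
      oc -n openshift-cluster-olm-operator logs deploy/cluster-olm-operator \
          -c cluster-olm-operator | grep 'progressing too long'
      # when each managed Deployment last made progress
      oc -n openshift-catalogd get deploy catalogd-controller-manager \
          -o jsonpath='{.status.conditions[?(@.type=="Progressing")].lastUpdateTime}{"\n"}'
      oc -n openshift-operator-controller get deploy operator-controller-controller-manager \
          -o jsonpath='{.status.conditions[?(@.type=="Progressing")].lastUpdateTime}{"\n"}'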

              Assignee: Todd Short (tshort@redhat.com)
              Reporter: Jian Zhang (rhn-support-jiazha)
              QA Contact: Jian Zhang