OpenShift Bugs / OCPBUGS-51357

[OLMv1] ClusterOperator OLM Degraded: Deployment was progressing too long


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: 4.19.0
    • Component: OLM
    • Severity: Important
    • Sprint: Glaceon OLM Sprint 267
      Description of problem:

      The olm ClusterOperator (CO) is Degraded.

      jiazha-mac:~ jiazha$ omg get clusterversion
      NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
      version           False      True         1h7m   Unable to apply 4.19.0-0.nightly-multi-2025-02-26-050012: the cluster operator olm is not available
      
      jiazha-mac:~ jiazha$ omg get co olm -o yaml
      apiVersion: config.openshift.io/v1
      kind: ClusterOperator
      metadata:
      ...
      spec: {}
      status:
        conditions:
        - lastTransitionTime: '2025-02-26T16:25:34Z'
          message: 'CatalogdDeploymentCatalogdControllerManagerDegraded: Deployment was
            progressing too long
      
      
            OperatorcontrollerDeploymentOperatorControllerControllerManagerDegraded: Deployment
            was progressing too long'
          reason: CatalogdDeploymentCatalogdControllerManager_SyncError::OperatorcontrollerDeploymentOperatorControllerControllerManager_SyncError
          status: 'True'
          type: Degraded
        - lastTransitionTime: '2025-02-26T16:08:34Z'
          message: 'CatalogdDeploymentCatalogdControllerManagerProgressing: Waiting for
            Deployment to deploy pods
      
      
            OperatorcontrollerDeploymentOperatorControllerControllerManagerProgressing:
            Waiting for Deployment to deploy pods'
          reason: CatalogdDeploymentCatalogdControllerManager_Deploying::OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying
          status: 'True'
          type: Progressing
        - lastTransitionTime: '2025-02-26T16:08:34Z'
          message: 'CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
      
      
            OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting
            for Deployment'
          reason: CatalogdDeploymentCatalogdControllerManager_Deploying::OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying
          status: 'False'
          type: Available
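
      On a live cluster (rather than a must-gather inspected with omg), a jsonpath one-liner along these lines should print just the condition summary for the olm ClusterOperator; the expression is illustrative, not taken from the job artifacts:

      oc get clusteroperator olm -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{" ("}{.reason}{" @ "}{.lastTransitionTime}{")"}{"\n"}{end}'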

      However, the `catalogd` and `operator-controller` Deployments were working fine at that time (see the cross-check sketch after their status output below).

      jiazha-mac:~ jiazha$ omg get deploy 
      NAME                         READY  UP-TO-DATE  AVAILABLE  AGE
      catalogd-controller-manager  1/1    1           1          1h1m
      jiazha-mac:~ jiazha$ omg get deploy -n openshift-operator-controller 
      NAME                                    READY  UP-TO-DATE  AVAILABLE  AGE
      operator-controller-controller-manager  1/1    1           1          1h1m
      
      jiazha-mac:~ jiazha$ omg get deploy catalogd-controller-manager -o yaml
      apiVersion: apps/v1
      kind: Deployment
      ...
      status:
        availableReplicas: '1'
        conditions:
        - lastTransitionTime: '2025-02-26T16:24:35Z'
          lastUpdateTime: '2025-02-26T16:24:35Z'
          message: Deployment has minimum availability.
          reason: MinimumReplicasAvailable
          status: 'True'
          type: Available
        - lastTransitionTime: '2025-02-26T16:22:42Z'
          lastUpdateTime: '2025-02-26T16:24:35Z'
          message: ReplicaSet "catalogd-controller-manager-7f855d8d48" has successfully
            progressed.
          reason: NewReplicaSetAvailable
          status: 'True'
          type: Progressing
        observedGeneration: '1'
        readyReplicas: '1'
        replicas: '1'
        updatedReplicas: '1'
      
      jiazha-mac:~ jiazha$ omg get deploy -n openshift-operator-controller  operator-controller-controller-manager -o yaml
      apiVersion: apps/v1
      kind: Deployment
      ...
      status:
        availableReplicas: '1'
        conditions:
        - lastTransitionTime: '2025-02-26T16:23:49Z'
          lastUpdateTime: '2025-02-26T16:23:49Z'
          message: Deployment has minimum availability.
          reason: MinimumReplicasAvailable
          status: 'True'
          type: Available
        - lastTransitionTime: '2025-02-26T16:22:54Z'
          lastUpdateTime: '2025-02-26T16:23:49Z'
          message: ReplicaSet "operator-controller-controller-manager-57f648fb64" has successfully
            progressed.
          reason: NewReplicaSetAvailable
          status: 'True'
          type: Progressing
        observedGeneration: '1'
        readyReplicas: '1'
        replicas: '1'
        updatedReplicas: '1'
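
      To cross-check both Deployments in one step on a live cluster, a small loop like the one below should work; the catalogd namespace is assumed to be openshift-catalogd (only openshift-operator-controller appears explicitly above):

      for d in openshift-catalogd/catalogd-controller-manager \
               openshift-operator-controller/operator-controller-controller-manager; do
          # print the Deployment name followed by each condition as Type=Status
          oc -n "${d%/*}" get deploy "${d#*/}" \
              -o jsonpath='{.metadata.name}{": "}{range .status.conditions[*]}{.type}{"="}{.status}{" "}{end}{"\n"}'
      done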

      Version-Release number of selected component (if applicable):

          4.19.0-0.nightly-multi-2025-02-26-050012

      How reproducible:

          Not always

      Steps to Reproduce:

      Encountered this issue twice, in the following CI runs:

      1. https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-arm-mixarch-f14/1894774434611335168

      2. https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-amd-mixarch-f28-destructive/1894774064451424256


      Actual results:

          The olm CO is Degraded.

      Expected results:

          The olm CO should be Available.

      Additional info:

      jiazha-mac:~ jiazha$ omg project openshift-cluster-olm-operator
      Now using project openshift-cluster-olm-operator
      jiazha-mac:~ jiazha$ omg get pods 
      NAME                                   READY  STATUS   RESTARTS  AGE
      cluster-olm-operator-5c6b8c4959-swxtt  0/1    Running  0         38m
      jiazha-mac:~ jiazha$ omg logs cluster-olm-operator-5c6b8c4959-swxtt -c cluster-olm-operator
      2025-02-26T16:31:52.648371813Z I0226 16:31:52.643085       1 cmd.go:253] Using service-serving-cert provided certificates
      2025-02-26T16:31:52.648662533Z I0226 16:31:52.648619       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
      ...
      2025-02-26T16:32:05.467351366Z E0226 16:32:05.467298       1 base_controller.go:279] "Unhandled Error" err="CatalogdDeploymentCatalogdControllerManager reconciliation failed: Deployment was progressing too long"
      2025-02-26T16:32:06.059681614Z I0226 16:32:06.059629       1 builder.go:224] "ProxyHook updating environment" logger="builder" deployment="operator-controller-controller-manager"
      2025-02-26T16:32:06.059769494Z I0226 16:32:06.059758       1 featuregates_hook.go:33] "updating environment" logger="feature_gates_hook" deployment="operator-controller-controller-manager"
      2025-02-26T16:32:06.066149493Z E0226 16:32:06.066095       1 base_controller.go:279] "Unhandled Error" err="OperatorcontrollerDeploymentOperatorControllerControllerManager reconciliation failed: Deployment was progressing too long"
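
      The "Deployment was progressing too long" errors are still being logged at 16:32, several minutes after both Deployments report NewReplicaSetAvailable (16:23-16:24 in the status above). A rough way to line the two up on a live cluster is sketched below; the namespaces are the same assumed defaults as above, and the Deployment name cluster-olm-operator is inferred from the pod name:

      # operator-side errors
      oc -n openshift-cluster-olm-operator logs deploy/cluster-olm-operator \
          -c cluster-olm-operator | grep 'progressing too long'
      # when each managed Deployment last made progress
      oc -n openshift-catalogd get deploy catalogd-controller-manager \
          -o jsonpath='{.status.conditions[?(@.type=="Progressing")].lastUpdateTime}{"\n"}'
      oc -n openshift-operator-controller get deploy operator-controller-controller-manager \
          -o jsonpath='{.status.conditions[?(@.type=="Progressing")].lastUpdateTime}{"\n"}'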

              Assignee: Todd Short (tshort@redhat.com)
              Reporter: Jian Zhang (rhn-support-jiazha)
              QA Contact: Jian Zhang