-
Bug
-
Resolution: Cannot Reproduce
-
Normal
-
None
-
4.12
-
Low
-
None
-
False
-
Description of problem:
The console CO becomes degraded after about 2 days with this message: "ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded"
How reproducible:
Reproducible on all clusters with the 4.12.0-rc builds: https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/
Note:
1. This issue is seen on 4.12.0-rc clusters (rc.0, rc.1, and rc.2) whose master nodes are deployed with 16GB RAM (a quick way to confirm master memory is sketched after this list).
2. Clusters with 32GB RAM masters are running fine.
3. Post-deployment all COs are good, but the console CO becomes degraded after 3-4 days.
4. The issue is not seen with ec builds (we have a healthy ec.5 cluster that has been running for a month).
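To confirm the RAM split called out in the notes, master memory capacity can be read from the node objects. This is a generic oc invocation, not output captured from the affected cluster:

oc get nodes -l node-role.kubernetes.io/master \
  -o custom-columns=NAME:.metadata.name,MEMORY:.status.capacity.memory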
Steps to Reproduce:
1. Deploy an OCP cluster using a 4.12.0-rc build on the Power platform (ppc64le).
2. Check the status of all nodes, COs, and pods.
3. Monitor the cluster for a few days (a simple polling loop is sketched below).
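For step 3, a minimal monitoring sketch, assuming bash on the bastion, a logged-in oc session, and an arbitrary hourly interval:

while true; do
  date
  oc get co console --no-headers
  sleep 3600   # poll hourly; adjust the interval as needed
done >> console-co-watch.log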
Actual results:
"console" co goes to a degraded state. Error message "ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded"
Expected results:
The cluster should stay stable; the console CO should remain Available=True and Degraded=False.
Additional info:
[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-rc.2   True        False         False      4d9h
baremetal                                  4.12.0-rc.2   True        False         False      4d23h
cloud-controller-manager                   4.12.0-rc.2   True        False         False      4d23h
cloud-credential                           4.12.0-rc.2   True        False         False      4d23h
cluster-autoscaler                         4.12.0-rc.2   True        False         False      4d23h
config-operator                            4.12.0-rc.2   True        False         False      4d23h
console                                    4.12.0-rc.2   True        False         True       4d8h    ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
control-plane-machine-set                  4.12.0-rc.2   True        False         False      4d23h
csi-snapshot-controller                    4.12.0-rc.2   True        False         False      4d23h
dns                                        4.12.0-rc.2   True        False         False      4d23h
etcd                                       4.12.0-rc.2   True        False         False      4d23h
image-registry                             4.12.0-rc.2   True        False         False      4d22h
ingress                                    4.12.0-rc.2   True        False         False      4d23h
insights                                   4.12.0-rc.2   True        False         False      4d23h
kube-apiserver                             4.12.0-rc.2   True        False         False      4d23h
kube-controller-manager                    4.12.0-rc.2   True        False         False      4d23h
kube-scheduler                             4.12.0-rc.2   True        False         False      4d23h
kube-storage-version-migrator              4.12.0-rc.2   True        False         False      4d23h
machine-api                                4.12.0-rc.2   True        False         False      4d23h
machine-approver                           4.12.0-rc.2   True        False         False      4d23h
machine-config                             4.12.0-rc.2   True        False         False      4d8h
marketplace                                4.12.0-rc.2   True        False         False      4d23h
monitoring                                 4.12.0-rc.2   True        False         False      4d8h
network                                    4.12.0-rc.2   True        False         False      4d23h
node-tuning                                4.12.0-rc.2   True        False         False      4d23h
openshift-apiserver                        4.12.0-rc.2   True        False         False      4d10h
openshift-controller-manager               4.12.0-rc.2   True        False         False      4d23h
openshift-samples                          4.12.0-rc.2   True        False         False      4d23h
operator-lifecycle-manager                 4.12.0-rc.2   True        False         False      4d23h
operator-lifecycle-manager-catalog         4.12.0-rc.2   True        False         False      4d23h
operator-lifecycle-manager-packageserver   4.12.0-rc.2   True        False         False      4d10h
service-ca                                 4.12.0-rc.2   True        False         False      4d23h
storage                                    4.12.0-rc.2   True        False         False      4d23h

Describe the CO:

[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc describe co console
Name:         console
Namespace:
Labels:       <none>
Annotations:  capability.openshift.io/name: Console
              include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2022-11-30T07:21:43Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:capability.openshift.io/name:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"e278aa0a-5a0f-41e2-9a72-5c6461685d3a"}:
      f:spec:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2022-11-30T07:21:43Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Subresource:  status
    Time:         2022-11-30T07:21:43Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:      console
    Operation:    Update
    Subresource:  status
    Time:         2022-11-30T21:58:45Z
  Owner References:
    API Version:  config.openshift.io/v1
    Kind:         ClusterVersion
    Name:         version
    UID:          e278aa0a-5a0f-41e2-9a72-5c6461685d3a
  Resource Version:  334737
  UID:               5254b11b-2495-4af9-8d29-65b47d4aea5b
Spec:
Status:
  Conditions:
    Last Transition Time:  2022-11-30T18:43:49Z
    Message:               ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
    Reason:                ConsoleNotificationSync_FailedDelete
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2022-11-30T07:56:27Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2022-11-30T21:58:34Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2022-11-30T07:42:45Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  consoles
    Group:     config.openshift.io
    Name:      cluster
    Resource:  consoles
    Group:     config.openshift.io
    Name:      cluster
    Resource:  infrastructures
    Group:     config.openshift.io
    Name:      cluster
    Resource:  proxies
    Group:     config.openshift.io
    Name:      cluster
    Resource:  oauths
    Group:     oauth.openshift.io
    Name:      console
    Resource:  oauthclients
    Group:
    Name:      openshift-console-operator
    Resource:  namespaces
    Group:
    Name:      openshift-console
    Resource:  namespaces
    Group:
    Name:       console-public
    Namespace:  openshift-config-managed
    Resource:   configmaps
  Versions:
    Name:     operator
    Version:  4.12.0-rc.2
Events:       <none>

The console operator does not come out of the degraded state, even though I have deleted the pod and there are no issues currently:

[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get pods -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-5fc9cfd8ff-qqqjm     1/1     Running   0          5d21h
console-5fc9cfd8ff-vlwvs     1/1     Running   0          6d11h
downloads-6669fd985f-4fvt4   1/1     Running   0          6d12h
downloads-6669fd985f-859d2   1/1     Running   0          6d12h

[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get pods -n openshift-console-operator
NAME                                READY   STATUS    RESTARTS   AGE
console-operator-55fc8d8ff5-nzrxn   2/2     Running   0          46m

And yet, the status still shows as degraded:

[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get co | grep console
console   4.12.0-rc.2   True   False   True   5d21h   ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded

The must-gather log and cronjob logs are attached here; the cron log captures the metrics from the cluster nodes:
https://drive.google.com/drive/folders/1qUsLxCd2ta-dD7gtevD96s95oABfzCvY?usp=sharing
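The Degraded reason is ConsoleNotificationSync_FailedDelete, i.e. the operator appears to time out while deleting a ConsoleNotification object. A hedged triage sketch (the container name console-operator is an assumption based on the 2/2 operator pod above):

# List any ConsoleNotification objects the operator may be trying to delete
oc get consolenotifications

# Search the operator log for the failing sync
oc logs -n openshift-console-operator deployment/console-operator \
  -c console-operator | grep -i -e notification -e deadline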