Bug
Resolution: Cannot Reproduce
Normal
4.12
Quality / Stability / Reliability
False
Low
Description of problem:
The console CO became degraded after 2 days, with the message "ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded".
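The failing condition can be pulled out directly with a jsonpath query (standard oc/kubectl jsonpath; nothing assumed beyond the condition type):
# print only the Degraded condition message for the console CO
oc get co console -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'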
How reproducible:
Reproducible on all clusters deployed with the 4.12.0-rc builds from https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/
Notes:
1. The issue is seen on 4.12.0-rc clusters (rc.0, rc.1, and rc.2) whose master nodes are deployed with 16GB RAM (memory capacity can be confirmed with the check below).
2. Clusters with 32GB RAM masters are running fine.
3. Post-deployment, all COs are good, but the console CO becomes degraded after 3-4 days.
4. The issue is not seen with ec builds (we have a healthy ec.5 cluster that has been running for a month).
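The 16GB vs. 32GB observation can be verified on an affected cluster with a standard custom-columns query (a minimal check; the selector assumes the usual master role label):
# show memory capacity of the master nodes
oc get nodes -l node-role.kubernetes.io/master -o custom-columns=NAME:.metadata.name,MEMORY:.status.capacity.memory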
Steps to Reproduce:
1. Deploy an OCP cluster using a 4.12.0-rc build on the Power platform (ppc64le).
2. Check the status of all nodes, COs, and pods.
3. Monitor the cluster for a few days (a periodic check like the one sketched below can catch the transition).
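For step 3, periodic monitoring can be as simple as the following (a sketch; the awk field index assumes the default `oc get co` column order, where the fifth column is DEGRADED):
# list any CO currently reporting Degraded=True (run e.g. from cron)
oc get co --no-headers | awk '$5 == "True" {print $1}'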
Actual results:
The "console" CO goes into a degraded state with the error message "ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded".
Expected results:
The cluster should stay stable, with all COs available and none degraded.
Additional info:
[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-rc.2   True        False         False      4d9h
baremetal                                  4.12.0-rc.2   True        False         False      4d23h
cloud-controller-manager                   4.12.0-rc.2   True        False         False      4d23h
cloud-credential                           4.12.0-rc.2   True        False         False      4d23h
cluster-autoscaler                         4.12.0-rc.2   True        False         False      4d23h
config-operator                            4.12.0-rc.2   True        False         False      4d23h
console                                    4.12.0-rc.2   True        False         True       4d8h    ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
control-plane-machine-set                  4.12.0-rc.2   True        False         False      4d23h
csi-snapshot-controller                    4.12.0-rc.2   True        False         False      4d23h
dns                                        4.12.0-rc.2   True        False         False      4d23h
etcd                                       4.12.0-rc.2   True        False         False      4d23h
image-registry                             4.12.0-rc.2   True        False         False      4d22h
ingress                                    4.12.0-rc.2   True        False         False      4d23h
insights                                   4.12.0-rc.2   True        False         False      4d23h
kube-apiserver                             4.12.0-rc.2   True        False         False      4d23h
kube-controller-manager                    4.12.0-rc.2   True        False         False      4d23h
kube-scheduler                             4.12.0-rc.2   True        False         False      4d23h
kube-storage-version-migrator              4.12.0-rc.2   True        False         False      4d23h
machine-api                                4.12.0-rc.2   True        False         False      4d23h
machine-approver                           4.12.0-rc.2   True        False         False      4d23h
machine-config                             4.12.0-rc.2   True        False         False      4d8h
marketplace                                4.12.0-rc.2   True        False         False      4d23h
monitoring                                 4.12.0-rc.2   True        False         False      4d8h
network                                    4.12.0-rc.2   True        False         False      4d23h
node-tuning                                4.12.0-rc.2   True        False         False      4d23h
openshift-apiserver                        4.12.0-rc.2   True        False         False      4d10h
openshift-controller-manager               4.12.0-rc.2   True        False         False      4d23h
openshift-samples                          4.12.0-rc.2   True        False         False      4d23h
operator-lifecycle-manager                 4.12.0-rc.2   True        False         False      4d23h
operator-lifecycle-manager-catalog         4.12.0-rc.2   True        False         False      4d23h
operator-lifecycle-manager-packageserver   4.12.0-rc.2   True        False         False      4d10h
service-ca                                 4.12.0-rc.2   True        False         False      4d23h
storage                                    4.12.0-rc.2   True        False         False      4d23h
Describe output for the console CO:
[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc describe co console
Name:         console
Namespace:
Labels:       <none>
Annotations:  capability.openshift.io/name: Console
              include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2022-11-30T07:21:43Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:capability.openshift.io/name:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"e278aa0a-5a0f-41e2-9a72-5c6461685d3a"}:
      f:spec:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2022-11-30T07:21:43Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Subresource:  status
    Time:         2022-11-30T07:21:43Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:      console
    Operation:    Update
    Subresource:  status
    Time:         2022-11-30T21:58:45Z
  Owner References:
    API Version:  config.openshift.io/v1
    Kind:         ClusterVersion
    Name:         version
    UID:          e278aa0a-5a0f-41e2-9a72-5c6461685d3a
  Resource Version:  334737
  UID:               5254b11b-2495-4af9-8d29-65b47d4aea5b
Spec:
Status:
  Conditions:
    Last Transition Time:  2022-11-30T18:43:49Z
    Message:               ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
    Reason:                ConsoleNotificationSync_FailedDelete
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2022-11-30T07:56:27Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2022-11-30T21:58:34Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2022-11-30T07:42:45Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   consoles
    Group:      config.openshift.io
    Name:       cluster
    Resource:   consoles
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   proxies
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      oauth.openshift.io
    Name:       console
    Resource:   oauthclients
    Group:
    Name:       openshift-console-operator
    Resource:   namespaces
    Group:
    Name:       openshift-console
    Resource:   namespaces
    Group:
    Name:       console-public
    Namespace:  openshift-config-managed
    Resource:   configmaps
  Versions:
    Name:     operator
    Version:  4.12.0-rc.2
Events:  <none>
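Given that the Reason is ConsoleNotificationSync_FailedDelete, it may also be worth checking whether a ConsoleNotification resource is stuck (assuming the standard console.openshift.io API group for the CRD):
# list the ConsoleNotification resources the operator is trying to sync
oc get consolenotifications.console.openshift.io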
The console operator does not seem to come out of the degraded state, even though I have deleted the operator pod and there are currently no issues with the console pods.
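For reference, recycling the operator pod amounts to something like the following (a sketch; the label selector assumes the operator pod carries the usual name=console-operator label, and a rollout restart of the deployment achieves the same effect):
# recycle the console-operator pod
oc -n openshift-console-operator delete pod -l name=console-operator
# equivalently:
oc -n openshift-console-operator rollout restart deployment/console-operator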
[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get pods -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-5fc9cfd8ff-qqqjm     1/1     Running   0          5d21h
console-5fc9cfd8ff-vlwvs     1/1     Running   0          6d11h
downloads-6669fd985f-4fvt4   1/1     Running   0          6d12h
downloads-6669fd985f-859d2   1/1     Running   0          6d12h
[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get pods -n openshift-console-operator
NAME                                READY   STATUS    RESTARTS   AGE
console-operator-55fc8d8ff5-nzrxn   2/2     Running   0          46m
And yet, the status still shows as degraded.
[root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get co | grep console
console   4.12.0-rc.2   True   False   True   5d21h   ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
The must-gather log and cronjob logs are attached below; the cron log captures the metrics from the cluster nodes:
https://drive.google.com/drive/folders/1qUsLxCd2ta-dD7gtevD96s95oABfzCvY?usp=sharing
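For completeness, the must-gather archive was collected with the standard command (the destination directory name is illustrative):
# collect cluster diagnostics into a local directory
oc adm must-gather --dest-dir=./must-gather-console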