-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.15
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Critical
-
No
-
None
-
None
-
Rejected
Description of problem:
This issue was first reported in https://issues.redhat.com/browse/HIVE-2390; we later saw a similar issue in QE CI installations. The console pod is running and the console behaves correctly (users can open the console in a browser, log in, and pages load), but the console operator keeps reporting Degraded: True for a long time. The operator eventually reports the correct status, but the delay still blocks the installation from succeeding. The problem seems to be related to ingress restarts.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
frequently in Hive CI and QE CI
Steps to Reproduce:
Search for 'clusterOperator=console condition=Degraded' in the following two CI logs:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hive-master-periodic-e2e-pool-weekly/1739497565449097216/artifacts/e2e-pool-weekly/test/artifacts/hive-controllers.log
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hive-master-periodic-e2e-pool-weekly/1736960825022746624/artifacts/e2e-pool-weekly/test/artifacts/hive-controllers.log
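The search step above can be sketched as a small filter over a locally saved copy of hive-controllers.log. This is only an illustration: the function names and the local file path are hypothetical, and nothing is assumed about the log line layout beyond the search string quoted in this report.

```python
from typing import Iterable, List

# Search string quoted in this report.
NEEDLE = "clusterOperator=console condition=Degraded"


def find_degraded_lines(lines: Iterable[str], needle: str = NEEDLE) -> List[str]:
    """Return every line containing the search string, without the trailing newline."""
    return [line.rstrip("\n") for line in lines if needle in line]


def find_degraded_in_file(path: str) -> List[str]:
    """Apply the same filter to a log file saved locally (the path is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_degraded_lines(fh)
```

For example, find_degraded_in_file("hive-controllers.log") would return the matching lines from a log downloaded from one of the links above.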
Actual results:
First, the console operator reports Degraded: True because the route is not admitted. However, even after the route can be accessed without any errors, the console operator still reports Degraded: True. The console operator takes about 3 hours to recover (report the correct status):
I1227 07:03:11.626408 1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"afb0554b-b7fb-4cd7-9a3e-402ffd1c6b3e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Degraded changed from False to True ("RouteHealthDegraded: console route is not admitted")
I1227 09:56:03.027887 1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"afb0554b-b7fb-4cd7-9a3e-402ffd1c6b3e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Degraded changed from True to False ("All is well"),Available changed from False to True ("All is well")
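The "about 3 hours" figure can be checked from the two log prefixes above. The sketch below parses the klog-style timestamp (month, day, and time, with no year) and subtracts the two. The helper names are hypothetical, and the year is an assumption that only matters if the two events straddle a year boundary.

```python
import re
from datetime import datetime, timedelta

# klog prefix: severity letter, then MMDD and HH:MM:SS.microseconds.
KLOG_PREFIX = re.compile(r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})")


def klog_time(line: str, year: int = 2023) -> datetime:
    """Parse the timestamp at the start of a klog line; the year is assumed, not logged."""
    match = KLOG_PREFIX.match(line)
    if match is None:
        raise ValueError("line does not start with a klog timestamp")
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %m%d %H:%M:%S.%f")


def elapsed(first_line: str, second_line: str, year: int = 2023) -> timedelta:
    """Time between two klog lines, assuming both were written in the same year."""
    return klog_time(second_line, year) - klog_time(first_line, year)


# Prefixes copied from the two events quoted in this report.
degraded_at = "I1227 07:03:11.626408 1 event.go:298]"
recovered_at = "I1227 09:56:03.027887 1 event.go:298]"
print(elapsed(degraded_at, recovered_at))
```

For the two events quoted here the difference is roughly 2 hours 53 minutes, which matches the "about 3 hours" stated in the report.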
Expected results:
The console operator should recover and report the correct status within an acceptable time.
Additional info:
In another case, co/console is reporting:

$ oc get co
NAME                        VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication              4.15.0-0.nightly-2023-12-25-100326   True        False         False      102m
baremetal                   4.15.0-0.nightly-2023-12-25-100326   True        False         False      112m
cloud-controller-manager    4.15.0-0.nightly-2023-12-25-100326   True        False         False      112m
cloud-credential            4.15.0-0.nightly-2023-12-25-100326   True        False         False      108m
cluster-autoscaler          4.15.0-0.nightly-2023-12-25-100326   True        False         False      108m
config-operator             4.15.0-0.nightly-2023-12-25-100326   True        False         False      115m
console                     4.15.0-0.nightly-2023-12-25-100326   False       True          True       105m    DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set   4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m
csi-snapshot-controller     4.15.0-0.nightly-2023-12-25-100326   True        False         False      115m
dns                         4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m
etcd                        4.15.0-0.nightly-2023-12-25-100326   True        False         False      111m
image-registry              4.15.0-0.nightly-2023-12-25-100326   True        False         False      102m
ingress                     4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m

$ oc describe co console
.....
Status:
  Conditions:
    Last Transition Time:  2024-01-03T07:21:09Z
    Message:               RouteHealthDegraded: route not yet available, https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com returns '503 Service Unavailable'
    Reason:                RouteHealth_StatusError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2024-01-03T07:16:54Z
    Message:               SyncLoopRefreshProgressing: Working toward version 4.15.0-0.nightly-2023-12-25-100326, 0 replicas available
    Reason:                SyncLoopRefresh_InProgress
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2024-01-03T07:16:54Z
    Message:               DeploymentAvailable: 0 replicas available for console deployment
                           RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com returns '503 Service Unavailable'
    Reason:                Deployment_InsufficientReplicas::RouteHealth_StatusError
    Status:                False
    Type:                  Available
    Last Transition Time:  2024-01-03T07:16:58Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
$

$ oc get all -n openshift-console
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME                             READY   STATUS    RESTARTS   AGE
pod/console-7fcf8c4bdc-qrxr2     1/1     Running   0          102m
pod/downloads-7984f9cc88-rq4r6   1/1     Running   0          107m

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/console     ClusterIP   172.30.68.4      <none>        443/TCP   107m
service/downloads   ClusterIP   172.30.163.136   <none>        80/TCP    107m

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console     1/1     1            1           106m
deployment.apps/downloads   1/1     1            1           107m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/console-668fd67cc5     0         0         0       106m
replicaset.apps/console-767d8974fc     0         0         0       103m
replicaset.apps/console-7fcf8c4bdc     1         1         1       102m
replicaset.apps/downloads-7984f9cc88   1         1         1       107m

NAME                                HOST/PORT                                                                      PATH   SERVICES    PORT    TERMINATION          WILDCARD
route.route.openshift.io/console    console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com            console     https   reencrypt/Redirect   None
route.route.openshift.io/downloads  downloads-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com          downloads   http    edge/Redirect        None

$ oc -n openshift-ingress rsh router-default-cf4dcdf75-jpt7x
sh-4.4$ curl -kI https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com
HTTP/1.1 200 OK
referrer-policy: strict-origin-when-cross-origin
set-cookie: csrf-token=kA+FhBTs3kQd20L4v2bPO14+SEzfZwib8G+k32nmxnKrdtujqjy+xOBMjlTah5xfWKarHiMNZYFk6Od90MQV8A==; Path=/; Secure; SameSite=Lax
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-frame-options: DENY
x-xss-protection: 1; mode=block
date: Wed, 03 Jan 2024 09:05:24 GMT
content-type: text/html; charset=utf-8
set-cookie: 1e2670d92730b515ce3a1bb65da45062=e48473f300ccce6c56c4108e066d07a1; path=/; HttpOnly; Secure; SameSite=None

$ oc get pods -n openshift-ingress
NAME                             READY   STATUS    RESTARTS      AGE
router-default-cf4dcdf75-jpt7x   1/1     Running   2 (97m ago)   103m

$ oc get pods -n openshift-console-operator
NAME                                READY   STATUS    RESTARTS       AGE
console-operator-67c9d788b6-lk58j   2/2     Running   6 (123m ago)   123m
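The mismatch described above (the operator still reports Degraded while the route answers 200 OK) can be spotted programmatically by reading the ClusterOperator conditions. The sketch below assumes the JSON shape returned by `oc get co console -o json`, with conditions under status.conditions. The helper names are hypothetical, and the sample is hand-trimmed from the `oc describe` output in this report rather than captured from a cluster.

```python
from typing import Any, Dict, Optional


def get_condition(cluster_operator: Dict[str, Any], cond_type: str) -> Optional[Dict[str, Any]]:
    """Return the condition with the given type, or None if it is absent."""
    for cond in cluster_operator.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def is_degraded(cluster_operator: Dict[str, Any]) -> bool:
    """True when the Degraded condition exists and its status is the string "True"."""
    cond = get_condition(cluster_operator, "Degraded")
    return cond is not None and cond.get("status") == "True"


# Hand-trimmed sample based on the describe output in this report.
sample = {
    "status": {
        "conditions": [
            {
                "type": "Degraded",
                "status": "True",
                "reason": "RouteHealth_StatusError",
                "lastTransitionTime": "2024-01-03T07:21:09Z",
            },
            {
                "type": "Available",
                "status": "False",
                "reason": "Deployment_InsufficientReplicas::RouteHealth_StatusError",
                "lastTransitionTime": "2024-01-03T07:16:54Z",
            },
        ]
    }
}

print(is_degraded(sample), get_condition(sample, "Degraded")["reason"])
```

Pairing this check with the HTTP status of the console route would flag the case reported here: Degraded is True with reason RouteHealth_StatusError while the route itself returns 200.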
- is related to: HIVE-2390 console ClusterOperator fails to resume from hibernation (Closed)