Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.15
Component/s: Management Console
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Critical
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

The issue was first reported in https://issues.redhat.com/browse/HIVE-2390; we later saw a similar issue in QE CI installations.

The problem: the console pod is running and the console behaves correctly (users can open the console in a browser, log in, and pages load), but the console operator keeps reporting Degraded: True for a long time. The operator eventually reports the correct status, but until then it blocks the installation from succeeding. The issue seems to be related to ingress restarts.

Version-Release number of selected component (if applicable):

 4.15

How reproducible:

Frequently, in both Hive CI and QE CI

Steps to Reproduce:

Search for 'clusterOperator=console condition=Degraded' in the following two CI logs:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hive-master-periodic-e2e-pool-weekly/1739497565449097216/artifacts/e2e-pool-weekly/test/artifacts/hive-controllers.log 

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hive-master-periodic-e2e-pool-weekly/1736960825022746624/artifacts/e2e-pool-weekly/test/artifacts/hive-controllers.log
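
The search step can be scripted once a log has been downloaded. The following is a minimal sketch, not part of the original report: the function name `find_console_degraded` and the local file name `hive-controllers.log` are assumptions, and the only thing taken from the report is the search string itself.

```python
from typing import Iterable, List

# Search string from the reproduction steps above.
NEEDLE = "clusterOperator=console condition=Degraded"


def find_console_degraded(lines: Iterable[str], needle: str = NEEDLE) -> List[str]:
    """Return every line that contains the search string, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if needle in line]


# Usage against a locally downloaded copy of one of the logs
# (the file name is an assumption):
#
# with open("hive-controllers.log", encoding="utf-8", errors="replace") as fh:
#     for hit in find_console_degraded(fh):
#         print(hit)
```

A plain substring match is used on purpose, because the report does not describe the layout of the rest of each log line.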

Actual results:

At first, the console operator reports Degraded: True because the route is not admitted. However, even after the route can be accessed without any errors, the console operator still reports Degraded: True.

The console operator takes about 3 hours to recover (report the correct status):

I1227 07:03:11.626408       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"afb0554b-b7fb-4cd7-9a3e-402ffd1c6b3e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Degraded changed from False to True ("RouteHealthDegraded: console route is not admitted")


I1227 09:56:03.027887       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"afb0554b-b7fb-4cd7-9a3e-402ffd1c6b3e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Degraded changed from True to False ("All is well"),Available changed from False to True ("All is well")
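
The "about 3 hours" figure can be checked from the two event timestamps above. This is a rough sketch, assuming the lines use the usual klog prefix (severity letter, month and day, then time with microseconds). The prefix carries no year, so the `year` argument is an assumption and does not affect a same-day difference. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# klog prefix: severity letter, MMDD, then HH:MM:SS.microseconds
KLOG_PREFIX = re.compile(r"^[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})")


def klog_time(line: str, year: int) -> datetime:
    """Parse the timestamp at the start of a klog line; the year must be supplied."""
    match = KLOG_PREFIX.match(line)
    if match is None:
        raise ValueError(f"not a klog line: {line[:40]!r}")
    month, day, hour, minute, second, micro = (int(g) for g in match.groups())
    return datetime(year, month, day, hour, minute, second, micro)


def degraded_duration(start_line: str, end_line: str, year: int) -> timedelta:
    """Time between the 'False to True' and 'True to False' Degraded events."""
    return klog_time(end_line, year) - klog_time(start_line, year)


# Only the timestamp prefix of each event line matters here.
start = "I1227 07:03:11.626408       1 event.go:298]"
end = "I1227 09:56:03.027887       1 event.go:298]"
print(degraded_duration(start, end, year=2023))  # year is an assumption
```

For the two events quoted above this gives a gap of a little under 2 hours 53 minutes.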

Expected results:

The console operator should recover (report the correct status) within an acceptable time.

Additional info:

In another case, co/console reports the following:
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-12-25-100326   True        False         False      102m    
baremetal                                  4.15.0-0.nightly-2023-12-25-100326   True        False         False      112m    
cloud-controller-manager                   4.15.0-0.nightly-2023-12-25-100326   True        False         False      112m    
cloud-credential                           4.15.0-0.nightly-2023-12-25-100326   True        False         False      108m    
cluster-autoscaler                         4.15.0-0.nightly-2023-12-25-100326   True        False         False      108m    
config-operator                            4.15.0-0.nightly-2023-12-25-100326   True        False         False      115m    
console                                    4.15.0-0.nightly-2023-12-25-100326   False       True          True       105m    DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m    
csi-snapshot-controller                    4.15.0-0.nightly-2023-12-25-100326   True        False         False      115m    
dns                                        4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m    
etcd                                       4.15.0-0.nightly-2023-12-25-100326   True        False         False      111m    
image-registry                             4.15.0-0.nightly-2023-12-25-100326   True        False         False      102m    
ingress                                    4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m    

$ oc describe co console
.....
Status:
  Conditions:
    Last Transition Time:  2024-01-03T07:21:09Z
    Message:               RouteHealthDegraded: route not yet available, https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com returns '503 Service Unavailable'
    Reason:                RouteHealth_StatusError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2024-01-03T07:16:54Z
    Message:               SyncLoopRefreshProgressing: Working toward version 4.15.0-0.nightly-2023-12-25-100326, 0 replicas available
    Reason:                SyncLoopRefresh_InProgress
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2024-01-03T07:16:54Z
    Message:               DeploymentAvailable: 0 replicas available for console deployment
RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com returns '503 Service Unavailable'
    Reason:                Deployment_InsufficientReplicas::RouteHealth_StatusError
    Status:                False
    Type:                  Available
    Last Transition Time:  2024-01-03T07:16:58Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
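
The conditions above can also be checked programmatically from the JSON form of the same object. This is a sketch only, assuming the output of `oc get co console -o json` has been parsed into a dict. The helper name `unhealthy_conditions` and the "healthy" expectations (Available=True, Degraded=False, Progressing=False) are assumptions for illustration, and the sample data simply mirrors the values shown above.

```python
from typing import Dict, List

# Assumed healthy values for the conditions of interest.
EXPECTED = {"Available": "True", "Degraded": "False", "Progressing": "False"}


def unhealthy_conditions(cluster_operator: Dict) -> List[str]:
    """List conditions whose status differs from the assumed healthy value."""
    problems = []
    for cond in cluster_operator.get("status", {}).get("conditions", []):
        want = EXPECTED.get(cond.get("type"))
        if want is not None and cond.get("status") != want:
            problems.append(f'{cond["type"]}={cond["status"]} ({cond.get("reason", "")})')
    return problems


# Sample mirroring the 'oc describe co console' output above (messages omitted).
sample = {
    "status": {
        "conditions": [
            {"type": "Degraded", "status": "True", "reason": "RouteHealth_StatusError"},
            {"type": "Progressing", "status": "True", "reason": "SyncLoopRefresh_InProgress"},
            {"type": "Available", "status": "False",
             "reason": "Deployment_InsufficientReplicas::RouteHealth_StatusError"},
            {"type": "Upgradeable", "status": "True", "reason": "AsExpected"},
        ]
    }
}

for problem in unhealthy_conditions(sample):
    print(problem)
```

Upgradeable is deliberately left out of the expectations, since it already reports "All is well" in this case.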

$ oc get all -n openshift-console
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME                             READY   STATUS    RESTARTS   AGE
pod/console-7fcf8c4bdc-qrxr2     1/1     Running   0          102m
pod/downloads-7984f9cc88-rq4r6   1/1     Running   0          107m

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/console     ClusterIP   172.30.68.4      <none>        443/TCP   107m
service/downloads   ClusterIP   172.30.163.136   <none>        80/TCP    107m

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console     1/1     1            1           106m
deployment.apps/downloads   1/1     1            1           107m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/console-668fd67cc5     0         0         0       106m
replicaset.apps/console-767d8974fc     0         0         0       103m
replicaset.apps/console-7fcf8c4bdc     1         1         1       102m
replicaset.apps/downloads-7984f9cc88   1         1         1       107m

NAME                                 HOST/PORT                                                                      PATH   SERVICES    PORT    TERMINATION          WILDCARD
route.route.openshift.io/console     console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com            console     https   reencrypt/Redirect   None
route.route.openshift.io/downloads   downloads-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com          downloads   http    edge/Redirect        None
$ oc -n openshift-ingress rsh router-default-cf4dcdf75-jpt7x 
sh-4.4$ curl -kI https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com
HTTP/1.1 200 OK
referrer-policy: strict-origin-when-cross-origin
set-cookie: csrf-token=kA+FhBTs3kQd20L4v2bPO14+SEzfZwib8G+k32nmxnKrdtujqjy+xOBMjlTah5xfWKarHiMNZYFk6Od90MQV8A==; Path=/; Secure; SameSite=Lax
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-frame-options: DENY
x-xss-protection: 1; mode=block
date: Wed, 03 Jan 2024 09:05:24 GMT
content-type: text/html; charset=utf-8
set-cookie: 1e2670d92730b515ce3a1bb65da45062=e48473f300ccce6c56c4108e066d07a1; path=/; HttpOnly; Secure; SameSite=None

$ oc get pods -n openshift-ingress                                        
NAME                             READY   STATUS    RESTARTS      AGE
router-default-cf4dcdf75-jpt7x   1/1     Running   2 (97m ago)   103m
$ oc get pods -n openshift-console-operator                  
NAME                                READY   STATUS    RESTARTS       AGE
console-operator-67c9d788b6-lk58j   2/2     Running   6 (123m ago)   123m

is related to

HIVE-2390 console ClusterOperator fails to resume from hibernation

Closed

Assignee: Jakub Hadvig

Reporter: YaDan Pei

Need Info From: None

Contributors: None

QA Contact: YaDan Pei

Doc Contact: None

Votes: 0

Watchers: 8

Created: 2024/01/03 2:11 AM

Updated: 2025/07/24 11:34 AM
