Details

Type: Bug
Resolution: Can't Do
Priority: Major
Fix Version/s: None
Affects Version/s: 4.10
Component/s: Networking / router
Labels:
- migrated_from_bz

Severity:
Important
Story Points:
3
Sprint:
Sprint 212, Sprint 229, Sprint 235
sprint_count:
3
Release Blocker:
Rejected
Architecture:
Unspecified
Release Note Type:
If docs needed, set a value

SFDC Cases Counter:
SFDC Cases Links:

Description

Description of problem:
The console and authentication cluster operators are reported as degraded for about 6 minutes after the ingresscontroller LB scope is updated.

OpenShift release version:
4.10.0-0.nightly-2021-12-21-130047

Cluster Platform:
AWS

How reproducible:
100%

Steps to Reproduce (in detail):
1. Launch a cluster on AWS.
2. Change the LB scope:
$ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"Internal"}}}}'

3. Check the message from "oc get co/ingress", follow its instructions, and delete the LB service:
$ oc -n openshift-ingress delete svc/router-default
service "router-default" deleted

4. Check the status of the cluster operators:
$ oc get co
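To put a number on how long an operator stays unavailable after step 3, the check in step 4 can be polled. The following is a minimal sketch and not part of the original report: the function names `measure_recovery` and `co_available` and the variables `POLL_INTERVAL` and `POLL_TIMEOUT` are hypothetical, and it assumes `oc` is logged in to the cluster under test.

```shell
#!/bin/sh
# measure_recovery CMD [ARGS...]: rerun CMD every POLL_INTERVAL seconds
# (default 5) until it exits 0, then print the elapsed time in seconds.
# Gives up with status 1 once POLL_TIMEOUT seconds (default 900) have passed.
measure_recovery() {
    start=$(date +%s)
    while ! "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $((now - start)) -ge "${POLL_TIMEOUT:-900}" ]; then
            echo "timed out after $((now - start))s" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-5}"
    done
    end=$(date +%s)
    echo $((end - start))
}

# co_available NAME: succeed only if the ClusterOperator's Available
# condition currently has status True.
co_available() {
    status=$(oc get "co/$1" -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
    [ "$status" = "True" ]
}
```

Running `measure_recovery co_available console` right after deleting the service would print roughly how many seconds passed before co/console became available again; the same call with `authentication` covers the other operator.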

Actual results:
While the LB is re-provisioned and the DNS records are refreshed, co/console and co/authentication show degraded for about 6 minutes:

$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.10.0-0.nightly-2021-12-21-130047 False False True 5m20s OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-a22.qe.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.hongli-a22.qe.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
<--snip-->
console 4.10.0-0.nightly-2021-12-21-130047 False False False 5m24s RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com on 172.30.0.10:53: no such host

After checking again a while later, authentication is available but console still shows degraded (6m6s):
    $ oc get co
    NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
    authentication 4.10.0-0.nightly-2021-12-21-130047 True False False 37s
    <--snip-->
    console 4.10.0-0.nightly-2021-12-21-130047 False False False 6m6s RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com on 172.30.0.10:53: no such host
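When the full `oc get co` table is long, the affected operators can be filtered out of it. This is a small sketch, not taken from the report: the function name `unhealthy_operators` is hypothetical, and it assumes the column order shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) with the VERSION column populated for every row.

```shell
#!/bin/sh
# unhealthy_operators: read `oc get co --no-headers` rows on stdin and print
# every operator whose AVAILABLE column is not True or whose DEGRADED column
# is True. Fields are whitespace-separated: $1=NAME $3=AVAILABLE $5=DEGRADED.
unhealthy_operators() {
    awk '$3 != "True" || $5 == "True" {
        printf "%s Available=%s Degraded=%s\n", $1, $3, $5
    }'
}
```

Usage would be `oc get co --no-headers | unhealthy_operators`. For the second listing above it would report only console, with Available=False and Degraded=False.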

Expected results:
Checking the DNS record with nslookup from outside the cluster shows that it is refreshed within about 2 minutes, so co/console and co/authentication should not stay in Degraded status for such a long time.
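The comparison between the DNS refresh time and the operator recovery time can be logged with timestamps. The sketch below is an illustration only: `resolves` and `dns_report` are hypothetical names, and it swaps nslookup for `getent hosts` (assumed to be available, as on most Linux hosts) because its exit status reliably signals whether the name resolved. It queries whatever resolver the host it runs on is configured with, so it has to be run both outside the cluster and from a pod to compare the two views.

```shell
#!/bin/sh
# resolves HOST: succeed if the local resolver returns an address for HOST.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# dns_report HOST...: print one line per host in the form
# "<UTC time> <host> resolved|unresolved".
dns_report() {
    for host in "$@"; do
        if resolves "$host"; then
            state=resolved
        else
            state=unresolved
        fi
        printf '%s %s %s\n' "$(date -u +%H:%M:%S)" "$host" "$state"
    done
}
```

Started just before the service is deleted, a loop such as `while sleep 10; do dns_report oauth-openshift.apps.<cluster-domain> console-openshift-console.apps.<cluster-domain>; done` would show when each route hostname begins to resolve again, where `<cluster-domain>` is a placeholder for the cluster's own apps domain.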

Impact of the problem:
Unfriendly user experience.

Additional info:


People

Assignee: Miciah Masters

Reporter: Hongan Li

QA Contact: Hongan Li

Contributing Groups: Red Hat Employee

Votes: 0

Watchers: 3

Dates

Created: 2021/12/22 7:43 AM

Updated: 2023/04/26 12:16 AM

Resolved: 2023/04/18 4:25 PM