Details
-
Bug
-
Resolution: Can't Do
-
Major
-
None
-
4.10
-
Important
-
3
-
Sprint 212, Sprint 229, Sprint 235
-
3
-
Rejected
-
Unspecified
Description
Description of problem:
The console and authentication cluster operators show degraded for about 6 minutes after the ingresscontroller LB scope is updated.
OpenShift release version:
4.10.0-0.nightly-2021-12-21-130047
Cluster Platform:
AWS
How reproducible:
100%
Steps to Reproduce (in detail):
1. Launch a cluster on AWS.
2. Change the LB scope:
$ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"Internal"}}}}'
3. Check the message from "oc get co/ingress" and, following its instructions, delete the LB service:
$ oc -n openshift-ingress delete svc/router-default
service "router-default" deleted
4. Check the status of the cluster operators:
$ oc get co
Actual results:
While the LB is re-provisioned and the DNS records are refreshed, co/console and co/authentication show degraded for about 6 minutes:
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.10.0-0.nightly-2021-12-21-130047 False False True 5m20s OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-a22.qe.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.hongli-a22.qe.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
<--snip-->
console 4.10.0-0.nightly-2021-12-21-130047 False False False 5m24s RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com on 172.30.0.10:53: no such host
Checking again after a while, authentication is available but console still shows degraded (6m6s):
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.10.0-0.nightly-2021-12-21-130047 True False False 37s
<--snip-->
console 4.10.0-0.nightly-2021-12-21-130047 False False False 6m6s RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.hongli-a22.qe.devcluster.openshift.com on 172.30.0.10:53: no such host
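To make listings like the ones above easier to compare across retries, the following is a minimal Python sketch that parses the text of "oc get co" and reports operators that are not available or are degraded. The function names parse_cluster_operators and unhealthy are hypothetical helpers, and the parser assumes the whitespace-separated column layout shown in this report (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE).

```python
from typing import Dict, List

# Column names as they appear in the "oc get co" header in this report.
COLUMNS = ["name", "version", "available", "progressing", "degraded", "since", "message"]
STATUS_VALUES = ("True", "False", "Unknown")


def parse_cluster_operators(text: str) -> List[Dict[str, str]]:
    """Parse 'oc get co' text output into one dict per operator row.

    Assumption: the first six columns contain no spaces and the
    remainder of the line is the (possibly empty) MESSAGE column.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 6)
        # Skip blank lines, the header, and markers such as "<--snip-->".
        if len(parts) < 6 or parts[0] == "NAME":
            continue
        if parts[2] not in STATUS_VALUES:
            continue
        if len(parts) == 6:
            parts.append("")  # healthy operators usually have no message
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


def unhealthy(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows that are not Available=True or that are Degraded=True."""
    return [r for r in rows if r["available"] != "True" or r["degraded"] == "True"]
```

Applied to the second listing, this would flag only console. In that listing console reports AVAILABLE=False with DEGRADED=False, so a check on the DEGRADED column alone would miss it.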
Expected results:
Checking the DNS record with nslookup from outside the cluster shows that it is refreshed within about 2 minutes, so co/console and co/authentication should not stay in a degraded state for such a long time.
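The nslookup check can be made repeatable with a small poller that measures how long a route hostname takes to become resolvable after the service is deleted. This is only a sketch: wait_for_dns and resolves are hypothetical helper names, the timeout and interval defaults are arbitrary, and the result depends on which resolver the machine running it uses (the errors above come from the in-cluster resolver at 172.30.0.10, whereas the 2-minute figure was observed from outside the cluster).

```python
import socket
import time
from typing import Callable, Optional


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can currently resolve hostname."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False


def wait_for_dns(
    hostname: str,
    timeout: float = 600.0,
    interval: float = 5.0,
    check: Callable[[str], bool] = resolves,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until hostname resolves.

    Returns the elapsed seconds at the first successful lookup, or None
    if the timeout is reached first. check, clock and sleep are
    injectable so the loop can be exercised without network access.
    """
    start = clock()
    while True:
        if check(hostname):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

For example, calling wait_for_dns with the console route hostname right after "oc -n openshift-ingress delete svc/router-default" would give a rough figure to compare against the roughly 6 minutes during which the operators report the lookup failure.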
Impact of the problem:
unfriendly user experience
Additional info: