-
Bug
-
Resolution: Won't Do
-
Major
-
None
-
4.16.0
-
Important
-
No
-
1
-
NE Sprint 259
-
1
-
Rejected
-
False
-
Description of problem:
Canary route checks for the default ingress controller are failing but co/ingress is still available
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-multi-2024-06-06-113832
How reproducible:
sometimes
Steps to Reproduce:
1. After the cluster is ready, manually update the DNS record for *.apps to point to a wrong IP address.
2. Check the ingresscontroller and cluster operator status.
Actual results:
1. Canary route checks for the default ingress controller are failing, but "type: Degraded" is still "False"
$ oc -n openshift-ingress-operator get ingresscontroller/default -oyaml
<...snip...>
  - lastTransitionTime: "2024-06-07T01:51:40Z"
    message: No DNS zones are defined in the cluster dns config.
    reason: NoDNSZones
    status: "False"
    type: DNSManaged
  - lastTransitionTime: "2024-06-07T06:42:11Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2024-06-07T06:55:30Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-06-07T10:10:41Z"
    status: "False"
    type: Degraded
<.....>
  - lastTransitionTime: "2024-06-07T10:10:41Z"
    message: |-
      Canary route checks for the default ingress controller are failing. Last 2 error messages:
      error sending canary HTTP Request: Timeout: Get "https://canary-openshift-ingress-canary.apps.ci-op-kj4wsh12-1c975.qe.gcp.devcluster.openshift.com": dial tcp 10.0.32.107:443: i/o timeout (Client.Timeout exceeded while awaiting headers) (x8 over 2h58m35s)
      error sending canary HTTP Request: Timeout: Get "https://canary-openshift-ingress-canary.apps.ci-op-kj4wsh12-1c975.qe.gcp.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (x171 over 3h28m47s)
    reason: CanaryChecksRepetitiveFailures
    status: "False"
    type: CanaryChecksSucceeding
2. co/console and authentication are degraded, but ingress is still available
$ oc get co console authentication ingress
NAME             VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console          4.16.0-0.nightly-multi-2024-06-06-113832   False       True          True       3h36m   RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.ci-op-kj4wsh12-1c975.qe.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.ci-op-kj4wsh12-1c975.qe.gcp.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
authentication   4.16.0-0.nightly-multi-2024-06-06-113832   False       False         True       3h36m   OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.ci-op-kj4wsh12-1c975.qe.gcp.devcluster.openshift.com/healthz": dial tcp 10.0.32.107:443: i/o timeout (Client.Timeout exceeded while awaiting headers)
ingress          4.16.0-0.nightly-multi-2024-06-06-113832   True        False         False      3h36m
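To compare the cluster operator rows above without reading the long MESSAGE column by eye, a small parser can help. The following is only a sketch: `OperatorStatus` and `parse_co_row` are hypothetical helper names, and it assumes the first six columns of an `oc get co` row never contain whitespace, so everything after them is the message. The two sample rows are copied from the output in this report.

```python
from typing import NamedTuple


class OperatorStatus(NamedTuple):
    name: str
    version: str
    available: bool
    progressing: bool
    degraded: bool
    since: str
    message: str


def parse_co_row(row: str) -> OperatorStatus:
    """Parse one data row of `oc get co` output (hypothetical helper)."""
    # Assumption: the first six columns have no spaces; the rest is MESSAGE.
    fields = row.split(None, 6)
    if len(fields) < 6:
        raise ValueError(f"unexpected row: {row!r}")
    name, version, available, progressing, degraded, since = fields[:6]
    message = fields[6] if len(fields) > 6 else ""
    return OperatorStatus(
        name,
        version,
        available == "True",
        progressing == "True",
        degraded == "True",
        since,
        message,
    )


VERSION = "4.16.0-0.nightly-multi-2024-06-06-113832"
CONSOLE_URL = (
    "https://console-openshift-console.apps."
    "ci-op-kj4wsh12-1c975.qe.gcp.devcluster.openshift.com"
)

# Two of the three rows from the report above.
ROWS = [
    f"console {VERSION} False True True 3h36m "
    f"RouteHealthAvailable: failed to GET route ({CONSOLE_URL}): "
    f'Get "{CONSOLE_URL}": context deadline exceeded '
    "(Client.Timeout exceeded while awaiting headers)",
    f"ingress {VERSION} True False False 3h36m",
]

STATUSES = [parse_co_row(row) for row in ROWS]

if __name__ == "__main__":
    for status in STATUSES:
        print(status.name, "available:", status.available,
              "degraded:", status.degraded)
```

With these rows, console parses as unavailable and degraded while ingress parses as available and not degraded, which is the mismatch this report describes.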
Expected results:
1. "type: Degraded" should be "True" when the canary checks for the default ingress controller are failing.
2. co/ingress should be updated to Degraded as well.
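The expectation above can be written as a simple check over the ingresscontroller's status conditions. This is a sketch of the reporter's expected rule only, not the ingress operator's actual logic; `condition_status` and `violates_expectation` are hypothetical names, and `REPORTED_CONDITIONS` copies only the `type`, `status` and `reason` values from the output in this report.

```python
from typing import Dict, List, Optional

Condition = Dict[str, str]


def condition_status(conditions: List[Condition], cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def violates_expectation(conditions: List[Condition]) -> bool:
    """True when canary checks are failing but Degraded is not "True".

    This encodes the expectation stated in this report, not the
    operator's real degraded computation.
    """
    canary = condition_status(conditions, "CanaryChecksSucceeding")
    degraded = condition_status(conditions, "Degraded")
    return canary == "False" and degraded != "True"


# Conditions taken from the ingresscontroller/default output in this report.
REPORTED_CONDITIONS: List[Condition] = [
    {"type": "DNSManaged", "status": "False", "reason": "NoDNSZones"},
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
    {"type": "Degraded", "status": "False"},
    {
        "type": "CanaryChecksSucceeding",
        "status": "False",
        "reason": "CanaryChecksRepetitiveFailures",
    },
]

if __name__ == "__main__":
    print("violates expectation:", violates_expectation(REPORTED_CONDITIONS))
```

In practice the condition list would come from `oc -n openshift-ingress-operator get ingresscontroller/default -o json` under `.status.conditions`; it is hard-coded here so the sketch runs without a cluster.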
Additional info:
Sometimes co/ingress shows Degraded after the DNS change, but after a while it shows Available.
- duplicates
-
OCPBUGS-37491 co/ingress status cannot reflect the real condition
- Verified