Bug
Resolution: Won't Do
Minor
None
4.14
Important
No
Rejected
False
Known Issue
Description of problem:
After enabling and then disabling IPsec at runtime, the cluster is no longer healthy, and ClusterVersion reports: Error while reconciling 4.14.0-0.nightly-2023-09-15-233408: an unknown error has occurred: MultipleErrors
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-15-233408
How reproducible:
Most of the time
Steps to Reproduce:
1. Install a GCP cluster without IPsec.
2. Enable IPsec in the cluster:
   oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
3. Create some test pods.
4. Disable IPsec in the cluster:
   oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":null}}}}'
Actual results:
# From 4.14
[weliang@weliang Test]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-09-15-233408   True        False         5h8m    Error while reconciling 4.14.0-0.nightly-2023-09-15-233408: an unknown error has occurred: MultipleErrors
[weliang@weliang Test]$ oc get co --no-headers | grep -v '.True.*False.*False'
authentication   4.14.0-0.nightly-2023-09-15-233408   True    False   True    12s     OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: Get "https://10.128.0.59:6443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
console          4.14.0-0.nightly-2023-09-15-233408   False   False   False   20s     RouteHealthAvailable: console route is not admitted
monitoring       4.14.0-0.nightly-2023-09-15-233408   False   True    True    2m44s   reconciling Thanos Querier Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io thanos-querier), deleting UserWorkload federate Route failed: the server is currently unable to handle the request (delete routes.route.openshift.io federate), reconciling Prometheus Federate Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io prometheus-k8s-federate)
[weliang@weliang Test]$

# From 4.13
[weliang@weliang verification-tests]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.13   True        False         5h11m   Error while reconciling 4.13.13: an unknown error has occurred: MultipleErrors
[weliang@weliang verification-tests]$ oc get co --no-headers | grep -v '.True.*False.*False'
authentication   4.13.13   True    False   True   29s   OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: Get "https://10.128.0.40:6443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
monitoring       4.13.13   False   True    True   25s   reconciling Alertmanager Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io alertmanager-main), deleting UserWorkload federate Route failed: the server is currently unable to handle the request (delete routes.route.openshift.io federate), reconciling Prometheus Federate Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io prometheus-k8s-federate)
[weliang@weliang verification-tests]$
Expected results:
The cluster should remain in a healthy state after IPsec is disabled.
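The expected result can be checked mechanically. Below is a minimal sketch of a hypothetical POSIX shell helper (`unhealthy_operators` is an illustrative name, not part of `oc`) that reads `oc get co --no-headers` output and prints every cluster operator whose AVAILABLE, PROGRESSING and DEGRADED columns are not True, False and False. It compares the columns by position instead of using the looser `grep -v '.True.*False.*False'` filter shown under Actual results, and it assumes the VERSION column is populated for every operator; an empty VERSION would shift the fields.

```shell
#!/bin/sh
# Hypothetical helper, not an oc subcommand.
# Reads `oc get co --no-headers` output on stdin. Columns are assumed to be:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
# Prints the NAME of every operator that is not Available=True,
# Progressing=False, Degraded=False.
unhealthy_operators() {
  awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Usage against a live cluster:
#   oc get co --no-headers | unhealthy_operators
```

An empty result means every listed operator reports Available=True, Progressing=False and Degraded=False.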
Additional info:
The issue reproduces on both 4.14 and 4.13. Must-gather archives:
4.14: https://people.redhat.com/~weliang/must-gather-4.14.tar.gz
4.13: https://people.redhat.com/~weliang/must-gather-4.13.tar.gz