Bug
Resolution: Unresolved
Major
None
4.18.z
Description of problem:
Enabling routingViaHost: true and ipForwarding: Global on a HyperShift (HCP) KubeVirt guest cluster causes its console and ingress Cluster Operators (COs) to become degraded. This configuration was applied to the HCP guest cluster to allow application pods (running on KubeVirt VMs) to access external, non-routed networks. These external networks are reachable from the KubeVirt VM nodes via additional network interfaces (configured via NADs), but the pods on the OVN-Kubernetes CNI overlay cannot reach them by default. The core symptom is that route health checks from within the cluster (e.g., from the HCP KubeVirt nodes and the management cluster nodes) begin to fail. However, the ingress routes remain accessible from outside the management cluster (e.g., from a user's browser).
Version-Release number of selected component (if applicable):
4.18
How reproducible:
100%
Steps to Reproduce:
1. Configure a management OCP cluster with OVN-Kubernetes.
2. On the management cluster, set routingViaHost: true and ipForwarding: Global in network.operator/cluster.
3. Deploy a HyperShift (HCP) cluster using the KubeVirt platform.
4. Configure the HCP KubeVirt NodePool with attachDefaultNetwork: true and add one or more additionalNetworks (via NADs).
5. Wait for the HCP cluster to be fully provisioned and healthy. Verify oc get co on the guest cluster shows no degraded operators.
6. On the HCP guest cluster, patch the network.operator/cluster resource to set routingViaHost: true and ipForwarding: Global (a sample patch command is sketched after this list).
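The gatewayConfig change in steps 2 and 6 can be applied with a merge patch along these lines (a sketch; the exact workflow used in the customer environment may differ):
$ oc patch network.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true,"ipForwarding":"Global"}}}}}'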
Actual results:
The console and ingress Cluster Operators on the HCP guest cluster go into a DEGRADED state. The COs report route health check failures:
message: 'RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.guestname.basename.domain.com): Get "https://console-openshift-console.apps.guestname.basename.domain.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)'
HCP ingress routes become unreachable from the HCP KubeVirt nodes.
HCP ingress routes are also unreachable from the management cluster (base OCP) nodes.
HCP ingress routes remain accessible from external clients (e.g., a user's browser on their laptop).
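The failing health check can be reproduced manually from an HCP KubeVirt node; a sketch of such a check (node name is a placeholder, the URL is the one from the error above):
$ oc debug node/<hcp-kubevirt-node> -- chroot /host \
    curl -kIs --connect-timeout 10 https://console-openshift-console.apps.guestname.basename.domain.com
The request times out, matching the RouteHealthAvailable error.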
Expected results:
Enabling routingViaHost: true and ipForwarding: Global on the HCP guest cluster should not break internal route health checks. The console and ingress Cluster Operators should remain healthy (AVAILABLE=True, DEGRADED=False). Application pods on the HCP KubeVirt nodes should gain the ability to route traffic to the external networks available on the nodes via the additional NADs. Ingress routes should remain reachable from all locations (HCP nodes, management nodes, and external clients).
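A minimal sketch of how the expected pod-to-external-network behavior could be verified, assuming a throwaway test pod on the guest cluster and an external host 192.0.2.10 that is only reachable via the nodes' additional NICs (pod name, image choice, and address are placeholders):
$ oc run net-test --image=registry.access.redhat.com/ubi9/ubi --restart=Never -- sleep 3600
$ oc exec net-test -- curl -s --connect-timeout 5 http://192.0.2.10
With routingViaHost: true and ipForwarding: Global working as intended, this traffic should leave via the node's routing table and reach the external network.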
Additional info:
Management Cluster network.operator Config:
...
...
defaultNetwork:
  ovnKubernetesConfig:
    egressIPConfig: {}
    gatewayConfig:
      ipForwarding: Global
      ipv4: {}
      ipv6: {}
      routingViaHost: true
HCP KubeVirt NodePool Config:
platform:
  kubevirt:
    additionalNetworks:
    - name: clusters-guestname/hcp-nad1
    - name: clusters-guestname/hcp-nad2
    - name: clusters-guestname/hcp-nad3
    attachDefaultNetwork: true
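The additional networks from the NodePool should appear as extra interfaces on the KubeVirt VM nodes. One way to confirm this (node name is a placeholder; expect the default interface plus one interface per attached NAD):
$ oc debug node/<hcp-kubevirt-node> -- chroot /host ip -br addr show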
HCP Ingress LoadBalancer Service created via MetalLB IP on secondary network:
NAME           TYPE           CLUSTER-IP   EXTERNAL-IP     PORT(S)                      AGE
ingress-apps   LoadBalancer   x.x.x.x      172.27.188.15   443:31651/TCP,80:30232/TCP   7d
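The 172.27.188.15 external IP is assigned by MetalLB from a pool on the secondary network. A rough sketch of such a pool follows; the pool name, namespace, and address range are assumptions, not the customer's actual configuration:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hcp-ingress-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.27.188.0/24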
HCP Cluster Operator Status (After Change):
$ oc get co ingress console
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
console 4.18.18 False False True 34m
ingress 4.18.18 True False True 17m
Additional Analysis & Lab Reproduction:
The issue appears to be a routing conflict that arises when both the management cluster and the HCP guest cluster have routingViaHost: true enabled, especially in a KubeVirt environment with multiple networks.
Lab Reproduction Attempt: A similar scenario was reproduced in a lab. Lab setup: a management cluster (with routingViaHost: true / ipForwarding: Global) and an HCP KubeVirt cluster (with no additional NICs). Enabling routingViaHost: true / ipForwarding: Global on the HCP guest cluster also broke ingress.
Key Difference: In the lab, HCP ingress routes were unreachable only from the HCP KubeVirt nodes; they were still reachable from the management cluster (base OCP) nodes. In the customer's environment, reachability is broken from both the HCP nodes and the management cluster nodes. This discrepancy may be due to the additionalNetworks (NADs) on the customer's NodePool, which were not present in the lab test.
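To narrow down where packets are dropped under local gateway (routingViaHost) mode, comparing the host routing decision on a management node and on an HCP KubeVirt node may help. A sketch of such a comparison, using the ingress LoadBalancer IP from this report (node names are placeholders, and the guest-node commands assume the guest cluster kubeconfig):
$ oc debug node/<management-node> -- chroot /host ip route get 172.27.188.15
$ oc debug node/<hcp-kubevirt-node> -- chroot /host ip route get 172.27.188.15
$ oc debug node/<hcp-kubevirt-node> -- chroot /host sysctl net.ipv4.ip_forward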