Bug
Resolution: Unresolved
Major
None
4.18.z
Description of problem:
Enabling routingViaHost: true and ipForwarding: Global on a HyperShift (HCP) KubeVirt guest cluster causes its console and ingress Cluster Operators (COs) to become degraded. This configuration was applied to the HCP guest cluster to allow application pods (running on KubeVirt VMs) to access external, non-routed networks. These external networks are reachable from the KubeVirt VM nodes via additional network interfaces (configured via NADs), but the pods on the OVN-Kubernetes CNI overlay cannot reach them by default. The core symptom is that route health checks from within the cluster (e.g., from the HCP KubeVirt nodes and the management cluster nodes) begin to fail. However, the ingress routes remain accessible from outside the management cluster (e.g., from a user's browser).
Version-Release number of selected component (if applicable):
4.18
How reproducible:
100%
Steps to Reproduce:
1. Configure a management OCP cluster with OVN-Kubernetes.
2. On the management cluster, set routingViaHost: true and ipForwarding: Global in network.operator/cluster.
3. Deploy a HyperShift (HCP) cluster using the KubeVirt platform.
4. Configure the HCP KubeVirt NodePool with attachDefaultNetwork: true and add one or more additionalNetworks (via NADs).
5. Wait for the HCP cluster to be fully provisioned and healthy. Verify oc get co on the guest cluster shows no degraded operators.
6. On the HCP guest cluster, patch the network.operator/cluster resource to set routingViaHost: true and ipForwarding: Global (a sample patch command is sketched after this list).
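The gatewayConfig change in steps 2 and 6 can be applied with a merge patch along these lines (a sketch; the exact workflow used in the customer environment may differ):
$ oc patch network.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true,"ipForwarding":"Global"}}}}}'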
Actual results:
The console and ingress Cluster Operators on the HCP guest cluster go into a DEGRADED state. The COs report route health check failures:
message: 'RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.guestname.basename.domain.com): Get "https://console-openshift-console.apps.guestname.basename.domain.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)'
HCP ingress routes become unreachable from the HCP KubeVirt nodes.
HCP ingress routes are also unreachable from the management cluster (base OCP) nodes.
HCP ingress routes remain accessible from external clients (e.g., a user's browser on their laptop).
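The failing health check can be reproduced manually from an HCP KubeVirt node; a sketch of such a check (node name is a placeholder, the URL is the one from the error above):
$ oc debug node/<hcp-kubevirt-node> -- chroot /host \
    curl -kIs --connect-timeout 10 https://console-openshift-console.apps.guestname.basename.domain.com
The request times out, matching the RouteHealthAvailable error.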
Expected results:
Enabling routingViaHost: true and ipForwarding: Global on the HCP guest cluster should not break internal route health checks. The console and ingress Cluster Operators should remain healthy (AVAILABLE=True, DEGRADED=False). Application pods on the HCP KubeVirt nodes should gain the ability to route traffic to the external networks available on the nodes via the additional NADs. Ingress routes should remain reachable from all locations (HCP nodes, management nodes, and external clients).
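A minimal sketch of how the expected pod-to-external-network behavior could be verified, assuming a throwaway test pod on the guest cluster and an external host 192.0.2.10 that is only reachable via the nodes' additional NICs (pod name, image choice, and address are placeholders):
$ oc run net-test --image=registry.access.redhat.com/ubi9/ubi --restart=Never -- sleep 3600
$ oc exec net-test -- curl -s --connect-timeout 5 http://192.0.2.10
With routingViaHost: true and ipForwarding: Global working as intended, this traffic should leave via the node's routing table and reach the external network.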
Additional info:
Management Cluster network.operator Config:
...
...
defaultNetwork:
  ovnKubernetesConfig:
    egressIPConfig: {}
    gatewayConfig:
      ipForwarding: Global
      ipv4: {}
      ipv6: {}
      routingViaHost: true
HCP KubeVirt NodePool Config:
platform:
  kubevirt:
    additionalNetworks:
    - name: clusters-guestname/hcp-nad1
    - name: clusters-guestname/hcp-nad2
    - name: clusters-guestname/hcp-nad3
    attachDefaultNetwork: true
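The additional networks from the NodePool should appear as extra interfaces on the KubeVirt VM nodes. One way to confirm this (node name is a placeholder; expect the default interface plus one interface per attached NAD):
$ oc debug node/<hcp-kubevirt-node> -- chroot /host ip -br addr show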
HCP Ingress LoadBalancer Service created via MetalLB IP on secondary network:
NAME           TYPE           CLUSTER-IP   EXTERNAL-IP     PORT(S)                      AGE
ingress-apps   LoadBalancer   x.x.x.x      172.27.188.15   443:31651/TCP,80:30232/TCP   7d
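The 172.27.188.15 external IP is assigned by MetalLB from a pool on the secondary network. A rough sketch of such a pool follows; the pool name, namespace, and address range are assumptions, not the customer's actual configuration:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hcp-ingress-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.27.188.0/24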
HCP Cluster Operator Status (After Change):
$ oc get co ingress console
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
console 4.18.18 False False True 34m
ingress 4.18.18 True False True 17m
Additional Analysis & Lab Reproduction:
The issue appears to be a routing conflict that arises when both the management cluster and the HCP guest cluster have routingViaHost: true enabled, especially in a KubeVirt environment with multiple networks.
Lab Reproduction Attempt: A similar scenario was reproduced in a lab. Lab setup: a management cluster (with routingViaHost: true / ipForwarding: Global) and an HCP KubeVirt cluster (with no additional NICs). Enabling routingViaHost: true / ipForwarding: Global on the HCP guest cluster also broke ingress.
Key Difference: In the lab, HCP ingress routes were unreachable only from the HCP KubeVirt nodes; they were still reachable from the management cluster (base OCP) nodes. In the customer's environment, reachability is broken from both the HCP nodes and the management cluster nodes. This discrepancy may be due to the additionalNetworks (NADs) on the customer's NodePool, which were not present in the lab test.
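To narrow down where packets are dropped under local gateway (routingViaHost) mode, comparing the host routing decision on a management node and on an HCP KubeVirt node may help. A sketch of such a comparison, using the ingress LoadBalancer IP from this report (node names are placeholders, and the guest-node commands assume the guest cluster kubeconfig):
$ oc debug node/<management-node> -- chroot /host ip route get 172.27.188.15
$ oc debug node/<hcp-kubevirt-node> -- chroot /host ip route get 172.27.188.15
$ oc debug node/<hcp-kubevirt-node> -- chroot /host sysctl net.ipv4.ip_forward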