  OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2883

Redesign OVN-K EgressIP Routing Semantics for Local and Shared Gateway Modes with RouteViaHost Support


    • Product / Portfolio Work
    • 100% To Do, 0% In Progress, 0% Done
    • XL

      Feature Overview (aka. Goal Summary)  

      The current OVN-Kubernetes EgressIP implementation does not correctly honor host-level static routes configured on the primary interface when the RouteViaHost feature is enabled. Instead, routing decisions for EgressIP traffic rely exclusively on the default route programmed in the OVN-Kubernetes Gateway Router (GR).

      This behavior diverges from the legacy (now deprecated) openshift-sdn implementation, where host static routes were respected. As a result, multiple customers who depend on host routing policies are unable to migrate from openshift-sdn to OVN-Kubernetes, blocking planned OpenShift upgrades.

      Additionally, there are architectural inconsistencies in how Primary and Secondary EgressIPs behave across Local Gateway Mode (LGM) and Shared Gateway Mode (SGM):

      • Primary EgressIPs always follow shared-gateway logic (traffic exits via GR/br-ex), even in LGM.
      • Secondary EgressIPs always follow local-gateway logic (traffic exits via the management port, MPX, into the host), even in SGM.

      Both behaviors are side effects of historical implementation choices rather than intentional design and lead to:

      • Incorrect routing behavior
      • Loss of hardware offload opportunities
      • Operational confusion
      • Inconsistent semantics between gateway modes
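
      The Go sketch below models the current behavior described above, which may make the inversion easier to see. All names in it (`GatewayMode`, `EgressIPKind`, `EgressPath`, `currentEgressPath`) are hypothetical and do not come from the OVN-Kubernetes source. The function encodes only the two bullets: it accepts the gateway mode but never consults it.

```go
package main

import "fmt"

// GatewayMode is the node's OVN-Kubernetes gateway mode (hypothetical type).
type GatewayMode string

const (
	LocalGateway  GatewayMode = "LGM"
	SharedGateway GatewayMode = "SGM"
)

// EgressIPKind says whether the EgressIP is hosted on the primary interface
// or on a secondary host interface (hypothetical type).
type EgressIPKind string

const (
	Primary   EgressIPKind = "primary"
	Secondary EgressIPKind = "secondary"
)

// EgressPath names the data path EgressIP traffic takes out of the node.
type EgressPath string

const (
	ViaGatewayRouter EgressPath = "GR/br-ex"      // shared-gateway logic
	ViaHostStack     EgressPath = "MPX into host" // local-gateway logic
)

// currentEgressPath encodes the behavior described in this feature: the
// gateway mode is accepted but ignored, so the EgressIP kind alone picks the path.
func currentEgressPath(_ GatewayMode, kind EgressIPKind) EgressPath {
	if kind == Primary {
		return ViaGatewayRouter // even when the node runs in LGM
	}
	return ViaHostStack // even when the node runs in SGM
}

func main() {
	for _, mode := range []GatewayMode{LocalGateway, SharedGateway} {
		for _, kind := range []EgressIPKind{Primary, Secondary} {
			fmt.Printf("%s %-9s -> %s\n", mode, kind, currentEgressPath(mode, kind))
		}
	}
}
```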

      This feature proposes a redesign of EgressIP behavior in both LGM and SGM to create consistent, predictable, and host-aware routing semantics that properly integrate with RouteViaHost and host routing tables.

      Goals (aka. expected user outcomes)

      • Enable OVN-Kubernetes EgressIP to honor host static routes on the primary interface when RouteViaHost is enabled.
      • Align EgressIP behavior with customer expectations and historical openshift-sdn behavior.
      • Redesign Primary and Secondary EgressIP handling to behave consistently and correctly in both LGM and SGM.
      • Ensure traffic exits the node through the correct data path based on gateway mode and network type.
      • Restore and maximize hardware offload potential in Shared Gateway Mode.
      • Provide a supported, sustainable solution without requiring unsupported manual modifications to the Gateway Router.
      • Unblock customer migration from openshift-sdn to OVN-Kubernetes.

      Requirements (aka. Acceptance Criteria):

      Functional Requirements

      • When RouteViaHost=true, EgressIP traffic must consult and honor host routing tables (including static routes) on the primary interface.
      • In Local Gateway Mode:
        • Egress traffic should exit via the host networking stack when appropriate.
      • In Shared Gateway Mode:
        • Egress traffic should preferentially use the shared gateway path (GR/br-ex) to preserve offload capability.
      • Primary and Secondary EgressIPs must no longer have hardcoded, inverted gateway behavior.
      • Routing decision logic must be mode-aware and interface-aware.
      • The design must avoid requiring users to manually modify OVN Gateway Router routes.
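
      The functional requirements can be read as one decision that depends on the gateway mode and on the egress interface, not on whether an EgressIP is primary or secondary. The Go sketch below is one possible reading and not the proposed design. The following are all assumptions:

      • the types and the `OVSBridged` field
      • the interface names `br-ex`, `br-secondary`, and `eth1`
      • the choice to let RouteViaHost take precedence over Shared Gateway Mode

      Where that logic should live is still an open question in this feature.

```go
package main

import "fmt"

// GatewayMode is the node's OVN-Kubernetes gateway mode (hypothetical type).
type GatewayMode string

const (
	LocalGateway  GatewayMode = "LGM"
	SharedGateway GatewayMode = "SGM"
)

// EgressPath names the data path EgressIP traffic takes out of the node.
type EgressPath string

const (
	ViaGatewayRouter EgressPath = "GR/OVS bridge" // offload-capable shared path
	ViaHostStack     EgressPath = "host stack"    // host routing tables apply
)

// NodeConfig is a hypothetical view of the settings the decision needs.
type NodeConfig struct {
	Mode         GatewayMode
	RouteViaHost bool
}

// EgressInterface describes the host interface that carries the EgressIP.
// One struct covers primary and secondary EgressIPs, so neither kind gets a
// hardcoded path (hypothetical type).
type EgressInterface struct {
	Name       string
	OVSBridged bool // attached to an OVS bridge OVN can egress through, e.g. br-ex
}

// proposedEgressPath is one possible mode-aware, interface-aware rule.
func proposedEgressPath(cfg NodeConfig, iface EgressInterface) EgressPath {
	switch {
	case cfg.RouteViaHost:
		// Hand the packet to the kernel so host static routes are honored.
		return ViaHostStack
	case cfg.Mode == LocalGateway:
		return ViaHostStack
	case iface.OVSBridged:
		// SGM with an OVS bridge available: stay in OVS to keep offload possible.
		return ViaGatewayRouter
	default:
		// SGM, but the interface is a plain host NIC with no OVS bridge.
		return ViaHostStack
	}
}

func main() {
	brEx := EgressInterface{Name: "br-ex", OVSBridged: true}
	nic := EgressInterface{Name: "eth1", OVSBridged: false}
	cases := []struct {
		cfg   NodeConfig
		iface EgressInterface
	}{
		{NodeConfig{Mode: SharedGateway}, brEx},
		{NodeConfig{Mode: SharedGateway, RouteViaHost: true}, brEx},
		{NodeConfig{Mode: LocalGateway}, brEx},
		{NodeConfig{Mode: SharedGateway}, nic},
	}
	for _, c := range cases {
		fmt.Printf("%s routeViaHost=%t %s -> %s\n",
			c.cfg.Mode, c.cfg.RouteViaHost, c.iface.Name, proposedEgressPath(c.cfg, c.iface))
	}
}
```

      Primary and secondary EgressIPs pass through the same function. As a result, neither can end up with the fixed, inverted path that the current implementation produces.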

      Non-Functional Requirements

      • No regression to existing working EgressIP scenarios.
      • Backward compatibility for clusters not using RouteViaHost.
      • Clear, deterministic routing behavior that can be reasoned about by operators.
      • Maintain performance characteristics and offload capabilities where applicable.

        Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT issue for the configuration that will be supported in the future.
        Deployment considerations (list applicable specific needs; N/A = not applicable):
        • Self-managed, managed, or both:
        • Classic (standalone cluster):
        • Hosted control planes:
        • Multi node, Compact (three node), or Single node (SNO), or all:
        • Connected / Restricted Network:
        • Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x):
        • Operator compatibility:
        • Backport needed (list applicable versions):
        • UI need (e.g. OpenShift Console, dynamic plugin, OCM):
        • Other (please specify):

      Use Cases (Optional):

      • Customer with static routes on primary NIC
        • The host has static routes for specific CIDRs.
        • EgressIP traffic must follow these routes instead of the default gateway.
      • Migration from openshift-sdn
        • Existing deployments rely on host route behavior.
        • Migration currently blocked due to OVN behavior mismatch.
      • Secondary networks with dedicated NICs
        • Customers deploy secondary bridges for traffic segregation and offload.
        • EgressIP traffic should use shared gateway semantics to allow offload.
      • Mixed gateway mode environments
        • Consistent operator experience regardless of LGM or SGM.
      • RouteViaHost enabled clusters
        • Expectation that “via host” semantics apply to EgressIP as well.

      Questions to Answer (Optional):

      • Where should routing decision logic reside when RouteViaHost is enabled — OVN, host, or hybrid?
      • How can OVN learn and track host static routes safely and dynamically?
      • Should EgressIP behavior be fully mode-dependent (LGM vs SGM) or interface-dependent?
      • How do we avoid breaking existing deployments relying on current behavior?
      • What is the correct architectural model for Primary vs Secondary EgressIP behavior?
      • How do we ensure hardware offload remains possible in SGM after redesign?
      • What observability/debug tooling is needed for operators to understand routing decisions?

      Out of Scope

      • Redesign of the entire RouteViaHost feature outside of EgressIP context.
      • Changes to non-EgressIP pod routing behavior.
      • Support for arbitrary manipulation of OVN Gateway Router by users.
      • Addressing unrelated EgressIP limitations (scalability, scheduling, etc.).
      • Changes to networking behavior in non-OVN CNIs.

      Background

      Currently:

      • EgressIP routing is dictated by the OVN Gateway Router’s default route.
      • RouteViaHost does not influence EgressIP decisions.
      • Manual addition of static routes to the Gateway Router can “fix” behavior but is unsupported.
      • Primary EgressIPs incorrectly always use shared-gateway path.
      • Secondary EgressIPs incorrectly always use local-gateway path.
      • openshift-sdn honored host static routes, and customers depend on this.

      This mismatch is now a primary blocker for customers upgrading to OVN-Kubernetes.

      Customer Considerations

      • Multiple large customers have explicitly requested this capability.
      • Cluster upgrades are blocked until this is resolved.
      • Customers expect parity with openshift-sdn behavior.
      • Customers using secondary NICs for traffic segregation expect offload to work.
      • Solution must be transparent and not require manual tuning of OVN components.

      Documentation Considerations

      Documentation must clearly describe:

      • How EgressIP routing decisions are made in LGM vs SGM.
      • The effect of RouteViaHost on EgressIP.
      • Expected interaction with host routing tables.
      • Migration notes for customers coming from openshift-sdn.
      • Operational guidance and troubleshooting steps.
      • Examples illustrating host static route behavior.

      Interoperability Considerations

      • Must interoperate correctly with:
        • RouteViaHost
        • Secondary networks and bridges
        • Hardware offload in Shared Gateway Mode
        • Existing EgressIP CRDs and workflows
      • Must not introduce conflicts with host networking, NetworkManager, or static route management.
      • Should remain compatible with future gateway mode enhancements and offload improvements.

              Marc Curry (mcurry@redhat.com)
              Surya Seetharaman
              Arti Sood
              Ashley Hardin
              Chris Fields