Epic: Native BGP+EVPN support
Status: In Progress
Resolution: Unresolved
Parent: RHOSSTRAT-583 - (TechPreview) OVN native support - Enabling BGP-EVPN
Component: rhel-net-ovn
Progress: 0% To Do, 52% In Progress, 48% Done
Labels: ssg_networking
Based on the design document [0], add support for a BGP/EVPN plugin that can be used by both OCP and OSP (replacing the current implementation). The plugin should be based on FRR.
More information will be filled in as we get further feedback from both the OCP and OSP sides.
Clarifications from OSP side
Overview
I'll first talk about requirements split between just native BGP and EVPN, and then how I see this fitting into FRR. To set some background: FRR uses Zebra under the covers, plus netlink, to communicate with the kernel routing table. It can program routes into the kernel default routing table or other VRF tables, and can read from them as well. Zebra is also able to use netlink to discover and manage other components required for EVPN, such as the Linux bridge, SVI, VXLAN tunnel, and neighbor discovery table.
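For context, the Linux-side objects that Zebra manages can all be inspected with standard iproute2 tooling. A sketch (the device names vrf-blue, br100, and vxlan100 used here and in later examples are hypothetical):

```shell
ip route show table main      # kernel default routing table that Zebra programs
ip -d link show type vrf      # Linux VRFs and their associated routing tables
ip -d link show type vxlan    # VXLAN tunnel endpoints
ip neigh show                 # neighbor (ARP/ND) table
bridge fdb show               # bridge MAC/forwarding table
```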
For OCP 4.17, we are targeting a BGP implementation that will cover 2 main use cases:
- Integrating with the provider network's routing via BGP without VPN (shared and local gateway mode).
- Enabling VRF-Lite to allow for a lightweight VPN solution (local gateway mode only).
To read more about this look at:
https://github.com/openshift/enhancements/pull/1636
For OCP 4.18, we are targeting a BGP+EVPN implementation that will cover enabling VPN (local gateway mode only).
The above restrictions to local gateway mode are due to a lack of the required support in OVN to make shared gateway mode work.
BGP use case 1
For this use case we need to learn routes from the external network, as well as advertise routes to the external network. FRR learns routes from peers via BGP and then programs them into the kernel routing table using Zebra. For local gateway mode, all traffic flows via the kernel before leaving the node, so the support here is sufficient to route pod egress traffic via these learned routes. For shared gateway mode in 4.17, OVN-Kubernetes will be watching via netlink for these routes and then configuring them via NBDB in OVN. OVN already supports ECMP routes, so those should work as well. In order to make this design less dependent on OpenShift and more OVN native, we would need:
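As a rough illustration of the FRR side of this use case, a minimal BGP peering can be configured through vtysh, after which Zebra installs the learned routes into the kernel table. All AS numbers and peer addresses below are example values:

```shell
# Peer local AS 64512 with a ToR at 192.0.2.1 in AS 64513 (example values).
vtysh \
  -c 'configure terminal' \
  -c 'router bgp 64512' \
  -c ' neighbor 192.0.2.1 remote-as 64513' \
  -c ' address-family ipv4 unicast' \
  -c '  neighbor 192.0.2.1 activate' \
  -c ' exit-address-family'

# Routes learned from the peer are installed by Zebra with protocol "bgp":
ip route show proto bgp
```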
OVN Requirement 1: Ability for OVN to talk to FRR and configure dynamically learned routes. In this case we would need to specify which OVN router the routes should be programmed on (in the OVN-Kubernetes case it is the gateway router (GR)).
For route advertisement to the provider network, OVNK is responsible for configuring FRR through a Kubernetes-native API called FRR-K8s. Since OVNK manages IP addressing schemes across the cluster, it makes sense for it to configure FRR here. It could be useful in the future for OVN to tell FRR which subnets are locally connected to its router (as well as its statically configured routes), so that in FRR you could do "redistribute connected" or "redistribute static" and FRR would advertise those without CMS intervention, but it is not required for OCP at this time.
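The "redistribute connected"/"redistribute static" behavior mentioned above corresponds to standard FRR BGP configuration; a sketch, reusing the example AS number from earlier:

```shell
# Advertise locally connected and statically configured routes via BGP,
# without the CMS having to enumerate prefixes itself.
vtysh -c 'configure terminal' \
      -c 'router bgp 64512' \
      -c ' address-family ipv4 unicast' \
      -c '  redistribute connected' \
      -c '  redistribute static' \
      -c ' exit-address-family'
```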
Additionally, FRR and BGP support BFD for determining link health and fast convergence. When a BFD session fails, FRR will purge the dynamic routes from the kernel routing table, and in OCP 4.17 OVNK will then remove the routes from OVN. FRR runs its BFD daemon (bfdd) today to run BFD over a link. OVN also already supports BFD.
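Associating BFD with a BGP session in FRR is a one-line addition per neighbor; on BFD failure, FRR withdraws the routes learned from that peer. A sketch with the same example peer as above:

```shell
# Create a BFD peer and tie it to the BGP session; when BFD detects the
# link as down, FRR purges the routes learned from this neighbor.
vtysh -c 'configure terminal' \
      -c 'bfd' \
      -c ' peer 192.0.2.1' \
      -c 'exit' \
      -c 'router bgp 64512' \
      -c ' neighbor 192.0.2.1 bfd'
```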
OVN Requirement 2: Ability for OVN to integrate with FRR as a BFD provider. OVN should be able to associate BFD peers with learned routes via BGP. If BFD fails, OVN should immediately purge the routes and notify FRR of the BFD failure.
BGP use case 2
VRF-Lite relies on a user configuring separate L3 links and leveraging user defined networks (UDNs) to carry VPN without the overhead of VPN encapsulation like MPLS/EVPN. To read more about UDN and understand how they split pods into multiple networks, take a look at: https://github.com/openshift/enhancements/blob/master/enhancements/network/user-defined-network-segmentation.md
With VRF-Lite, a user configures separate L3 links as belonging to different Linux VRFs. BGP peering is done on each of these VRFs via their dedicated links, and traffic from the VRFs is sent via that link to the next-hop BGP router, where it can then be encapsulated into a VPN and sent onward. This is only supported in local gateway mode. To support shared gateway mode, these L3 links would need to be attached to the OVS bridge and then associated with an OVN network topology. This would all come together as flows in br-ex that connect the OVN patch port to the L3 link. Route learning would still happen as in the previous use case, except it would now be split up by mapping each VRF/OVN GR to the corresponding FRR BGP peering session for that VRF.
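The per-VRF plumbing described above can be sketched with iproute2 plus FRR's per-VRF BGP instances. The device names, table number, and addresses are hypothetical:

```shell
# Create a Linux VRF bound to routing table 100 and enslave the
# dedicated L3 link (eth1) to it.
ip link add vrf-blue type vrf table 100
ip link set vrf-blue up
ip link set eth1 master vrf-blue

# Peer BGP inside that VRF only; routes learned here land in table 100.
vtysh -c 'configure terminal' \
      -c 'router bgp 64512 vrf vrf-blue' \
      -c ' neighbor 198.51.100.1 remote-as 64513'
```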
We suspect this VRF-Lite use case will be a stopgap until OCP provides a full VPN solution. The work required to accomplish shared gateway mode for this use case is not worth the effort at this time.
EVPN
The plan for 4.18 is to support EVPN with local gateway mode only. The reason for only supporting local gateway mode is due to lack of native OVN/OVS support with FRR. Take a look at the EVPN Design Draft document for more information on how this will work:
https://docs.google.com/document/d/1wt3z9EH5LKk02IQK7xlIxBbu-DSnw2jSgbnyYE96VDA/edit
For EVPN, FRR heavily relies on Linux networking components to work: https://docs.frrouting.org/en/latest/evpn.html
It needs to see a Linux bridge, SVI, VXLAN device, and VRF configured to work correctly. We don't need any of these things with OVS, so we need a module in FRR that works with OVN/OVS. Here are some things we need to address:
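For reference, the Linux plumbing that FRR's EVPN support expects to discover looks roughly like the following (an L3VNI sketch with example names and IDs; see the FRR EVPN docs linked above for the authoritative layout):

```shell
ip link add vrf-blue type vrf table 100              # the L3 VRF
ip link add br100 type bridge                        # Linux bridge
ip link add vxlan100 type vxlan id 100 dstport 4789 \
    local 192.0.2.2 nolearning                       # VXLAN device, VNI 100
ip link set vxlan100 master br100                    # enslave VXLAN to bridge
ip link set br100 master vrf-blue                    # bridge/SVI into the VRF
ip link set br100 up
ip link set vxlan100 up
```

Zebra discovers all of these objects via netlink; an OVN/OVS backend would need equivalents for each.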
OVN Requirement 3: FRR looks at the MAC table in the Linux bridge to determine which Type 2 advertisements to send out for a MAC-VRF. FRR would instead need to check the MAC_Binding table in OVN, presumably in its OVN/OVS driver.
OVN Requirement 4: ARP/ND suppression is a Linux kernel function. I'm not sure how it works under the covers with the SVI attached to the Linux bridge, but we need something comparable to work in OVN/OVS.
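On the kernel side, ARP/ND suppression is exposed as a per-bridge-port flag on the VXLAN device (continuing the hypothetical vxlan100 example); whatever OVN/OVS provides would need to be functionally equivalent:

```shell
# Answer ARP/ND locally from the bridge's neighbor cache instead of
# flooding the request into the VXLAN overlay.
bridge link set dev vxlan100 neigh_suppress on
```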
OVN Requirement 5: We need a way in OVN/OVS to configure a VRF (and its type) to a VNI and ensure OVS will encapsulate the packet with the right information. With User Defined Networks (UDNs) we map an OVN topology to a UDN and thus a VRF ID. We would need to associate a GR with a VRF, and then a VXLAN VNI for EVPN.
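In the existing Linux model, this VRF-to-VNI association is a two-line FRR configuration (example names and IDs matching the earlier sketches); the OVN equivalent would need to carry the same mapping:

```shell
# Tell FRR that VRF "vrf-blue" corresponds to L3VNI 100 for EVPN.
vtysh -c 'configure terminal' \
      -c 'vrf vrf-blue' \
      -c ' vni 100'
```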
OVN Requirement 6: Remote VTEPs are learned dynamically in EVPN by BGP via Type 3 routes. FRR will need a new driver to program OVN/OVS with the remote VTEPs to send traffic to.
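For comparison, in the Linux datapath a remote VTEP learned from a Type 3 route ends up as an all-zero FDB entry on the VXLAN device pointing at the remote tunnel IP (example address below); the OVN/OVS driver would need to install an equivalent tunnel destination:

```shell
# Head-end replication entry: flood BUM traffic for VNI 100 to this VTEP.
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 203.0.113.7
```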
There may be more EVPN requirements to implement in OVN that I just haven't thought of yet.
Why?
The motivation for adding OVN native support would be:
- OVS offload support.
- Removing the burden on the CMS.
- OVN native solution would work for multiple platforms (not just Kubernetes).
I'm not advocating for doing this work. I'm simply stating what I see as the benefits to implementing the requirements.
Design
FRR exposes a programmable API via Zebra (ZAPI). OVN components should be able to leverage this API to talk to FRR, learning routes from it or configuring it. I believe this path should be explored first as a native way to integrate OVN and FRR without needing to fork FRR or add code changes to it. In the OVN-Kubernetes use case, FRR will be co-located on the node with a full OVN stack, so network performance of communication between the components should not be an issue.
[0] https://docs.google.com/document/d/1wt3z9EH5LKk02IQK7xlIxBbu-DSnw2jSgbnyYE96VDA/edit
Depends on: FDP-667 Adjust the plugin system to be more versatile (Closed)