Feature Request
Resolution: Unresolved
Normal
Product / Portfolio Work
1. Proposed title of this feature request
Allow limiting Geneve and VXLAN source port to ranges.
2. What is the nature and description of the request?
VXLAN and Geneve derive their source port from the skb hash[1]. When there is a network bump and net.core.txrehash is enabled, re-transmitted TCP packets get a new hash, which in turn changes the tunnel source port. This can create a large number of new flows in a short time.
A way to limit the number of tunnel flows while still benefiting from RSS and txrehash/ECMP is to limit the source port ranges of VXLAN and Geneve tunnels. This is supported in the upstream kernel when creating tunnels manually.
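The effect of narrowing the range can be illustrated with a small model. This is a rough sketch, loosely modelled on the kernel helper referenced in [1], not a copy of it: it omits the kernel's fallback for a zero hash, the function names are hypothetical, and the wide range 32768-60999 is only an assumed stand-in for a default local port range.

```python
import random


def flow_src_port(skb_hash: int, port_min: int, port_max: int) -> int:
    """Map a 32-bit flow hash to a port in [port_min, port_max)."""
    h = skb_hash & 0xFFFFFFFF
    # Mix the low 16 bits into the high 16 bits, truncated to 32 bits.
    h ^= (h << 16) & 0xFFFFFFFF
    # Scale the 32-bit value onto the width of the configured range.
    return ((h * (port_max - port_min)) >> 32) + port_min


def distinct_ports(n_rehashes: int, port_min: int, port_max: int, seed: int = 0) -> int:
    """Count distinct source ports seen when one inner flow is rehashed n times."""
    rng = random.Random(seed)
    return len({
        flow_src_port(rng.getrandbits(32), port_min, port_max)
        for _ in range(n_rehashes)
    })


if __name__ == "__main__":
    # Assumed wide range versus a hypothetical 16-port range.
    print("wide range  :", distinct_ports(10_000, 32768, 60999))
    print("narrow range:", distinct_ports(10_000, 50000, 50016))
```

In this model each rehash of an inner flow can land on a new outer source port, so with a wide range the number of outer flows grows with the number of rehashes. With a 16-port range it can never exceed 16 per tunnel endpoint pair, while the hash still spreads traffic across those ports.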
The proposal would be to extend this support to OVN/OCP/OVS so that Geneve and VXLAN tunnels in OCP clusters only derive their source port from a configured range.
3. Why does the customer need this? (List the business requirements here)
In some environments, at least in Azure[2], the number of flows is limited. When the limit is reached, some flows are dropped. Restricting tunnel source ports to a range prevents reaching that limit when re-transmissions follow a network bump.
The issue has been seen in e.g. RHEL-90228 and was reported in the upstream kernel as the reason for implementing source port ranges for Geneve tunnels[3].
4. List any affected packages or components.
I'm not 100% sure exactly where this would be implemented to enable and configure the feature (OVN? OCP? Both?) but I know at least the following:
- Upstream kernel has support for selecting a source port range for Geneve[3] and VXLAN tunnels.
- Support for this in VXLAN is already in RHEL 9 and 10; the Geneve one needs to be backported.
- My understanding is that configuring the source port range for those tunnels is not currently supported in OVS. The changes there seem reasonable; both the kernel and the userspace parts would need to be updated.
I'm only opening a global RFE for now, as the decision on whether to go down that path needs to be made at the OVN/OCP level. If agreed, specific RFEs and requests can be opened on the OVS and kernel sides.
It's worth noting that an attempt at globally disabling txrehash was posted[4] as a workaround.
[1] https://elixir.bootlin.com/linux/v6.16.7/source/include/net/udp.h#L334
[4] https://github.com/ovn-kubernetes/ovn-kubernetes/pull/524