Type: Story
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: OVN Kubernetes
Labels:
None

Work Type:
Proactive Architecture
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
4.18 Tech Debt
[QE] How to address?:
---
Intelligence Requested:
Market:

Sprint:
SDN Sprint 251, SDN Sprint 252, SDN Sprint 253
Cost of Delay:
0
WSJF:
0

Release Blocker:
Rejected

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Description of problem:

We have various MTU issues with ovnkube in shared gateway mode because pod networking sits on an MTU boundary 100 bytes below that of the physical network. For example, when an OVN-networked pod contacts an external entity and that entity replies with a packet larger than the pod MTU, an ICMP needs frag is sent back to the host. Additionally, OVS does not support IP fragmentation, so even if the "don't fragment" bit is 0, OVN will always send ICMP needs frag.
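A minimal sketch of the behaviour described above, to make the arithmetic concrete. The 100-byte gap comes from the description; the function names and the 1500-byte physical MTU used in the usage note are illustrative assumptions, not ovnkube code.

```python
# Gap between the physical MTU and the pod MTU, per the description above.
MTU_GAP = 100


def pod_mtu(physical_mtu: int, gap: int = MTU_GAP) -> int:
    """Pod MTU under the current design: physical MTU minus the gap."""
    return physical_mtu - gap


def ovn_action(packet_size: int, mtu: int, dont_fragment: bool) -> str:
    """Model of what happens to a packet heading towards a pod.

    OVS does not fragment, so the DF bit makes no difference:
    any packet larger than the MTU yields an ICMP needs frag.
    """
    if packet_size <= mtu:
        return "forward"
    # dont_fragment is deliberately ignored here (hypothetical model).
    return "icmp_needs_frag"
```

With an assumed physical MTU of 1500, `pod_mtu(1500)` is 1400, and a 1450-byte reply from an external entity yields `icmp_needs_frag` whether or not the DF bit is set.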

We have various workarounds in place that mitigate the issue, such as setting a 1400 MTU on host routes towards the service network and recommending that customers use local gateway mode, which leverages the kernel to do fragmentation. However, we need to address this more holistically.

Our current line of thinking is that we can make the pod MTU the same as that of the physical network, thus eliminating the MTU boundary between pods and the physical network. This means the pod egress and egress reply traffic can operate at the higher MTU, which will also improve throughput.

The exception is packets routed over geneve. On this path, a pod sending a packet that is too large to another pod would result in an ICMP needs frag generated by the geneve kernel module. We need support from OVN to route these back to the pod:

https://bugzilla.redhat.com/show_bug.cgi?id=2241711

As additional protection against pods sending packets that are too large in the first place, we can set MTU routes inside each pod towards the pod subnet, as well as towards the service subnet.
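As a rough illustration of the per-pod MTU routes, the sketch below builds the equivalent iproute2 commands. The subnets, the `eth0` device name, and the 1400 route MTU in the usage note are assumptions for illustration only; a real implementation would more likely program these routes over netlink from the CNI than shell out to `ip`.

```python
import ipaddress


def pod_mtu_routes(subnets, route_mtu, dev="eth0", gateway=None):
    """Build `ip route replace` commands that cap the MTU per destination.

    subnets:   CIDR strings (for example the pod and service subnets)
    route_mtu: MTU to apply on those routes
    dev:       pod interface name (assumed, not taken from the ticket)
    gateway:   optional next hop, added as `via <gateway>`
    """
    commands = []
    for subnet in subnets:
        net = ipaddress.ip_network(subnet)
        family = "-6" if net.version == 6 else "-4"
        via = f" via {gateway}" if gateway else ""
        commands.append(
            f"ip {family} route replace {net}{via} dev {dev} mtu {route_mtu}"
        )
    return commands
```

For example, `pod_mtu_routes(["10.128.0.0/14", "172.30.0.0/16"], 1400)` returns one command per subnet, each ending in `dev eth0 mtu 1400`, while the pod's default route keeps the full interface MTU.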

After these changes, the only path that can still result in MTU lowering would be ingress traffic that hits a service and is proxied to a pod on another node (for example, a NodePort service). In this case, the ICMP needs frag is unavoidable.
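The paths discussed in this description can be summarised in a small model of the proposed design. This is only a sketch: the path names, the 100-byte overhead constant, and the assumption that the service route uses the same reduced MTU as the geneve path are illustrative and not taken from any implementation.

```python
from enum import Enum

# Assumed tunnel overhead, matching the 100-byte gap in the description.
GENEVE_OVERHEAD = 100


class Path(Enum):
    POD_TO_EXTERNAL = "pod egress and egress reply"
    POD_TO_REMOTE_POD = "pod to pod over geneve"
    POD_TO_SERVICE = "pod to service subnet"
    INGRESS_SERVICE_TO_REMOTE_POD = "ingress via service, proxied to another node"


def usable_mtu(path: Path, physical_mtu: int) -> int:
    """Largest packet that fits on the path under the proposed design."""
    if path is Path.POD_TO_EXTERNAL:
        # Pod MTU equals the physical MTU, so there is no boundary.
        return physical_mtu
    # Every other path crosses, or may cross, the geneve tunnel.
    return physical_mtu - GENEVE_OVERHEAD


def needs_frag_still_possible(path: Path) -> bool:
    """Only proxied ingress cannot be protected by MTU routes in the pod."""
    return path is Path.INGRESS_SERVICE_TO_REMOTE_POD
```

The external client on the ingress path has no pod-side route to cap its packet size, which is why that path alone still relies on ICMP needs frag.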

We will need to consider how upgrades will work here.

Assignee: Riccardo Ravaioli

Reporter: Tim Rozet

QA Contact: Huiran Wang

Votes: 0

Watchers: 5

Created: 2023/12/04 8:46 PM

Updated: 2024/08/19 1:07 PM
