Bug
Resolution: Done
Description of problem:
OpenShift has a use case on AWS where some, but not all, nodes of a cluster are deployed in AWS Local Zones. As a result, these nodes reside on a different network than the other nodes of the cluster. Both networks, and all nodes in the cluster, could be configured with a high MTU (say 9001), but the path between the two networks includes segments with a lower MTU (say 1300). This forces the whole cluster to be configured with, and operate at, that suboptimal lower MTU. Ideally, traffic within each network, and to other external networks, could use the higher MTU.
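The effective MTU of a path is the smallest MTU of any link on it, so a single cluster-wide value ends up pinned to the 1300-byte segment even for traffic that never crosses it. A minimal Python sketch using the example values above (the link lists are hypothetical):

```python
from typing import Sequence


def path_mtu(link_mtus: Sequence[int]) -> int:
    """Largest packet size that crosses every link on the path without fragmentation."""
    if not link_mtus:
        raise ValueError("a path needs at least one link")
    return min(link_mtus)


# Hypothetical paths, using the example MTUs from this ticket.
same_network = [9001, 9001]         # both hops inside one network
cross_network = [9001, 1300, 9001]  # Local Zone network -> 1300 segment -> main network

print(path_mtu(same_network))   # 9001
print(path_mtu(cross_network))  # 1300
```

A static cluster MTU has to be the minimum over all such paths; PMTUD instead lets each destination settle on its own value.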
While Path MTU Discovery (PMTUD) should work in such a scenario for intra-cluster traffic, there are issues when Geneve traffic is involved.
We observed two issues that might be related.
Issue 1
----------
With the cluster configured with the higher MTU, inspecting Geneve traffic shows a constant stream of ICMP "need to frag" replies, caused by the Geneve traffic traversing the lower-MTU segment:
sh-4.4# tcpdump -i br-ex -vveenn icmp
dropped privs to tcpdump
tcpdump: listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
15:49:33.942921 16:e5:eb:e1:a6:37 > 16:69:d7:61:83:63, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto ICMP (1), length 56)
    10.0.192.1 > 10.0.193.203: ICMP 10.0.22.231 unreachable - need to frag (mtu 1300), length 36
        (tos 0x0, ttl 64, id 38422, offset 0, flags [DF], proto UDP (17), length 2747)
    10.0.193.203.47768 > 10.0.22.231.6081: Geneve [|geneve]
15:49:35.430295 16:e5:eb:e1:a6:37 > 16:69:d7:61:83:63, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto ICMP (1), length 56)
    10.0.192.1 > 10.0.193.203: ICMP 10.0.8.2 unreachable - need to frag (mtu 1300), length 36
        (tos 0x0, ttl 64, id 54341, offset 0, flags [DF], proto UDP (17), length 2747)
    10.0.193.203.39672 > 10.0.8.2.6081: Geneve [|geneve]
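In this capture the router at 10.0.192.1 reports a next-hop MTU of 1300 for 2747-byte Geneve packets (UDP port 6081). A small Python sketch that pulls those fields out of such records, assuming the `tcpdump -vv` text layout shown above and a quoted packet that is UDP; the `NeedFrag` type and `parse_need_frag` helper are hypothetical:

```python
import re
from typing import List, NamedTuple


class NeedFrag(NamedTuple):
    dst: str       # destination of the quoted (dropped) packet
    mtu: int       # next-hop MTU reported by the router
    orig_len: int  # IP total length of the quoted packet
    dport: int     # UDP destination port of the quoted packet (6081 = Geneve)


# Matches the tcpdump -v rendering of an ICMP "need to frag" message that
# quotes a UDP packet; DOTALL lets one record span the wrapped lines.
NEED_FRAG_RE = re.compile(
    r"ICMP (?P<dst>\d+(?:\.\d+){3}) unreachable - need to frag \(mtu (?P<mtu>\d+)\)"
    r".*?proto UDP \(17\), length (?P<orig_len>\d+)\)"
    r"\s+\S+ > \d+(?:\.\d+){3}\.(?P<dport>\d+):",
    re.DOTALL,
)


def parse_need_frag(capture: str) -> List[NeedFrag]:
    return [
        NeedFrag(m["dst"], int(m["mtu"]), int(m["orig_len"]), int(m["dport"]))
        for m in NEED_FRAG_RE.finditer(capture)
    ]


# One record copied from the capture above.
SAMPLE = (
    "10.0.192.1 > 10.0.193.203: ICMP 10.0.22.231 unreachable - need to frag (mtu 1300), length 36\n"
    "        (tos 0x0, ttl 64, id 38422, offset 0, flags [DF], proto UDP (17), length 2747)\n"
    "    10.0.193.203.47768 > 10.0.22.231.6081: Geneve [|geneve]\n"
)

for nf in parse_need_frag(SAMPLE):
    print(f"{nf.dst}: {nf.orig_len}-byte packet to port {nf.dport}, next-hop MTU {nf.mtu}")
```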
However, the route exception that PMTUD should create does not appear, which is unexpected:
sh-4.4# ip r get 10.0.8.2
10.0.8.2 via 10.0.192.1 dev br-ex src 10.0.193.203 uid 0
    cache
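A PMTU route exception shows up in `ip route get` output as an `mtu` value on the `cache` line; here the cache line carries no MTU at all. A minimal Python sketch for checking this, assuming the text output format shown in this ticket (the `cached_pmtu` helper is hypothetical):

```python
import re
from typing import Optional

# A PMTU exception appears on the "cache" line of `ip route get` output as
# "mtu N"; the optional "lock " covers a locked MTU metric.
_CACHED_MTU_RE = re.compile(r"\bcache\b.*?\bmtu (?:lock )?(\d+)", re.DOTALL)


def cached_pmtu(ip_route_get_output: str) -> Optional[int]:
    """Return the cached path MTU, or None if no MTU exception is present."""
    m = _CACHED_MTU_RE.search(ip_route_get_output)
    return int(m.group(1)) if m else None


# Outputs copied from this ticket.
NO_EXCEPTION = "10.0.8.2 via 10.0.192.1 dev br-ex src 10.0.193.203 uid 0\n    cache\n"
WITH_EXCEPTION = "172.18.0.4 dev breth0 src 172.18.0.3 uid 0\n    cache expires 495sec mtu 1400\n"

print(cached_pmtu(NO_EXCEPTION))    # None
print(cached_pmtu(WITH_EXCEPTION))  # 1400
```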
Issue 2
----------
If we trigger a route exception by other means (this example uses a different set of IPs than the previous one):
172.18.0.4 dev breth0 src 172.18.0.3 uid 0
    cache expires 495sec mtu 1400
We then expect the Geneve kernel driver to build an ICMP "need to frag" message and send it back to the Geneve device, which is part of an OVS bridge.
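For reference, such a message is an ICMPv4 Destination Unreachable with code 4 ("fragmentation needed and DF set"), whose second header word carries the next-hop MTU (RFC 1191). The Python sketch below builds that wire format from an MTU and the start of the offending inner packet. It only illustrates the message layout; it is not the kernel's implementation, and the function names are hypothetical.

```python
import struct

ICMP_DEST_UNREACH = 3
ICMP_FRAG_NEEDED = 4  # "fragmentation needed and DF set" (RFC 792); MTU field per RFC 1191


def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, quoted: bytes) -> bytes:
    """ICMPv4 type 3 / code 4 message carrying next_hop_mtu in bytes 6-7.

    `quoted` is the start of the offending packet: its IP header plus at
    least the first 8 bytes of its payload.
    """
    def header(csum: int) -> bytes:
        # type, code, checksum, unused (16 bits), next-hop MTU (16 bits)
        return struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, csum, 0, next_hop_mtu)

    csum = inet_checksum(header(0) + quoted)
    return header(csum) + quoted


# Hypothetical usage: 28 zero bytes stand in for an inner IPv4 + UDP header.
msg = build_frag_needed(1342, bytes(28))
print(msg[:8].hex())  # 0304f7bd0000053e
```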
Instead, we first see an unexpected ICMP on the loopback interface (lo) in response to the first oversized packet the client sends:
16:14:14.553971 lo In ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 64, id 14151, offset 0, flags [none], proto ICMP (1), length 576)
    172.18.0.3 > 172.18.0.3: ICMP 172.18.0.4 unreachable - need to frag (mtu 1400), length 556
        (tos 0x0, ttl 64, id 48593, offset 0, flags [DF], proto UDP (17), length 1457)
    172.18.0.3.46416 > 172.18.0.4.6081: Geneve, Flags [C], vni 0x1, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 00060005]
        0a:58:0a:f4:02:01 > 0a:58:0a:f4:02:03, ethertype IPv4 (0x0800), length 1413: (tos 0x0, ttl 63, id 40725, offset 0, flags [DF], proto UDP (17), length 1399)
    10.244.0.4.47404 > 10.244.2.3.32286: UDP, length 1371
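This ICMP is addressed from 172.18.0.3 to itself and quotes the outer Geneve packet (UDP port 6081), whereas an ICMP meant for the overlay quotes the inner packet. A hypothetical Python helper that tells the two apart from tcpdump text by checking the UDP destination port of the first quoted packet against the IANA-assigned Geneve port; it assumes IPv4 and the `tcpdump -v` layout used in this ticket:

```python
import re

GENEVE_UDP_PORT = 6081  # IANA-assigned Geneve port (RFC 8926)

# First "addr.port > addr.port:" pair after the "need to frag" text, i.e. the
# packet quoted inside the ICMP message.
_QUOTED_RE = re.compile(
    r"need to frag \(mtu \d+\).*?"
    r"\d+(?:\.\d+){3}\.\d+ > \d+(?:\.\d+){3}\.(?P<dport>\d+):",
    re.DOTALL,
)


def classify_need_frag(record: str) -> str:
    """'underlay' if the ICMP quotes a Geneve packet, 'overlay' otherwise."""
    m = _QUOTED_RE.search(record)
    if m is None:
        return "unknown"
    return "underlay" if int(m["dport"]) == GENEVE_UDP_PORT else "overlay"


# Excerpts of the two ICMP captures in this ticket.
LO_ICMP = (
    "172.18.0.3 > 172.18.0.3: ICMP 172.18.0.4 unreachable - need to frag (mtu 1400), length 556 "
    "(tos 0x0, ttl 64, id 48593, offset 0, flags [DF], proto UDP (17), length 1457) "
    "172.18.0.3.46416 > 172.18.0.4.6081: Geneve, Flags [C], vni 0x1"
)
GENEVE_DEV_ICMP = (
    "10.244.2.3 > 10.244.0.4: ICMP 10.244.2.3 unreachable - need to frag (mtu 1342), length 556 "
    "(tos 0x0, ttl 63, id 41415, offset 0, flags [DF], proto UDP (17), length 1399) "
    "10.244.0.4.32825 > 10.244.2.3.32286: UDP, length 1371"
)

print(classify_need_frag(LO_ICMP))          # underlay
print(classify_need_frag(GENEVE_DEV_ICMP))  # overlay
```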
This is followed by the expected ICMP for subsequent oversized client packets:
16:14:22.477563 genev_sys_6081 P ifindex 4 0a:58:0a:f4:02:03 ethertype IPv4 (0x0800), length 596: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 576)
    10.244.2.3 > 10.244.0.4: ICMP 10.244.2.3 unreachable - need to frag (mtu 1342), length 556
        (tos 0x0, ttl 63, id 41415, offset 0, flags [DF], proto UDP (17), length 1399)
    10.244.0.4.32825 > 10.244.2.3.32286: UDP, length 1371
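The MTU values in the two captures are consistent with a fixed 58-byte Geneve encapsulation overhead: the outer packet quoted on lo is 1457 bytes for a 1399-byte inner IP packet, and the 1400-byte route MTU becomes 1342 for the inner traffic. A small Python sketch of that arithmetic, assuming an outer IPv4 header without options, no VLAN tag on the inner frame, and the single 8-byte OVN option seen in this capture (the constant and function names are hypothetical):

```python
# Header sizes visible in the Issue 2 captures (assumptions noted per line).
OUTER_IPV4 = 20      # outer IPv4 header, assuming no IP options
OUTER_UDP = 8        # outer UDP header
GENEVE_BASE = 8      # fixed Geneve header
GENEVE_OPTIONS = 8   # one OVN option: 4-byte option header + 4 bytes of data (00060005)
INNER_ETHERNET = 14  # inner Ethernet header (proto TEB), assuming no VLAN tag

OVERHEAD = OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + GENEVE_OPTIONS + INNER_ETHERNET  # 58


def inner_mtu(outer_path_mtu: int) -> int:
    """Largest inner IP packet that fits in one outer packet of outer_path_mtu bytes."""
    return outer_path_mtu - OVERHEAD


def outer_len(inner_ip_len: int) -> int:
    """IP total length of the outer Geneve packet carrying an inner IP packet."""
    return inner_ip_len + OVERHEAD


print(inner_mtu(1400))  # 1342, the MTU reported in the ICMP on genev_sys_6081
print(outer_len(1399))  # 1457, the outer length quoted in the ICMP on lo
```

Under the same assumptions, the 1300-byte underlay MTU from Issue 1 would leave `inner_mtu(1300)` = 1242 bytes for overlay packets; the Issue 1 capture truncates the Geneve header (`[|geneve]`), so its actual option length is not visible there.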
To recap:
Issue 1: Why does an ICMP "need to frag" sent in response to the Geneve underlay packets not create a PMTU route exception?
Issue 2: Why is the unexpected ICMP on the loopback interface generated?
root@ovn-worker2:/# uname -a
Linux ovn-worker2 6.0.7-301.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 4 18:35:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Causes:
- OCPSTRAT-695 Add support to the Installer to specify a custom MTU for the cluster network (status: Closed)
Is depended on by:
- SDN-3857 Support for Custom machinepool MTU Configuration for ROSA Local Zones (status: New)
Relates to:
- SPLAT-1004 [aws][local-zones] Spike/Collaborate Path MTU Discovery (SDN team) (status: To Do)
- OCPSTRAT-1859 Hive support to AWS Wavelength Zones (status: New)
- OCPSTRAT-350 Add support to AWS Local Zones (Phase II) (status: Closed)
- OCPSTRAT-736 Add support to AWS Wavelength Zones (status: Closed)