- Bug
- Resolution: Won't Do
- Critical
- 4.10
- Quality / Stability / Reliability
- Important
- Proposed
When node A sends traffic to a NodePort service on another node B with a packet larger than the cluster-wide MTU, it receives an ICMP "fragmentation needed" (needs frag) message. Node A then caches a route to node B on its host with the lower MTU value.
If an OVN-networked pod on node A then tries to send a packet larger than the cluster MTU minus the Geneve overhead (the pod's MTU is set to the cluster MTU), the packet is dropped in the host of node A.
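The size condition can be checked with a little arithmetic. The sketch below is illustrative only: it uses the IP total lengths from the captures in this report (inner packet 1399 bytes, encapsulated packet 1457 bytes, cached route MTU 1400) and assumes the Geneve overhead is the same for every packet. The function names are hypothetical and not part of OVN-Kubernetes.

```python
# Sizes taken from the captures in this report (IP total lengths, in bytes).
INNER_IP_LEN = 1399      # pod packet 10.244.2.3 -> 10.244.1.3 before encapsulation
OUTER_IP_LEN = 1457      # the same packet after Geneve encapsulation
CACHED_ROUTE_MTU = 1400  # MTU cached on the host after the ICMP needs frag


def encap_overhead(inner_len: int, outer_len: int) -> int:
    """Bytes added by the tunnel, measured from a single captured packet."""
    return outer_len - inner_len


def fits(inner_len: int, overhead: int, route_mtu: int) -> bool:
    """True if the encapsulated packet fits within the route MTU."""
    return inner_len + overhead <= route_mtu


def max_inner_len(route_mtu: int, overhead: int) -> int:
    """Largest pod packet that still fits after encapsulation."""
    return route_mtu - overhead


overhead = encap_overhead(INNER_IP_LEN, OUTER_IP_LEN)
print("overhead:", overhead)
print("fits:", fits(INNER_IP_LEN, overhead, CACHED_ROUTE_MTU))
print("max inner length:", max_inner_len(CACHED_ROUTE_MTU, overhead))
```

With these figures the overhead is 58 bytes, so once the 1400-byte route is cached, any pod packet longer than 1342 bytes exceeds it after encapsulation.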
Consider the diagram:
1. ovn-worker sends a large packet to a NodePort service on ovn-worker2, which triggers an ICMP needs frag:
15:58:30.595779 breth0 Out ifindex 6 02:42:ac:12:00:04 ethertype IPv4 (0x0800), length 1494: (tos 0x0, ttl 64, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
15:58:30.595942 eth0 Out ifindex 16 02:42:ac:12:00:04 ethertype IPv4 (0x0800), length 1494: (tos 0x0, ttl 64, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
15:58:30.597122 eth0 In ifindex 16 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 582: (tos 0x0, ttl 254, id 0, offset 0, flags [DF], proto ICMP (1), length 562)
172.18.0.3 > 172.18.0.4: ICMP 172.18.0.3 unreachable - need to frag (mtu 1400), length 542
(tos 0x0, ttl 63, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
15:58:30.597281 breth0 In ifindex 6 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 582: (tos 0x0, ttl 254, id 0, offset 0, flags [DF], proto ICMP (1), length 562)
172.18.0.3 > 172.18.0.4: ICMP 172.18.0.3 unreachable - need to frag (mtu 1400), length 542
(tos 0x0, ttl 63, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
2. Now ovn-worker has this cached route:
root@ovn-worker:/# ip route get 172.18.0.3
172.18.0.3 dev breth0 src 172.18.0.4 uid 0
cache expires 536sec mtu 1400
3. Now pod2 attempts to send a large packet to trozet3:
16:02:05.549070 bdeed608d395fd0 P ifindex 7 0a:58:0a:f4:02:03 ethertype IPv4 (0x0800), length 1419: (tos 0x0, ttl 64, id 45192, offset 0, flags [DF], proto UDP (17), length 1399)
10.244.2.3.45688 > 10.244.1.3.1337: UDP, length 1371
16:02:05.549409 genev_sys_6081 Out ifindex 4 0a:58:0a:f4:01:01 ethertype IPv4 (0x0800), length 1419: (tos 0x0, ttl 63, id 45192, offset 0, flags [DF], proto UDP (17), length 1399)
10.244.2.3.45688 > 10.244.1.3.1337: UDP, length 1371
16:02:05.549430 lo In ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 64, id 13963, offset 0, flags [none], proto ICMP (1), length 576)
172.18.0.4 > 172.18.0.4: ICMP 172.18.0.3 unreachable - need to frag (mtu 1400), length 556
(tos 0x0, ttl 64, id 60242, offset 0, flags [DF], proto UDP (17), length 1457)
172.18.0.4.30994 > 172.18.0.3.6081: Geneve, Flags [C], vni 0x1, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8]
0a:58:0a:f4:01:01 > 0a:58:0a:f4:01:03, ethertype IPv4 (0x0800), length 1413: (tos 0x0, ttl 63, id 45192, offset 0, flags [DF], proto UDP (17), length 1399)
10.244.2.3.45688 > 10.244.1.3.1337: UDP, length 1371
The packet is dropped inside ovn-worker because, with the Geneve header added, it is 1457 bytes long, which exceeds the cached route MTU of 1400. Interestingly, the ICMP needs frag is generated and sent into the loopback interface.
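To check whether a node is in this state, the cached MTU from step 2 can be pulled out of the `ip route get` output. The following is a minimal sketch, assuming the output has the same shape as the sample in this report; the helper name `cached_route_mtu` is hypothetical, and the text is passed in as a string rather than collected from a live host.

```python
import re
from typing import Optional


def cached_route_mtu(ip_route_get_output: str) -> Optional[int]:
    """Return the MTU of a cached route entry, or None if none is shown.

    Only an `mtu <n>` that follows the `cache` keyword is considered,
    which is where the learned path MTU appears in the sample output.
    """
    match = re.search(r"\bcache\b.*?\bmtu (\d+)", ip_route_get_output, re.DOTALL)
    return int(match.group(1)) if match else None


# Output copied from step 2 of this report.
sample = (
    "172.18.0.3 dev breth0 src 172.18.0.4 uid 0\n"
    "    cache expires 536sec mtu 1400\n"
)
# Hypothetical output for a node with no cached MTU for this destination.
clean = "172.18.0.3 dev breth0 src 172.18.0.4 uid 0\n    cache\n"

print(cached_route_mtu(sample))
print(cached_route_mtu(clean))
```

A value that is lower than the host interface MTU and that appears after `cache` suggests the host has learned a path MTU for that destination, as in this report.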