-
Bug
-
Resolution: Won't Do
-
Major
-
None
-
4.16
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
Description of problem:
DNAT translation does not work for frames larger than the MTU for EgressIP traffic when the EgressIP is hosted on a different node than the associated workload pod.
Version-Release number of selected component (if applicable):
OpenShift 4.16.20
How reproducible:
100%
Steps to Reproduce:
1. Create an Egress IP on node X
2. Create a pod on node Y
3. Assign the Egress IP to the pod
4. Have the physical network interface on the EgressIP node aggregate the packets (Generic Receive Offload enabled) on the way back to the pod.
Actual results:
Every packet bigger than the MTU is dropped by OVS during the DNAT translation.
Expected results:
No packets are dropped.
Additional info:
Network:
ens14f3np3 (MTU 1500) --,--> baremetal (802.3ad) ---> baremetal.1500
ens15f3np3 (MTU 1500) -´
POD_IP=172.32.18.35 EGRESS_IP=10.9.4.104 EXTERNAL_IP=10.0.172.19
A pod on node Worker04 initiates a connection to an external IP through an EgressIP hosted on node Worker05. The traffic is SNATed on node Worker05 and leaves through the VLAN interface -> bond interface -> slave interfaces. The external IP then sends packets back to the EgressIP. Because the physical interfaces (bond slaves) have Generic Receive Offload (GRO) enabled, the packets are aggregated and delivered to OVS with a size larger than the 1500-byte MTU. OVS drops these packets.
tshark -r EGRESS_worker05.ocp03-shared.s.dc1.cz.ipa.ifortuna.cz-baremetal.1500-2024-12-05-15-48-33.pcap -Y "ip.addr==10.0.172.19" | head -n 20
129093 2024-12-05 15:49:09.087887 0.000000 172.32.18.35 → 10.0.172.19 TCP 132 39372 → 443 [SYN] Seq=0 Win=32640 Len=0 MSS=1360 SACK_PERM TSval=2407828933 TSecr=0 WS=128
129094 2024-12-05 15:49:09.088893 0.001006 10.9.4.104 → 10.0.172.19 TCP 74 39372 → 443 [SYN] Seq=0 Win=32640 Len=0 MSS=1360 SACK_PERM TSval=2407828933 TSecr=0 WS=128
129115 2024-12-05 15:49:09.094668 0.005775 10.0.172.19 → 10.9.4.104 TCP 74 443 → 39372 [SYN, ACK] Seq=0 Ack=1 Win=31856 Len=0 MSS=1460 SACK_PERM TSval=2733533901 TSecr=2407828933 WS=128
129116 2024-12-05 15:49:09.095594 0.000926 10.0.172.19 → 172.32.18.35 TCP 132 443 → 39372 [SYN, ACK] Seq=0 Ack=1 Win=31856 Len=0 MSS=1460 SACK_PERM TSval=2733533901 TSecr=2407828933 WS=128
129117 2024-12-05 15:49:09.096725 0.001131 172.32.18.35 → 10.0.172.19 TCP 124 39372 → 443 [ACK] Seq=1 Ack=1 Win=32640 Len=0 TSval=2407828942 TSecr=2733533901
129119 2024-12-05 15:49:09.097181 0.000456 10.9.4.104 → 10.0.172.19 TCP 66 39372 → 443 [ACK] Seq=1 Ack=1 Win=32640 Len=0 TSval=2407828942 TSecr=2733533901
129295 2024-12-05 15:49:09.113060 0.015879 172.32.18.35 → 10.0.172.19 TCP 380 39372 → 443 [PSH, ACK] Seq=1 Ack=1 Win=32640 Len=256 TSval=2407828959 TSecr=2733533901
129296 2024-12-05 15:49:09.113116 0.000056 10.9.4.104 → 10.0.172.19 TCP 322 39372 → 443 [PSH, ACK] Seq=1 Ack=1 Win=32640 Len=256 TSval=2407828959 TSecr=2733533901
129533 2024-12-05 15:49:09.118773 0.005657 10.0.172.19 → 10.9.4.104 TCP 66 443 → 39372 [ACK] Seq=1 Ack=257 Win=31872 Len=0 TSval=2733533926 TSecr=2407828959
129534 2024-12-05 15:49:09.118813 0.000040 10.0.172.19 → 172.32.18.35 TCP 124 443 → 39372 [ACK] Seq=1 Ack=257 Win=31872 Len=0 TSval=2733533926 TSecr=2407828959
[1] 129535 2024-12-05 15:49:09.119398 0.000585 10.0.172.19 → 10.9.4.104 TCP 2762 443 → 39372 [PSH, ACK] Seq=1 Ack=257 Win=31872 Len=2696 TSval=2733533926 TSecr=2407828959
[2] 129536 2024-12-05 15:49:09.119398 0.000000 10.0.172.19 → 10.9.4.104 TCP 1466 443 → 39372 [PSH, ACK] Seq=2697 Ack=257 Win=31872 Len=1400 TSval=2733533926 TSecr=2407828959
129549 2024-12-05 15:49:09.121007 0.001609 10.0.172.19 → 10.9.4.104 TCP 893 443 → 39372 [PSH, ACK] Seq=4097 Ack=257 Win=31872 Len=827 TSval=2733533927 TSecr=2407828959
[3] 129550 2024-12-05 15:49:09.121045 0.000038 10.0.172.19 → 172.32.18.35 TCP 951 [TCP Previous segment not captured] 443 → 39372 [PSH, ACK] Seq=4097 Ack=257 Win=31872 Len=827 TSval=2733533927 TSecr=2407828959
129575 2024-12-05 15:49:09.121361 0.000316 172.32.18.35 → 10.0.172.19 TCP 136 TCP Dup ACK 129117#1 39372 → 443 [ACK] Seq=257 Ack=1 Win=32640 Len=0 TSval=2407828967 TSecr=2733533926 SLE=4097 SRE=4924
129577 2024-12-05 15:49:09.121404 0.000043 10.9.4.104 → 10.0.172.19 TCP 78 TCP Dup ACK 129119#1 39372 → 443 [ACK] Seq=257 Ack=1 Win=32640 Len=0 TSval=2407828967 TSecr=2733533926 SLE=4097 SRE=4924
129794 2024-12-05 15:49:09.134198 0.012794 10.0.172.19 → 10.9.4.104 TCP 1414 [TCP Retransmission] 443 → 39372 [ACK] Seq=1 Ack=257 Win=31872 Len=1348 TSval=2733533941 TSecr=2407828967
129795 2024-12-05 15:49:09.134230 0.000032 10.0.172.19 → 172.32.18.35 TCP 1472 [TCP Retransmission] 443 → 39372 [ACK] Seq=1 Ack=257 Win=31872 Len=1348 TSval=2733533941 TSecr=2407828967
129796 2024-12-05 15:49:09.134520 0.000290 172.32.18.35 → 10.0.172.19 TCP 136 39372 → 443 [ACK] Seq=257 Ack=1349 Win=31360 Len=0 TSval=2407828980 TSecr=2733533941 SLE=4097 SRE=4924
Packets [1] and [2] (frames 129535 and 129536) are not translated and not sent to the Geneve tunnel because of their size.
Only packets smaller than the MTU, such as [3] (frame 129550), are translated and sent to the tunnel.
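The size argument can be checked against the frame lengths in the capture. The sketch below is a hypothetical helper (the name `oversized_frames` and its input format are assumptions, not part of the original report): it reads `frame_number frame_length` pairs, subtracts the 14-byte Ethernet header, and prints the frames whose IP packet exceeds a given MTU. The 1400-byte threshold is also an assumption, inferred from the MSS=1360 in the pod's SYN (1360 + 40 bytes of IPv4 and TCP headers); adjust it to the actual cluster network MTU.

```sh
#!/bin/sh
# Hypothetical helper: read "frame_number frame_length" pairs on stdin and
# print "frame_number ip_packet_size" for every frame whose IP packet
# (frame length minus the 14-byte Ethernet header) is larger than the MTU
# given as the first argument (default 1500).
oversized_frames() {
    mtu="${1:-1500}"
    awk -v mtu="$mtu" '$2 - 14 > mtu { print $1, $2 - 14 }'
}

# Frame numbers and lengths copied from the capture on the EgressIP node.
# 1400 is an assumed pod-side MTU, not a value stated in the report.
printf '%s\n' \
    '129535 2762' \
    '129536 1466' \
    '129549 893' \
    '129794 1414' | oversized_frames 1400
```

With these inputs only frames 129535 and 129536 are reported, which is consistent with the frames that never appear translated toward 172.32.18.35, while the 1414-byte retransmission (frame 129794) fits and is forwarded.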
Turning off GRO makes it work, as the packets are then not aggregated and fit within the MTU.
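The report does not say how GRO was turned off. One common way is `ethtool -K <interface> gro off` on each bond slave; the sketch below assumes that method and wraps it in a hypothetical helper named `disable_gro`. By default it only prints the commands; setting `APPLY=1` (also an assumption of this sketch) makes it run them, which requires root on the node. Changes made this way do not persist across a reboot.

```sh
#!/bin/sh
# Hypothetical helper: turn off Generic Receive Offload on every interface
# passed as an argument. By default the ethtool commands are only printed;
# set APPLY=1 to actually run them (requires root).
disable_gro() {
    for ifc in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            ethtool -K "$ifc" gro off
        else
            echo "ethtool -K $ifc gro off"
        fi
    done
}

# Bond slave names taken from the network description in this report.
disable_gro ens14f3np3 ens15f3np3
```

The current state can be inspected with `ethtool -k ens14f3np3`, which lists `generic-receive-offload` as `on` or `off`.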