Story
Resolution: Done
Normal
None
SME: Linoy Hadad
QE: David Asulin
dan5179 mentions that:
In both OCP 3 and 4, MTU issues present in very bizarre behavior.
In OCP 4, the two SSL warning pages for the console would load, followed by a white screen of doom forever.
In the latest case, he was using ABI with the interface MTUs set to 9000 using the NMState interface config, but the network switches were not configured for jumbo frames. The result was that none of the control plane nodes ever rebooted. According to Dan 'generally it's SSL frames like OAUTH that have "no fragment" marked' and thus get lost.
External configuration issues that "present in very bizarre behaviour" are ideal candidates for assisted installer validations. We already verify connectivity between nodes using ping. We could verify the expected MTU by additionally doing (the equivalent of):
mtu="$(ip -j link show "${ifname}" | jq '.[0].mtu')"
ping "${ip_addr}" -c 3 -M do -s $((mtu - 28)) -I "${ifname}"
ping6 "${ip6_addr}%${ifname}" -c 3 -M do -s $((mtu - 48)) -I "${ifname}"
i.e. send a maximum-sized packet with the Don't Fragment (DF) flag set and see if we get a response. This will be sufficient to validate connectivity even in cases where ICMP "Fragmentation Needed" packets are dropped (and therefore Path MTU Discovery will not work).
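The payload sizes in the commands above come from subtracting header overhead from the MTU: 28 bytes for IPv4 (20-byte IP header without options plus 8-byte ICMP header) and 48 bytes for IPv6 (40-byte fixed header plus 8-byte ICMPv6 header). A minimal sketch of that arithmetic follows; the function names max_ping_payload and probe_mtu_v4 are hypothetical and not part of the assisted installer.

```shell
# Hypothetical helper: largest ICMP echo payload that fits in one packet.
#   $1 = interface MTU in bytes, $2 = address family (4 or 6)
max_ping_payload() {
  case "$2" in
    4) echo $(( $1 - 28 )) ;;  # 20-byte IPv4 header + 8-byte ICMP header
    6) echo $(( $1 - 48 )) ;;  # 40-byte IPv6 header + 8-byte ICMPv6 header
    *) echo "unknown address family: $2" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper around the IPv4 probe from the ticket.
#   $1 = interface name, $2 = peer IPv4 address, $3 = interface MTU
# Uses the same ping options as the ticket; -M do forbids fragmentation.
probe_mtu_v4() {
  ping -c 3 -M do -s "$(max_ping_payload "$3" 4)" -I "$1" "$2"
}
```

For an interface with MTU 9000, max_ping_payload 9000 4 gives 8972 and max_ping_payload 9000 6 gives 8952. If the switches only pass standard 1500-byte frames, a probe of that size should get no reply, which is the condition the validation would flag.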
We should probably limit this to interfaces where the MTU is set explicitly. Rather than interrogating the NMState data, the simplest approach may be to verify every interface whose MTU is not 1500.
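One possible way to pick out the interfaces with a non-default MTU is sketched below. It assumes the one-line-per-link text output of ip -o link show is piped in on stdin, and it skips the loopback interface, whose MTU is typically 65536; both choices are assumptions of this sketch, and the function name non_default_mtu_interfaces is hypothetical.

```shell
# Hypothetical filter: reads `ip -o link show` output on stdin and prints
# "<ifname> <mtu>" for every non-loopback link whose MTU is not 1500.
non_default_mtu_interfaces() {
  awk '
    {
      name = $2
      sub(/:$/, "", name)     # drop the trailing colon after the name
      sub(/@.*$/, "", name)   # drop the "@parent" suffix on VLAN-style links
      for (i = 3; i < NF; i++) {
        if ($i == "mtu" && $(i + 1) != 1500 && name != "lo") {
          print name, $(i + 1)
        }
      }
    }'
}
```

Typical usage would be ip -o link show | non_default_mtu_interfaces, with each reported interface and MTU then fed to the maximum-sized ping check.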