OpenShift Request For Enhancement: RFE-7944

[RFE] MetalLB & OVN: Support for DNAT of ICMP Type 3 (Destination Unreachable) and Code 4 (Fragmentation Needed) to LoadBalancer IPs

    • Feature Request
    • Resolution: Unresolved
    • Major
    • 4.16, 4.18, 4.17
    • Network - Core
    • Product / Portfolio Work

      Proposed title of this feature request:

      OCP Pods with a higher MTU (8900) communicate with a destination network that also supports the same MTU. The initial TCP 3-way handshake succeeds because the packets exchanged during the handshake are small. However, once actual data transfer begins and packets use the larger MTU, communication breaks if any intermediate hop in the path doesn't support that higher MTU.

      This failure occurs because the ICMP Type 3 (Destination Unreachable), Code 4 (Fragmentation Needed) messages sent by the intermediate router are not delivered to the Pod. These ICMP messages are addressed to the LoadBalancer IP, but because the DNAT rules typically match only TCP/UDP traffic, the ICMP packets are never translated and forwarded to the Pod.
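The rule miss described above can be illustrated with a toy lookup table. This is only a sketch: the `Packet` type, the `DNAT_RULES` table, and the addresses (`192.0.2.10` as the LoadBalancer IP, `10.128.0.5` as the Pod) are hypothetical, and real nftables or OVN matching is considerably richer than a dictionary keyed on protocol, address, and port.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Packet:
    proto: str                      # "tcp", "udp", or "icmp"
    dst_ip: str
    dst_port: Optional[int] = None  # ICMP carries no port


# Hypothetical rule: LoadBalancer IP and port -> Pod IP and port.
DNAT_RULES = {
    ("tcp", "192.0.2.10", 443): ("10.128.0.5", 8443),
}


def dnat_lookup(pkt: Packet) -> Optional[Tuple[str, int]]:
    """Return the backend for a packet, or None when no rule matches."""
    return DNAT_RULES.get((pkt.proto, pkt.dst_ip, pkt.dst_port))


tcp_data = Packet("tcp", "192.0.2.10", 443)
icmp_frag_needed = Packet("icmp", "192.0.2.10")

print(dnat_lookup(tcp_data))          # ('10.128.0.5', 8443)
print(dnat_lookup(icmp_frag_needed))  # None: the ICMP error has no matching rule
```

In this model the ICMP error is addressed to the same LoadBalancer IP as the TCP flow, yet it matches nothing because the rule is keyed on protocol and port.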

      If the Pod were able to receive the ICMP Type 3/Code 4 message, it could adjust by fragmenting the payload or reducing the size of the packets it sends — thus avoiding the issue where intermediate hops can't handle larger MTU sizes.
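As a rough illustration of what the sender could do with such a message, the sketch below reads the next-hop MTU field from an ICMP Type 3/Code 4 header (the layout defined in RFC 1191) and derives a smaller segment size. The function names are hypothetical, the header is hand-built with a zero checksum, and the 40-byte overhead assumes IPv4 and TCP headers without options; in practice the kernel's TCP stack performs this adjustment, not application code.

```python
import struct
from typing import Optional

IP_TCP_HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header


def next_hop_mtu(icmp: bytes) -> Optional[int]:
    """Return the next-hop MTU from a Type 3/Code 4 message, else None."""
    if len(icmp) < 8:
        return None
    # Type (1 byte), code (1), checksum (2), unused (2), next-hop MTU (2).
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type == 3 and code == 4:
        return mtu
    return None


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_TCP_HEADER_BYTES


# Hand-built header for illustration only (checksum left as zero).
msg = struct.pack("!BBHHH", 3, 4, 0, 0, 1500)
mtu = next_hop_mtu(msg)
print(mtu, mss_for_mtu(mtu))  # 1500 1460
```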

      Example scenario:

      • Two OpenShift clusters (C1 and C2) both have an MTU of 8900.
      • There are five network hops between them.
      • A Pod in cluster C1 communicates with a Pod in cluster C2 via an egress IP, targeting a LoadBalancer IP exposed by MetalLB.
      • The initial TCP 3-way handshake succeeds with an agreed MSS of 8860.
      • C1 sends a Client Hello (560 bytes), which successfully reaches C2.
      • C2 responds with a Server Hello of 1920 bytes.
      • An intermediate router — which only supports 1500 MTU — sends back an ICMP Type 3/Code 4 message indicating that fragmentation is needed.
      • However, this ICMP packet is addressed to the LoadBalancer IP. Because no DNAT rule exists for ICMP traffic in the nftables prerouting chain or in the OVN NAT that handles the service IP, the packet is not forwarded to the destination Pod.

      As a result, the Pod behind the LoadBalancer keeps retransmitting the 1920-byte Server Hello. It never receives the ICMP error, so it doesn’t adjust the packet size. Eventually, the client closes the connection due to a lack of response.
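The numbers in the scenario can be checked with a few lines of arithmetic. This sketch assumes 40 bytes of IPv4 and TCP headers without options, and treats the quoted 560 and 1920 bytes as TCP payload sizes; the ticket does not say whether they include headers, and the conclusion is the same either way.

```python
import math

HEADER_BYTES = 40  # assumed: IPv4 (20) + TCP (20), no options


def mss(mtu: int) -> int:
    """TCP maximum segment size for a given link MTU."""
    return mtu - HEADER_BYTES


def fits(payload: int, mtu: int) -> bool:
    """True if payload plus headers fits in one packet of the given MTU."""
    return payload + HEADER_BYTES <= mtu


cluster_mtu = 8900
path_mtu = 1500      # the intermediate router in the scenario
client_hello = 560
server_hello = 1920

print(mss(cluster_mtu))               # 8860, the MSS agreed in the handshake
print(fits(client_hello, path_mtu))   # True: the Client Hello gets through
print(fits(server_hello, path_mtu))   # False: the Server Hello is too large
# Segments needed once the sender honours the 1500-byte path MTU:
print(math.ceil(server_hello / mss(path_mtu)))  # 2
```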

      List any affected packages or components:

      • MetalLB, OVN

              mcurry@redhat.com Marc Curry
              rhn-support-rsahoo Ramesh Sahoo