  1. Fast Datapath Product
  2. FDP-164

Unexpected ICMP needs frag behavior in geneve tunnels


    • Type: Bug
    • Resolution: Unresolved
    • openvswitch3.2

      Description of problem:

      OpenShift has a use case on AWS where some, but not all, nodes of a cluster are deployed on AWS Local Zones. As a result, these nodes reside on a different network than the other nodes of the cluster. While both networks and all nodes of the cluster could be configured with a high MTU value (say 9001), some segments on the paths between those networks use a lower MTU value (say 1300), forcing the whole cluster to be configured with, and to operate at, that sub-optimal lower MTU value. Ideally, communication within each of those networks and with other external networks could use the higher MTU value.

      While PMTUD should work in such a scenario for intra-cluster traffic, there are issues when geneve traffic is involved.

      We observed two issues that might be related.

      Issue 1
      ----------

      When observing such a cluster configured with the higher MTU value and inspecting geneve traffic, we can see constant ICMP NEEDS FRAG replies caused by the geneve traffic traversing the lower-MTU segment:

      sh-4.4# tcpdump -i br-ex -vveenn icmp
      dropped privs to tcpdump
      tcpdump: listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
      15:49:33.942921 16:e5:eb:e1:a6:37 > 16:69:d7:61:83:63, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto ICMP (1), length 56)
          10.0.192.1 > 10.0.193.203: ICMP 10.0.22.231 unreachable - need to frag (mtu 1300), length 36
      	(tos 0x0, ttl 64, id 38422, offset 0, flags [DF], proto UDP (17), length 2747)
          10.0.193.203.47768 > 10.0.22.231.6081: Geneve [|geneve]
      15:49:35.430295 16:e5:eb:e1:a6:37 > 16:69:d7:61:83:63, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto ICMP (1), length 56)
          10.0.192.1 > 10.0.193.203: ICMP 10.0.8.2 unreachable - need to frag (mtu 1300), length 36
      	(tos 0x0, ttl 64, id 54341, offset 0, flags [DF], proto UDP (17), length 2747)
          10.0.193.203.39672 > 10.0.8.2.6081: Geneve [|geneve]
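For reference, a minimal Python sketch of the ICMP "fragmentation needed" layout (type 3, code 4, next-hop MTU in the last two bytes of the 8-byte header, as described in RFC 1191). The helper names are hypothetical and the checksum is left at zero for brevity. It assumes the router quotes only the outer IPv4 header plus the first 8 bytes of the datagram (the outer UDP header), which would be consistent with the "length 36" and the truncated "[|geneve]" shown in the capture above.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: fragmentation needed and DF set


def build_frag_needed(next_hop_mtu: int, quoted: bytes) -> bytes:
    """Build an ICMP frag-needed message (hypothetical helper, checksum left as 0)."""
    # Fields: type, code, checksum, unused, next-hop MTU
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 0, 0, next_hop_mtu)
    return header + quoted


def parse_frag_needed(icmp: bytes):
    """Return the advertised next-hop MTU, or None if this is not a frag-needed message."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Placeholder for the quoted data: 20-byte outer IPv4 header + 8-byte outer UDP header.
quoted = bytes(20 + 8)
msg = build_frag_needed(1300, quoted)
print(len(msg), parse_frag_needed(msg))  # 36 1300
```

Under that assumption the ICMP error carries the underlay addresses and UDP ports but none of the Geneve header or the inner packet.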
      

      But no route exception appears to be created as a result of PMTUD, which is unexpected:

      sh-4.4# ip r get 10.0.8.2
      10.0.8.2 via 10.0.192.1 dev br-ex src 10.0.193.203 uid 0 
          cache
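A small sketch of how this check could be scripted: it looks for an `mtu` attribute on the `cache` line of `ip route get` output. The function name is hypothetical, and the two samples are the outputs quoted in this report (the one above, and the exception output shown under Issue 2).

```python
import re


def route_exception_mtu(ip_route_get_output: str):
    """Return the cached path MTU from `ip route get` text, or None if no exception is shown."""
    match = re.search(r"\bcache\b.*?\bmtu (\d+)", ip_route_get_output, re.DOTALL)
    return int(match.group(1)) if match else None


# Output from Issue 1: no PMTU exception recorded for the destination.
no_exception = """10.0.8.2 via 10.0.192.1 dev br-ex src 10.0.193.203 uid 0
    cache
"""

# Output from Issue 2: an exception with a lowered MTU and an expiry timer.
with_exception = """172.18.0.4 dev breth0 src 172.18.0.3 uid 0
    cache expires 495sec mtu 1400
"""

print(route_exception_mtu(no_exception))    # None
print(route_exception_mtu(with_exception))  # 1400
```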
      

      Issue 2
      ----------

      If we trigger a route exception by other means (this example is using a different set of IPs than the previous one):

      172.18.0.4 dev breth0 src 172.18.0.3 uid 0 
          cache expires 495sec mtu 1400
      

      Then we expect the geneve kernel driver to build an ICMP error and send it back to the geneve device, which is part of an OVS bridge.

      But what we see instead is, first, an unexpected ICMP error on the lo interface in response to the first oversized packet the client sends:

      16:14:14.553971 lo    In  ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 64, id 14151, offset 0, flags [none], proto ICMP (1), length 576)
          172.18.0.3 > 172.18.0.3: ICMP 172.18.0.4 unreachable - need to frag (mtu 1400), length 556
      	(tos 0x0, ttl 64, id 48593, offset 0, flags [DF], proto UDP (17), length 1457)
          172.18.0.3.46416 > 172.18.0.4.6081: Geneve, Flags [C], vni 0x1, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 00060005]
      	0a:58:0a:f4:02:01 > 0a:58:0a:f4:02:03, ethertype IPv4 (0x0800), length 1413: (tos 0x0, ttl 63, id 40725, offset 0, flags [DF], proto UDP (17), length 1399)
          10.244.0.4.47404 > 10.244.2.3.32286: UDP, length 1371
      

      And then the expected ICMP for the subsequent oversized client packets:

      16:14:22.477563 genev_sys_6081 P   ifindex 4 0a:58:0a:f4:02:03 ethertype IPv4 (0x0800), length 596: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 576)
          10.244.2.3 > 10.244.0.4: ICMP 10.244.2.3 unreachable - need to frag (mtu 1342), length 556
      	(tos 0x0, ttl 63, id 41415, offset 0, flags [DF], proto UDP (17), length 1399)
          10.244.0.4.32825 > 10.244.2.3.32286: UDP, length 1371
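The MTU of 1342 reported in this ICMP error matches the 1400-byte route exception minus the encapsulation overhead seen in the first capture. A short sketch of that arithmetic follows; the constant and function names are illustrative, and it assumes an IPv4 underlay without IP options, a single 8-byte OVN Geneve option (as printed by tcpdump above), and an untagged inner Ethernet frame.

```python
OUTER_IPV4 = 20      # outer IPv4 header, no IP options
OUTER_UDP = 8        # outer UDP header
GENEVE_BASE = 8      # fixed Geneve header
GENEVE_OPTIONS = 8   # one OVN option: 4-byte option header + 4 bytes of data
INNER_ETHERNET = 14  # inner Ethernet header, no VLAN tag


def geneve_overhead(options_len: int = GENEVE_OPTIONS) -> int:
    """Bytes added around the inner IP packet by the encapsulation."""
    return OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + options_len + INNER_ETHERNET


def inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay packet of the given MTU."""
    return underlay_mtu - geneve_overhead()


print(geneve_overhead())         # 58
print(inner_mtu(1400))           # 1342, the MTU in the ICMP seen on genev_sys_6081
print(1399 + geneve_overhead())  # 1457, the outer IP length of the first oversized packet
```

The same overhead would apply to any other underlay path MTU under these assumptions.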
      

      To recap:
      Issue 1: Why is the ICMP needs frag received in response to the geneve underlay packets not causing an MTU route exception?
      Issue 2: Why is that unexpected ICMP being generated on the loopback interface?

      root@ovn-worker2:/# uname -a
      Linux ovn-worker2 6.0.7-301.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 4 18:35:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
      

        1. icmp2.pcap
          22 kB
        2. vxlan.pcap
          2 kB
        3. vxlan2.pcap
          2 kB

            aconole@redhat.com Aaron Conole
            jcaamano@redhat.com Jaime Caamaño Ruiz