OCPBUGS-7609

Large packets towards nodeport may trigger geneve packet drops between nodes


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Critical
    • 4.10
    • Quality / Stability / Reliability
    • False
    • Important
    • No
    • Proposed

      When node A sends a packet larger than the cluster-wide MTU to a nodeport service on another node B, node A receives an ICMP "fragmentation needed" (needs frag) message. Node A then caches a route to node B in its host with the lower MTU value.

       

      If an OVN-networked pod on node A then sends a packet to a pod on node B that is larger than the cluster MTU minus the Geneve overhead (the pod's MTU is set to the cluster MTU), the packet is dropped in the host of node A.
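
      The interaction between these two events can be modeled in a few lines of Python. This is a minimal sketch, not kernel code: the `PmtuCache` class, its method names, the "node-b" key, and the numeric values (a 1500-byte underlay MTU, a 1400-byte cluster MTU, and a 58-byte encapsulation overhead) are illustrative assumptions.

```python
class PmtuCache:
    """Toy model of a host's per-destination path-MTU cache (hypothetical)."""

    def __init__(self, link_mtu):
        self.link_mtu = link_mtu   # MTU of the outgoing interface
        self.exceptions = {}       # destination -> learned path MTU

    def on_frag_needed(self, dst, mtu):
        # An ICMP "fragmentation needed" for dst lowers the cached MTU for dst.
        current = self.exceptions.get(dst, self.link_mtu)
        self.exceptions[dst] = min(mtu, current)

    def can_send(self, dst, ip_len):
        # A DF packet larger than the effective MTU for dst cannot be sent.
        return ip_len <= self.exceptions.get(dst, self.link_mtu)


GENEVE_OVERHEAD = 58   # assumed bytes added by encapsulation
CLUSTER_MTU = 1400     # assumed cluster-wide MTU, also used as the pod MTU

host_a = PmtuCache(link_mtu=1500)            # assumed underlay MTU
encapsulated = CLUSTER_MTU + GENEVE_OVERHEAD  # full-size pod packet after encapsulation

before = host_a.can_send("node-b", encapsulated)  # no cached entry yet
host_a.on_frag_needed("node-b", CLUSTER_MTU)      # caused by the nodeport traffic
after = host_a.can_send("node-b", encapsulated)   # cached entry now applies
```

      In this model `before` is true and `after` is false: the same encapsulated packet that fit the underlay MTU no longer fits once the host has cached the lower MTU for node B.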

      Consider the diagram:

       

      1. ovn-worker sends a large packet to a nodeport service on ovn-worker2, which triggers an ICMP needs frag:
      15:58:30.595779 breth0 Out ifindex 6 02:42:ac:12:00:04 ethertype IPv4 (0x0800), length 1494: (tos 0x0, ttl 64, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
          172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
      15:58:30.595942 eth0  Out ifindex 16 02:42:ac:12:00:04 ethertype IPv4 (0x0800), length 1494: (tos 0x0, ttl 64, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
          172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
      15:58:30.597122 eth0  In  ifindex 16 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 582: (tos 0x0, ttl 254, id 0, offset 0, flags [DF], proto ICMP (1), length 562)
          172.18.0.3 > 172.18.0.4: ICMP 172.18.0.3 unreachable - need to frag (mtu 1400), length 542
      	(tos 0x0, ttl 63, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
          172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446
      15:58:30.597281 breth0 In  ifindex 6 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 582: (tos 0x0, ttl 254, id 0, offset 0, flags [DF], proto ICMP (1), length 562)
          172.18.0.3 > 172.18.0.4: ICMP 172.18.0.3 unreachable - need to frag (mtu 1400), length 542
      	(tos 0x0, ttl 63, id 12969, offset 0, flags [DF], proto UDP (17), length 1474)
          172.18.0.4.47306 > 172.18.0.3.31381: UDP, length 1446 

      2. Now ovn-worker has this cached route:

      root@ovn-worker:/# ip route get 172.18.0.3
      172.18.0.3 dev breth0 src 172.18.0.4 uid 0 
          cache expires 536sec mtu 1400 

      3. Now pod2 attempts to send a large packet to trozet3:

      16:02:05.549070 bdeed608d395fd0 P   ifindex 7 0a:58:0a:f4:02:03 ethertype IPv4 (0x0800), length 1419: (tos 0x0, ttl 64, id 45192, offset 0, flags [DF], proto UDP (17), length 1399)
          10.244.2.3.45688 > 10.244.1.3.1337: UDP, length 1371
      16:02:05.549409 genev_sys_6081 Out ifindex 4 0a:58:0a:f4:01:01 ethertype IPv4 (0x0800), length 1419: (tos 0x0, ttl 63, id 45192, offset 0, flags [DF], proto UDP (17), length 1399)
          10.244.2.3.45688 > 10.244.1.3.1337: UDP, length 1371
      16:02:05.549430 lo    In  ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 64, id 13963, offset 0, flags [none], proto ICMP (1), length 576)
          172.18.0.4 > 172.18.0.4: ICMP 172.18.0.3 unreachable - need to frag (mtu 1400), length 556
      	(tos 0x0, ttl 64, id 60242, offset 0, flags [DF], proto UDP (17), length 1457)
          172.18.0.4.30994 > 172.18.0.3.6081: Geneve, Flags [C], vni 0x1, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8]
      	0a:58:0a:f4:01:01 > 0a:58:0a:f4:01:03, ethertype IPv4 (0x0800), length 1413: (tos 0x0, ttl 63, id 45192, offset 0, flags [DF], proto UDP (17), length 1399)
          10.244.2.3.45688 > 10.244.1.3.1337: UDP, length 1371 

      The packet is dropped inside ovn-worker because the Geneve-encapsulated packet (1457 bytes) exceeds the cached route MTU of 1400. Interestingly, the ICMP needs frag is generated and sent to the loopback interface.
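
      The drop condition can be checked against the figures in the captures above. The sketch below is illustrative and not part of the original report: the helper name `cached_pmtu` and its regular expression are assumptions that match only the `ip route get` output format shown in step 2, and the breakdown of the overhead assumes IPv4 outer headers with a single 8-byte OVN Geneve option.

```python
import re


def cached_pmtu(route_output):
    """Return the cached MTU from `ip route get` output, or None if absent."""
    match = re.search(r"cache\s+expires\s+\d+sec\s+mtu\s+(\d+)", route_output)
    return int(match.group(1)) if match else None


# Output copied from step 2.
route_output = """172.18.0.3 dev breth0 src 172.18.0.4 uid 0
    cache expires 536sec mtu 1400
"""

# IP total lengths reported by tcpdump in step 3, in bytes.
INNER_IP_LEN = 1399   # pod2's UDP packet
OUTER_IP_LEN = 1457   # the same packet after Geneve encapsulation

# Assumed breakdown of the encapsulation overhead, in bytes.
OUTER_IPV4 = 20
OUTER_UDP = 8
GENEVE_BASE = 8
GENEVE_OVN_OPTION = 8
INNER_ETHERNET = 14
overhead = OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + GENEVE_OVN_OPTION + INNER_ETHERNET

route_mtu = cached_pmtu(route_output)
dropped = INNER_IP_LEN + overhead > route_mtu   # encapsulated size vs cached MTU
largest_pod_packet = route_mtu - overhead       # biggest pod packet that still fits
```

      Under these assumptions the overhead sums to 58 bytes, which matches the difference between the 1457-byte outer packet and the 1399-byte inner packet in the capture, so any pod packet above 1342 bytes exceeds the cached MTU of 1400.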

              Tim Rozet (trozet@redhat.com)
              Anurag Saxena