Uploaded image for project: 'Fast Datapath Product'
  1. Fast Datapath Product
  2. FDP-439

Incorrect DNAT sequence from OVN-K gateway router nodeport load balancer

XMLWordPrintable

    • False
    • Hide

      None

      Show
      None
    • False

      While working on a fragmentation test case I ran into an apparent OVN issue.

      Disclaimer: different excerpts have different details (IPs, ports) because they correspond to different test runs but can be extrapolated to the same issue

      Reaching out to a node port from outside the cluster ended up with an incorrect NAT sequence:

      udp      17 27 src=172.18.0.5 dst=172.18.0.4 sport=44809 dport=31791 [UNREPLIED] src=10.244.1.11 dst=172.18.0.5 sport=90 dport=44809 mark=10 secctx=system_u:object_r:unlabeled_t:s0 zone=2 use=1
      udp      17 27 src=172.18.0.5 dst=10.244.1.11 sport=44809 dport=90 src=10.244.1.11 dst=172.18.0.5 sport=90 dport=29698 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=2 use=1
      udp      17 27 src=172.18.0.5 dst=10.244.1.11 sport=29698 dport=90 src=10.244.1.11 dst=100.64.0.3 sport=90 dport=29698 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
      udp      17 27 src=100.64.0.3 dst=10.244.1.11 sport=29698 dport=90 src=10.244.1.11 dst=100.64.0.3 sport=90 dport=29698 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=19 use=1
      udp      17 27 src=10.244.1.11 dst=172.18.0.5 sport=90 dport=44809 [UNREPLIED] src=172.18.0.5 dst=10.244.1.11 sport=44809 dport=90 mark=1 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=1
      

      The request is committed twice to conntrack zone 2 (the GR zone) and as a result the source port is replaced without any existing conflict on that source port. The reply however, does not go through conntrack zone 2 twice as the request so it ends up not being unDNATed and the packet egresses with an incorrect source IP.

      This just happens the first time the test case is executed, successive times succeed, and is only after a while that the problem happens again.

      Observing ovn-controller logs, there is a packet in for the request, which triggers an ARP sequence to learn a MAC address, before the original request is re-injected with a packet out:

      2024-03-01T13:11:59.022Z|01342|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: received: NXT_PACKET_IN2 (OF1.5) (xid=0x0): table_id=29 cookie=0x6c50f9f6 total_len=1268 ct_state=new|trk|snat|dnat,ct_zone=2,ct_mark=0xa,ct_nw_src=172.18.0.5,ct_nw_dst=172.18.0.2,ct_nw_proto=17,ct_tp_src=48588,ct_tp_dst=30189,ip,reg0=0x64400001,reg1=0x64400003,reg9=0x4,reg10=0x9,reg11=0x2,reg13=0x3,reg14=0x1,reg15=0x2,metadata=0x4,in_port=4 (via action) data_len=1268 (unbuffered)
       userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.00.1c.00.18.00.20.00.40.00.00.00.00.00.01.de.10.80.00.2c.04.00.00.00.00.00.1c.00.18.00.20.00.60.00.00.00.00.00.01.de.10.80.00.2e.04.00.00.00.00.00.19.00.10.80.00.2a.02.00.01.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.25.00.00.00
      udp,vlan_tci=0x0000,dl_src=0a:58:64:40:00:03,dl_dst=00:00:00:00:00:00,nw_src=172.18.0.5,nw_dst=10.244.1.9,nw_tos=0,nw_ecn=0,nw_ttl=63,nw_frag=no,tp_src=48588,tp_dst=90 udp_csum:be5f
      2024-03-01T13:11:59.023Z|01343|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_PACKET_OUT (OF1.5) (xid=0x2414): in_port=CONTROLLER actions=set_field:0x64400001->reg0,set_field:0x64400003->reg1,set_field:0x4->reg9,set_field:0x9->reg10,set_field:0x2->reg11,set_field:0x3->reg13,set_field:0x1->reg14,set_field:0x2->reg15,set_field:0x4->metadata,set_field:ff:ff:ff:ff:ff:ff->eth_dst,move:NXM_NX_XXREG0[64..95]->NXM_OF_ARP_SPA[],move:NXM_NX_XXREG0[96..127]->NXM_OF_ARP_TPA[],set_field:1->arp_op,resubmit(,37) data_len=42
      arp,vlan_tci=0x0000,dl_src=0a:58:64:40:00:03,dl_dst=00:00:00:00:00:00,arp_spa=172.18.0.5,arp_tpa=10.244.1.9,arp_op=1,arp_sha=0a:58:64:40:00:03,arp_tha=00:00:00:00:00:00
      2024-03-01T13:11:59.023Z|01344|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: received: NXT_PACKET_IN2 (OF1.5) (xid=0x0): table_id=10 cookie=0x7622d0f9 total_len=42 reg0=0x64400001,reg1=0x64400003,reg9=0x8,reg11=0x2,reg13=0x7,reg14=0x2,metadata=0x4,in_port=ANY (via action) data_len=42 (unbuffered)
       userdata=00.00.00.01.00.00.00.00
      arp,vlan_tci=0x0000,dl_src=0a:58:64:40:00:01,dl_dst=0a:58:64:40:00:03,arp_spa=100.64.0.1,arp_tpa=100.64.0.3,arp_op=2,arp_sha=0a:58:64:40:00:01,arp_tha=0a:58:64:40:00:03
      2024-03-01T13:11:59.024Z|22136|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_BUNDLE_CONTROL (OF1.5) (xid=0x2415):
       bundle_id=0x5b2 type=OPEN_REQUEST flags=atomic ordered
      2024-03-01T13:11:59.024Z|01345|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_PACKET_OUT (OF1.5) (xid=0x2418): in_port=CONTROLLER actions=set_field:0x64400001->reg0,set_field:0x64400003->reg1,set_field:0x4->reg9,set_field:0x9->reg10,set_field:0x2->reg11,set_field:0x3->reg13,set_field:0x1->reg14,set_field:0x2->reg15,set_field:0x4->metadata,set_field:0->pkt_mark,resubmit(CONTROLLER,37) data_len=1268
      udp,vlan_tci=0x0000,dl_src=0a:58:64:40:00:03,dl_dst=0a:58:64:40:00:01,nw_src=172.18.0.5,nw_dst=10.244.1.9,nw_tos=0,nw_ecn=0,nw_ttl=63,nw_frag=no,tp_src=48588,tp_dst=90 udp_csum:be5f
      

      This packet in is only observed when it fails, but not when it doesn't. So there might be some kind of problem with how that request is being re-injected in the pipeline.

      Additionally, this might be easier to observe after the recent introduction of mac aging threshold since those ARPs might be triggered with more frequency.

      However, all is speculation right now. When trying to trace the pipeline, we ran into another potential OVS trace issue that is preventing us to properly troubleshoot this.

      An initial trace shows the packet in

      ❯ kubectl exec -ti -n ovn-kubernetes ovnkube-node-9bn6g -c ovnkube-controller -- bash -c "ovs-appctl ofproto/trace breth0 in_port=eth0,udp,dl_src=02:42:ac:12:00:05,dl_dst=02:42:ac:12:00:02,nw_src=172.18.0.5,nw_dst=172.18.0.2,nw_ttl=64,udp_src=33455,udp_dst=30501,dp_hash=1 --ct-next trk,new --ct-next trk,new --ct-next trk,new,dnat"
      ...
      Megaflow: recirc_id=0x54,ct_state=+new-est-rel-rpl-inv+trk+dnat,ct_mark=0/0x1,eth,udp,in_port=4,dl_src=02:42:ac:12:00:05,dl_dst=02:42:ac:12:00:02,nw_dst=10.244.1.0/24,nw_ttl=64,nw_frag=no
      Datapath actions: set(eth(src=0a:58:64:40:00:03,dst=00:00:00:00:00:00)),set(ipv4(ttl=63)),check_pkt_len(size=1414,gt(sample(sample=100.0%,actions(meter(3),userspace(pid=4294967295,controller(reason=1,dont_send=1,continuation=0,recirc_id=85,rule_cookie=0x5893bb55,controller_id=0,max_len=65535))))),le(sample(sample=100.0%,actions(meter(1),userspace(pid=4294967295,controller(reason=1,dont_send=1,continuation=0,recirc_id=86,rule_cookie=0x6c50f9f6,controller_id=0,max_len=65535))))))
      

      but then OVS fails on us when trying to trace the re-injection:

      ❯ kubectl exec -ti -n ovn-kubernetes ovnkube-node-9bn6g -c ovnkube-controller -- ovs-appctl ofproto/trace-packet-out br-int --ct-next trk,new,dnat "in_port=controller,udp,vlan_tci=0x0000,dl_src=0a:58:64:40:00:03,dl_dst=0a:58:64:40:00:01,nw_src=172.18.0.5,nw_dst=10.244.1.9,nw_tos=0,nw_ecn=0,nw_ttl=63,nw_frag=no,udp_src=48588,udp_dst=90" "set_field:0x64400001->reg0,set_field:0x64400003->reg1,set_field:0x4->reg9,set_field:0x9->reg10,set_field:0x2->reg11,set_field:0x3->reg13,set_field:0x1->reg14,set_field:0x2->reg15,set_field:0x4->metadata,set_field:0->pkt_mark,resubmit(controller,37)"
      Flow: udp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=0a:58:64:40:00:03,dl_dst=0a:58:64:40:00:01,nw_src=172.18.0.5,nw_dst=10.244.1.9,nw_tos=0,nw_ecn=0,nw_ttl=63,nw_frag=no,tp_src=48588,tp_dst=90
      ...
      thaw
           >>>> Recirculation conflict (actions)! <<<<
      

      Dumping the datapath flows is not helping either, no way to trace what actually happens after re-injection

      recirc_id(0),in_port(6),eth(src=0a:58:0a:f4:01:0e),eth_type(0x86dd),ipv6(frag=no),key32(00 00/00 00), packets:6, bytes:496, used:0.755s, actions:drop
      recirc_id(0),in_port(3),skb_mark(0),ct_state(-new-est-rpl-trk-dnat),eth(src=02:42:ac:12:00:05,dst=33:33:ff:12:00:05),eth_type(0x86dd),ipv6(src=::,dst=ff02::1:ff12:5,proto=58,hlimit=255,frag=no),key32(00 00/00 00),icmpv6(type=135,code=0),nd(target=fe80::42:acff:fe12:5,sll=00:00:00:00:00:00), packets:0, bytes:0, used:never, actions:4
      recirc_id(0xd8),dp_hash(0xf/0xf),in_port(3),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=2,mark=0xa/0xa,nat(dst=10.244.1.14:90)),recirc(0xd9)
      recirc_id(0xe4),in_port(6),ct_state(-new+est-rel+rpl-inv+trk+dnat),ct_mark(0/0x1),eth(src=02:42:ac:12:00:02,dst=02:42:ac:12:00:05),eth_type(0x0800),ipv4(src=10.244.1.0/255.255.255.0,dst=172.0.0.0/252.0.0.0,proto=17,frag=no),udp(src=0/0xc000), packets:0, bytes:0, used:never, actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),3
      recirc_id(0),in_port(4),skb_mark(0),eth(src=02:42:ac:12:00:02,dst=02:42:ac:12:00:05),eth_type(0x0806), packets:0, bytes:0, used:never, actions:3
      recirc_id(0),in_port(6),ct_state(-new-est-rel-rpl-inv-trk-dnat),ct_mark(0/0xf),eth(src=0a:58:0a:f4:01:0e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.244.1.14,tip=10.244.1.1,op=1/0xff,sha=0a:58:0a:f4:01:0e,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, actions:userspace(pid=4294967295,slow_path(action))
      recirc_id(0),in_port(3),skb_mark(0),ct_state(-new-est-rel-rpl-inv-trk-dnat),ct_mark(0/0xf),eth(src=02:42:ac:12:00:05,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::/ffc0::,dst=ff02::2,proto=58,hlimit=255,frag=no),key32(00 00/00 00),icmpv6(type=133,code=0), packets:0, bytes:0, used:never, actions:4
      recirc_id(0),in_port(3),skb_mark(0),ct_state(-new-est-rpl-trk-dnat),eth(src=02:42:ac:12:00:05,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=fe80::/ffc0::,dst=ff02::16,proto=58,hlimit=1,frag=no),key32(00 00/00 00),icmpv6(type=143), packets:1, bytes:110, used:1.331s, actions:4
      recirc_id(0x31),in_port(3),ct_state(-est-rel+trk),ct_mark(0),eth(dst=02:42:ac:12:00:02),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5, bytes:849, used:0.046s, flags:SP., actions:4
      recirc_id(0x31),in_port(3),ct_state(+est+trk),ct_mark(0x2),eth(),eth_type(0x0800),ipv4(frag=no), packets:74359, bytes:11775341, used:0.031s, flags:SFPR., actions:4
      recirc_id(0),in_port(3),skb_mark(0),eth(dst=02:42:ac:12:00:02),eth_type(0x0800),ipv4(proto=6,frag=no),tcp(dst=32768/0x8000), packets:1056, bytes:839062, used:0.523s, flags:SP., actions:ct(zone=64000,nat),recirc(0x31)
      recirc_id(0),in_port(3),skb_mark(0),ct_state(-new-est-rpl-trk-dnat),eth(src=02:42:ac:12:00:05,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=::/ffc0::,dst=ff02::16,proto=58,hlimit=1,frag=no),key32(00 00/00 00),icmpv6(type=143), packets:1, bytes:110, used:2.739s, actions:4
      recirc_id(0),in_port(3),skb_mark(0),ct_state(-new-est-rel-rpl-inv-trk-dnat),ct_mark(0/0x1),eth(src=02:42:ac:12:00:05,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.18.0.5,tip=172.18.0.2,op=1/0xff,sha=02:42:ac:12:00:05,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, actions:userspace(pid=4294967295,slow_path(action))
      recirc_id(0),in_port(4),skb_mark(0),eth(),eth_type(0x0800),ipv4(dst=172.0.0.0/252.0.0.0,proto=6,frag=no), packets:104957, bytes:544825288, used:0.031s, flags:SFPR., actions:ct(commit,zone=64000,mark=0x2/0xffffffff),3
      recirc_id(0xe3),in_port(6),ct_state(-new+est-rel+rpl-inv+trk+dnat),ct_mark(0/0xf),eth(src=0a:58:64:40:00:01,dst=0a:58:64:40:00:03),eth_type(0x0800),ipv4(dst=172.18.0.5,proto=17,ttl=63,frag=no), packets:0, bytes:0, used:never, actions:set(eth(src=02:42:ac:12:00:02,dst=02:42:ac:12:00:05)),set(ipv4(ttl=62)),ct(zone=2,nat),recirc(0xe4)
      recirc_id(0),in_port(6),eth(src=0a:58:0a:f4:01:0e,dst=0a:58:0a:f4:01:01),eth_type(0x0800),ipv4(src=10.244.1.14,dst=64.0.0.0/192.0.0.0,proto=17,frag=no),udp(src=0/0xfe00), packets:0, bytes:0, used:never, actions:ct(zone=19,nat),recirc(0xe1)
      recirc_id(0xe1),in_port(6),ct_state(-new+est-rel+rpl-inv+trk-dnat),ct_mark(0/0xf),eth(src=0a:58:0a:f4:01:0e,dst=0a:58:0a:f4:01:01),eth_type(0x0800),ipv4(src=10.244.1.8/255.255.255.248,dst=100.64.0.3,proto=17,ttl=64,frag=no),udp(src=0/0xfe00), packets:0, bytes:0, used:never, actions:ct_clear,set(eth(src=0a:58:64:40:00:01,dst=0a:58:64:40:00:03)),set(ipv4(ttl=63)),check_pkt_len(size=1414,gt(sample(sample=100.0%,actions(meter(4),userspace(pid=4294967295,controller(reason=1,dont_send=0,continuation=0,recirc_id=226,rule_cookie=0x7c33bd67,controller_id=0,max_len=65535))))),le(ct(nat),recirc(0xe3)))
      recirc_id(0xd6),in_port(3),eth(),eth_type(0x0800),ipv4(dst=172.18.0.2,frag=no), packets:0, bytes:0, used:never, actions:ct(zone=2,nat),recirc(0xd7)
      recirc_id(0),in_port(3),skb_mark(0),ct_state(-new-est-rpl-trk-dnat),eth(src=02:42:ac:12:00:05,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=172.18.0.5,tip=172.18.0.1,op=1/0xff,sha=02:42:ac:12:00:05), packets:0, bytes:0, used:never, actions:4
      recirc_id(0),in_port(3),ct_state(-new-est-rpl-trk-dnat),eth(src=02:42:ac:12:00:05,dst=02:42:ac:12:00:02),eth_type(0x0800),ipv4(src=172.18.0.5,dst=172.18.0.2,proto=17,ttl=64,frag=no),udp(dst=30261), packets:0, bytes:0, used:never, actions:ct(nat),recirc(0xd6)
      recirc_id(0),in_port(3),skb_mark(0),eth(dst=02:42:ac:12:00:02),eth_type(0x0800),ipv4(proto=6,frag=no),tcp(dst=8192/0xe000), packets:67, bytes:9683, used:0.031s, flags:SP., actions:ct(zone=64000,nat),recirc(0x31)
      recirc_id(0xd9),in_port(3),ct_state(+new-est-rel-rpl-inv+trk+dnat),ct_mark(0/0x1),eth(src=02:42:ac:12:00:05,dst=02:42:ac:12:00:02),eth_type(0x0800),ipv4(dst=10.244.1.0/255.255.255.0,proto=17,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:set(eth(src=0a:58:64:40:00:03,dst=0a:58:64:40:00:01)),set(ipv4(ttl=63)),check_pkt_len(size=1414,gt(sample(sample=100.0%,actions(meter(4),userspace(pid=4294967295,controller(reason=1,dont_send=0,continuation=0,recirc_id=229,rule_cookie=0x5893bb55,controller_id=0,max_len=65535))))),le(ct(zone=2,nat),recirc(0xe6)))
      recirc_id(0xd7),in_port(3),ct_state(+new-rel+trk),eth(),eth_type(0x0800),ipv4(dst=172.18.0.2,proto=17,frag=no),udp(dst=30261), packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0xd8)
      

      I have opened an issue to OVS because of the trace problem. You will find DBs attached there
      https://issues.redhat.com/browse/FDP-438

              amusil@redhat.com Ales Musil
              jcaamano@redhat.com Jaime Caamaño Ruiz
              Ying Xu Ying Xu
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

                Created:
                Updated:
                Resolved: