• Sub-task
    • Resolution: Done-Errata
    • openvswitch3.5
    • openvswitch3.5-3.5.2-50.el9fdp
    • rhel-9
    • rhel-net-ovs-dpdk
    • ssg_networking
    • OVS/DPDK - Sprint 9 - East, OVS/DPDK - Sprint 10 - East
    • 2

       Problem Description:

      This issue was identified in an OCP Virt deployment.

      A CP node deployed as a virtual machine had no connectivity with another local virtual machine.
      The hypervisor networking involved a simple bridging of tap interfaces.

      /--------------------\    /-------------------------\    /----------\                               
      | CP virtual machine |    |       Linux host        |    | Other VM |                               
      |                    |    |                         |    |          |                               
      |   br-ex -- dpdk0 --+----+-- tap6 -- br0 -- tap3 --+----+--  eth0  |                               
      |                    |    |                         |    |          |                               
      \--------------------/    \-------------------------/    \----------/                               
      

      In the CP virtual machine, OVS DPDK is installed and configured:

      • dpdk0 is a dpdk port using the virtio net PCI device, which needs to be bound to vfio-pci,
      • br-ex is a userspace type bridge,
      • userspace TSO is enabled.

      On the hypervisor Linux host, br0 is a standard kernel bridge. Neither DPDK nor OVS is involved on the host.

      In the other VM, eth0 is a simple virtio net PCI device bound to the kernel driver. No DPDK is involved in this virtual machine.

      Traffic is sent from br-ex in the CP VM to the eth0 interface of the other VM, on the 192.168.158.0/24 subnet.

      Packet captures taken on the hypervisor showed that the IP/TCP traffic sent by the CP node had wrong IP and TCP checksums.

      SYN packet (all good) sent by the local VM (tap3) to the CP node (tap6):

      14:05:40.527081 tap3  P   ifindex 14 52:54:00:aa:bb:13 ethertype IPv4 (0x0800), length 80: (tos 0x10, ttl 64, id 51169, offset 0, flags [DF], proto TCP (6), length 60)                                                                       
          192.168.158.28.56904 > 192.168.158.30.22: Flags [S], cksum 0xbdba (incorrect -> 0xb5fd), seq 866637909, win 64240, options [mss 1460,sackOK,TS val 3082041205 ecr 0,nop,wscale 7], length 0                                               
      14:05:40.527088 tap6  Out ifindex 21 52:54:00:aa:bb:13 ethertype IPv4 (0x0800), length 80: (tos 0x10, ttl 64, id 51169, offset 0, flags [DF], proto TCP (6), length 60)                                                                       
          192.168.158.28.56904 > 192.168.158.30.22: Flags [S], cksum 0xb5fd (correct), seq 866637909, win 64240, options [mss 1460,sackOK,TS val 3082041205 ecr 0,nop,wscale 7], length 0                                                           
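
      The 0xbdba value that tcpdump flags as incorrect on tap3 can be reproduced by hand: it equals the folded, uncomplemented sum of the TCP pseudo-header, which is what a sender commonly leaves in the checksum field when it expects a later stage to complete the checksum. The sketch below is only an illustration under that assumption, taking the TCP length as 40 bytes (IP total length 60 minus a 20-byte IP header). The helper names are hypothetical and are not part of OVS or the kernel.

```python
import ipaddress
import struct


def fold16(total: int) -> int:
    """Fold carries back into the low 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_pseudo_header_sum(src: str, dst: str, tcp_length: int) -> int:
    """Folded, uncomplemented sum of the IPv4 TCP pseudo-header."""
    pseudo = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!BBH", 0, 6, tcp_length)  # zero byte, protocol TCP, TCP length
    )
    words = struct.unpack("!6H", pseudo)  # 12 bytes -> six 16-bit words
    return fold16(sum(words))


# Values taken from the capture: IP total length 60, 20-byte IP header assumed.
partial = tcp_pseudo_header_sum("192.168.158.28", "192.168.158.30", 60 - 20)
print(hex(partial))  # 0xbdba
```

      Because the sum is symmetric in source and destination address, the same value is expected in both directions for segments of the same length.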
      

      SYN+ACK reply packet (IP checksum is 0!) sent by the CP node (tap6) to the local VM (tap3):

      14:05:40.528387 tap6  P   ifindex 21 52:54:00:aa:bb:0e ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->7d30)!)                                                     
          192.168.158.30.22 > 192.168.158.28.56904: Flags [S.], cksum 0xbdba (incorrect -> 0x40ab), seq 293020318, ack 866637910, win 65160, options [mss 1460,sackOK,TS val 3141108314 ecr 3082041205,nop,wscale 7], length 0                      
      14:05:40.528394 tap3  Out ifindex 14 52:54:00:aa:bb:0e ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->7d30)!)                                                     
          192.168.158.30.22 > 192.168.158.28.56904: Flags [S.], cksum 0xbdba (incorrect -> 0x40ab), seq 293020318, ack 866637910, win 65160, options [mss 1460,sackOK,TS val 3141108314 ecr 3082041205,nop,wscale 7], length 0        
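
      The expected IP header checksum reported by tcpdump (7d30) can be cross-checked from the header fields shown in the capture. The following is a minimal sketch of the standard Internet checksum (RFC 1071) applied to that header, assuming a 20-byte header with no IP options. The function name is illustrative only.

```python
import ipaddress
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Internet checksum over an IPv4 header whose checksum field is zero."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Header fields from the SYN+ACK capture (no IP options assumed).
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45,    # version 4, header length 5 words
    0x00,    # tos 0x0
    60,      # total length
    0,       # id 0
    0x4000,  # flags [DF], offset 0
    64,      # ttl
    6,       # proto TCP
    0,       # checksum field as seen on the wire
    ipaddress.IPv4Address("192.168.158.30").packed,
    ipaddress.IPv4Address("192.168.158.28").packed,
)
checksum = ipv4_header_checksum(header)
print(hex(checksum))  # 0x7d30
```

      The result matches the value tcpdump expected, so the header fields themselves are consistent and only the checksum field was left unset.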
      

      I suspect the problem is only noticed because the backend in the hypervisor is vhost-net rather than vhost-user.

      A workaround is to disable all kinds of checksum offload at the virtio level (in the VM XML).

       Software Versions:

      All versions of OVS are affected.

              rhn-support-dmarchan David Marchand
              Ting Li Ting Li