OpenShift Bugs / OCPBUGS-18901

Unable to pass traffic between pods with Hardware Offload

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Affects Version: 4.12.z
    • Component: Networking / SR-IOV
    • Severity: Important

      Description of problem:

      Initially we noticed an issue where all the VFs were shown zeroed out, but we understand this is expected. https://issues.redhat.com/browse/RHEL-3398 has been filed to check with Nvidia on why this is happening.
      
      Regardless of that, we are still unable to pass traffic between pods with hardware offload. We have also observed that the MTU is set improperly, we have a question about the VF link-state, and we have some questions about the documentation and instructions.
      
      YAML files used for config - additional configs attached. 
      
      
      sriov-pool-config.yaml :
       
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkPoolConfig
      metadata:
        name: offload-sriovnetworkpoolconfig
        namespace: openshift-sriov-network-operator
      spec:
        ovsHardwareOffloadConfig:
          name: mcp-offloading 
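
      After the pool config is applied, one way to confirm that OVS actually picked up hardware offload on the pool's nodes is to check the other_config:hw-offload flag (a sketch, assuming cluster access; the node name is illustrative):

```shell
# Example check (node name is illustrative): confirm OVS has hw-offload enabled
oc debug node/worker1 -- chroot /host \
  ovs-vsctl get Open_vSwitch . other_config:hw-offload
# expect "true" once the MachineConfigPool has finished rolling out
```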
      
      
      mcp-offloading.yaml :
       
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: mcp-offloading 
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp-offloading]} 
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/mcp-offloading: "" 
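
      The pool's nodeSelector above matches nodes carrying the mcp-offloading role label, which would be applied along these lines (a sketch; the node name is taken from the node policies attached below):

```shell
oc label node worker1.ndal-ocp3.ve.pd.f5net.com node-role.kubernetes.io/mcp-offloading=""
# watch the pool until the machine config rollout completes
oc get mcp mcp-offloading -w
```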
      
      Attached are the SriovNetworkNodePolicies and NetworkAttachmentDefinitions.
      
      
      I followed the directions found at [Configuring hardware offloading - Hardware networks | Networking | OpenShift Container Platform 4.12|https://docs.openshift.com/container-platform/4.12/networking/hardware_networks/configuring-hardware-offloading.html] to configure one of our NICs (Mellanox ConnectX-6 Dx) for hardware offloading. After the configuration is complete, I see that the VFs for the configured network all have zeroed-out MAC addresses, as seen below:
       
      10: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether b8:ce:f6:91:b2:38 brd ff:ff:ff:ff:ff:ff
          vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 8     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 10     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 11     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 12     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 13     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 14     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 15     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 16     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 17     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 18     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 19     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 20     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 21     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 22     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 23     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 24     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 25     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 26     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 27     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 28     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
          vf 29     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
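
      A quick way to flag these misconfigured VFs programmatically from ip link output is a small parsing script (an illustrative sketch, not part of the reproduction steps):

```python
import re

# Match lines like:
#   vf 0     link/ether 00:00:00:00:00:00 brd ..., link-state disable, ...
VF_RE = re.compile(r"vf (\d+)\s+link/ether ([0-9a-f:]{17}).*link-state (\w+)")

def suspicious_vfs(ip_link_output: str):
    """Return (vf_index, mac, link_state) for VFs with a zeroed MAC
    or a disabled link-state - the two symptoms described above."""
    bad = []
    for m in VF_RE.finditer(ip_link_output):
        idx, mac, state = int(m.group(1)), m.group(2), m.group(3)
        if mac == "00:00:00:00:00:00" or state == "disable":
            bad.append((idx, mac, state))
    return bad

sample = """\
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 1     link/ether b2:d3:6a:8c:92:86 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
"""

print(suspicious_vfs(sample))  # only vf 0 is flagged
```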
       
      The other items of note are that the MTU is expected to be 9000 based on the SriovNetworkNodePolicy for this interface, and that the link-state should be auto rather than disable. Before the hardware offloading configuration, the interface in question had populated MAC addresses and MTU 9000. For example:
       
      11: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether b8:ce:f6:91:b2:39 brd ff:ff:ff:ff:ff:ff
          vf 0     link/ether b2:d3:6a:8c:92:86 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 1     link/ether f6:b8:24:46:5d:d9 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 2     link/ether 4e:65:6d:b5:20:3d brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 3     link/ether ca:57:a0:0a:b6:d2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 4     link/ether 82:43:95:e9:31:ea brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 5     link/ether 3a:55:fa:97:94:d4 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 6     link/ether 9e:fa:a4:15:7a:ee brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 7     link/ether 4e:15:b7:14:55:f2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 8     link/ether ba:5f:6e:d1:f3:35 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
          vf 9     link/ether 36:af:99:b9:f8:f0 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
          vf 10     link/ether c2:45:c2:c9:91:02 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 11     link/ether b2:4e:7b:93:ac:17 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 12     link/ether 0e:a9:88:8a:d0:06 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 13     link/ether b2:0a:7a:f8:12:f7 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 14     link/ether 4e:ae:72:94:76:e5 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 15     link/ether 32:7a:18:28:bb:61 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 16     link/ether f6:0d:78:ba:db:25 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 17     link/ether ba:c1:97:e8:4c:fd brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 18     link/ether aa:b5:16:54:88:ea brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 19     link/ether fe:2e:ee:82:83:ec brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 20     link/ether 2a:35:c4:d6:11:83 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
          vf 21     link/ether 9a:c7:50:6a:3e:0d brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 22     link/ether 82:b9:1f:4e:c1:e3 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 23     link/ether 96:48:1d:12:bc:70 brd ff:ff:ff:ff:ff:ff, vlan 181, spoof checking off, link-state auto, trust on, query_rss off
          vf 24     link/ether 86:74:76:dd:fb:59 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 25     link/ether 52:03:44:f1:64:15 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 26     link/ether 02:8c:bd:9b:4f:af brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
          vf 27     link/ether 1a:09:ce:87:13:2c brd ff:ff:ff:ff:ff:ff, vlan 3216, spoof checking off, link-state auto, trust on, query_rss off
          vf 28     link/ether d6:e1:4f:33:bc:3b brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
          vf 29     link/ether 62:03:53:f6:2f:c0 brd ff:ff:ff:ff:ff:ff, vlan 3611, spoof checking off, link-state auto, trust on, query_rss off
       
      I have been able to deploy pods using the network interface, but the attached interface on the pod side is showing NO-CARRIER with a DOWN state, which is not expected as far as I know. See below:
       
      101: net1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
          link/ether e2:4e:9c:1a:ad:0e brd ff:ff:ff:ff:ff:ff
          inet 10.1.11.2/24 brd 10.1.11.255 scope global net1
             valid_lft forever preferred_lft forever
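
      As an experiment, the VF link-state could be forced back to auto from the worker host (a sketch only; the VF index is an example, and it is unclear whether this is appropriate under switchdev mode):

```shell
# From the worker host; vf 0 is an illustrative index
ip link set dev ens1f0 vf 0 state auto
ip link show dev ens1f0 | grep 'vf 0 '
```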
       
      The cluster is running OCP 4.12.30, and the driver and firmware information is below:
       
      root@greig-w1-ens1f0-mlx5-client:/# ethtool -i net1
      driver: mlx5_core
      version: 4.18.0-372.69.1.el8_6.x86_64
      firmware-version: 22.35.2000 (MT_0000000359)
      expansion-rom-version:
      bus-info: 0000:3b:00.7
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: no
      supports-priv-flags: yes
       
      NIC information:
       
      [core@worker1 ~]$ lspci -nn -v -s 0000:3b:00.0
      3b:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
              Subsystem: Mellanox Technologies Device [15b3:0016]
              Flags: bus master, fast devsel, latency 0, IRQ 173, NUMA node 0, IOMMU group 65
              Memory at ae000000 (64-bit, prefetchable) [size=32M]
              Expansion ROM at ab000000 [disabled] [size=1M]
              Capabilities: <access denied>
              Kernel driver in use: mlx5_core
              Kernel modules: mlx5_core
       
      I considered that it might be a firmware compatibility issue, so I tried to upgrade to 24.35.1012 (MT_0000000540) based on information found at [Network Adapter Fast Datapath Feature Support Matrix - Red Hat Customer Portal|https://access.redhat.com/articles/3538141#tested-firmware-2], but could not find that version on the Nvidia firmware download page.
       
      The documentation for adding the annotation to the pod gives instructions under the heading [Adding the network attachment definition to your pods|https://docs.openshift.com/container-platform/4.12/networking/hardware_networks/configuring-hardware-offloading.html#adding-network-attachment-definition-to-pods_configuring-hardware-offloading].
      
      
      The part I'm hung up on is that the text after the bullet point says to "add the .metadata.annotations.k8s.v1.cni.cncf.io/networks field", but the example code uses the "v1.multus-cni.io/default-network" annotation. Is that difference an error, or is there something I'm missing?
      
      When I have tried using the v1.multus-cni.io/default-network annotation, the pod creation is successful, but the net1 interface that I'm expecting is not created. If I use the k8s.v1.cni.cncf.io/networks annotation, the net1 interface is created in the pod, but I am unable to ping from one pod to another and net1 shows a NO-CARRIER state. I feel like there is something I'm missing, but I don't know what it is.
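      
      For reference, a minimal pod manifest using the k8s.v1.cni.cncf.io/networks annotation against the attached NetworkAttachmentDefinition would look like the following (pod name and image are placeholders; this assumes the SR-IOV network resources injector adds the device resource request automatically):
      
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: offload-test-pod              # placeholder name
  namespace: spk-test-automation
  annotations:
    k8s.v1.cni.cncf.io/networks: w1-ens1f0-policy1-net-attach-def
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi   # placeholder image
    command: ["sleep", "infinity"]
```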
       
      Also, when I use the network type "ovn-k8s-cni-overlay" given in the network attachment definition, the pod creation fails with the error below:
       
      Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_greig-w1-ens1f0-mlx5-u2004-client_spk-test-automation_6d32f0ab-4e2e-4a4c-b6a9-d6933faeb969_0(022e338885656755f36bdfeb880239e67c9eef53bb7b6d7a6a00ac0d8c946b1f): error adding pod spk-test-automation_greig-w1-ens1f0-mlx5-u2004-client to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [spk-test-automation/greig-w1-ens1f0-mlx5-u2004-client/6d32f0ab-4e2e-4a4c-b6a9-d6933faeb969:w1-ens1f0-mlx5-netdev-180]: error adding container to network "w1-ens1f0-mlx5-netdev-180": SRIOV-CNI failed to configure VF "failed to set vf 15 vlan configuration: operation not supported"
       
      It only seems to work if I use "sriov" as the network type, but then I run into the pod-to-pod ping issues previously mentioned.
      
      

      Version-Release number of selected component (if applicable):

      4.12.30
      
      

      How reproducible:

      Always
      
      

      Steps to Reproduce:

      1. Follow the steps at https://docs.openshift.com/container-platform/4.12/networking/hardware_networks/configuring-hardware-offloading.html
      
      

      w1-ens1f0-mlx5-netdev-policy2.yaml:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: w1-ens1f0-mlx5-netdev-policy2
        namespace: openshift-sriov-network-operator
      spec:
        resourceName: W1Ens1f0Mlx5NetdevPolicy2
        nodeSelector:
          kubernetes.io/hostname: "worker1.ndal-ocp3.ve.pd.f5net.com"
        numVfs: 30
        mtu: 9000
        nicSelector:
          vendor: "15b3"
          name: "ens1f0"
          deviceID: "101d"
          rootDevices: ["0000:3b:00.0"]
          pfNames: ["ens1f0#16-29"]
        deviceType: "netdevice"
        isRdma: true
        eSwitchMode: "switchdev"
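
      After applying a policy like this, the operator's per-node state can be checked to confirm the VF/switchdev configuration synced (a sketch; the node-state object is named after the node):

```shell
oc -n openshift-sriov-network-operator get sriovnetworknodestates \
  worker1.ndal-ocp3.ve.pd.f5net.com -o yaml | grep -A2 syncStatus
# syncStatus should report "Succeeded" once the policy is applied
```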
      
      

      w1-ens1f0-mlx5-netdev-policy1.yaml

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: w1-ens1f0-mlx5-netdev-policy1
        namespace: openshift-sriov-network-operator
      spec:
        resourceName: W1Ens1f0Mlx5NetdevPolicy1
        nodeSelector:
          kubernetes.io/hostname: "worker1.ndal-ocp3.ve.pd.f5net.com"
        numVfs: 30
        mtu: 9000
        nicSelector:
          vendor: "15b3"
          name: "ens1f0"
          deviceID: "101d"
          rootDevices: ["0000:3b:00.0"]
          pfNames: ["ens1f0#2-15"]
        deviceType: "netdevice"
        isRdma: true
        eSwitchMode: "switchdev"
      
      

      w1-ens1f0-mlx5-2-net-attach-def.yaml:

      apiVersion: "k8s.cni.cncf.io/v1"
      kind: NetworkAttachmentDefinition
      metadata:
        name: w1-ens1f0-policy2-net-attach-def
        namespace: spk-test-automation 
        annotations:
          k8s.v1.cni.cncf.io/resourceName: openshift.io/W1Ens1f0Mlx5NetdevPolicy2 
      spec:
        config: '{"cniVersion":"0.3.1",
                  "name":"w1-ens1f0-policy2-net-attach-def",
                  "type":"sriov",
                  "ipam":{"type": "host-local", "subnet": "10.1.12.0/24"},
                  "dns":{}
                }'
      
      
      w1-ens1f0-mlx5-1-net-attach-def.yaml:
      
      apiVersion: "k8s.cni.cncf.io/v1"
      kind: NetworkAttachmentDefinition
      metadata:
        name: w1-ens1f0-policy1-net-attach-def
        namespace: spk-test-automation 
        annotations:
          k8s.v1.cni.cncf.io/resourceName: openshift.io/W1Ens1f0Mlx5NetdevPolicy1
      spec:
        config: '{"cniVersion":"0.3.1",
                  "name":"w1-ens1f0-policy1-net-attach-def",
                  "type":"sriov",
                  "ipam":{"type": "host-local", "subnet": "10.1.11.0/24"},
                  "dns":{}
                }'
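
      One sanity check (a sketch, using the node and resource names from the policies above) is to confirm the SR-IOV device plugin is advertising the resource on the node:

```shell
oc get node worker1.ndal-ocp3.ve.pd.f5net.com \
  -o jsonpath='{.status.allocatable.openshift\.io/W1Ens1f0Mlx5NetdevPolicy1}'
# a non-zero count indicates the VFs were registered with the kubelet
```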
      
      

            bnemeth@redhat.com Balazs Nemeth
            kholtz_rhn Ken Holtz
            Weibin Liang Weibin Liang