OpenShift Bugs / OCPBUGS-30909

Failed to start testpmd traffic on Pods with BCM57508 DPDK vfio-pci interface


Details

    • Type: Bug
    • Resolution: Obsolete
    • Priority: Undefined
    • Affects Version/s: 4.15, 4.16
    • Component: Networking / SR-IOV
    • Sprint: CNF Network Sprint 251

    Description

      Description of problem:

      Create a DPDK SriovNetworkNodePolicy on a BCM57508 NIC, create a SriovNetwork and a test pod with the DPDK VF as a secondary interface, then start testpmd traffic. Sending traffic may fail.

      Version-Release number of selected component (if applicable):

         4.15

      How reproducible:

          Not always; the failure is intermittent.

      Steps to Reproduce:

          1. Create the DPDK SriovNetworkNodePolicy (a verification sketch follows the JSON):
      {
          "kind": "List",
          "apiVersion": "v1",
          "metadata": {},
          "items": [
              {
                  "apiVersion": "sriovnetwork.openshift.io/v1",
                  "kind": "SriovNetworkNodePolicy",
                  "metadata": {
                      "name": "bcm57508",
                      "namespace": "openshift-sriov-network-operator"
                  },
                  "spec": {
                      "deviceType": "vfio-pci",
                      "nicSelector": {
                          "deviceID": "1750",
                          "pfNames": [
                              "ens3f0np0"
                          ],
                          "vendor": "14e4"
                      },
                      "nodeSelector": {
                          "feature.node.kubernetes.io/sriov-capable": "true"
                      },
                      "numVfs": 2,
                      "resourceName": "bcm57508"
                  }
              }
          ]
      }     
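
      Once the policy is applied, the operator has to reconfigure the node, so it is worth confirming the VFs exist before continuing. A minimal verification sketch (the file name bcm57508-policy.json and the <worker-node> placeholder are assumptions for illustration):

      # Apply the policy (file name assumed) and wait for the operator to reconcile the node
      oc apply -f bcm57508-policy.json
      # The node state should eventually report syncStatus: Succeeded and list 2 VFs under ens3f0np0
      oc get sriovnetworknodestates -n openshift-sriov-network-operator -o yaml
      # The new resource should appear in the node's allocatable resources as openshift.io/bcm57508
      oc describe node <worker-node> | grep bcm57508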
      
          2. Create the SriovNetwork (a verification sketch follows the JSON):
      {
          "kind": "List",
          "apiVersion": "v1",
          "metadata": {},
          "items": [
              {
                  "apiVersion": "sriovnetwork.openshift.io/v1",
                  "kind": "SriovNetwork",
                  "metadata": {
                      "name": "bcm57508dpdknet",
                      "namespace": "openshift-sriov-network-operator"
                  },
                  "spec": {
                      "ipam": "{\n  \"type\": \"whereabouts\",\n  \"range\":\"10.30.0.0/30\"\n}\n",
                      "linkState": "disable",
                      "maxTxRate": 0,
                      "minTxRate": 0,
                      "networkNamespace": "e2e-69582-bcm57508",
                      "resourceName": "bcm57508",
                      "spoofChk": "off",
                      "trust": "on",
                      "vlan": 0,
                      "vlanQoS": 0
                  }
              }
          ]
      }
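
      After the SriovNetwork is created, the operator renders a NetworkAttachmentDefinition into the configured networkNamespace once that namespace exists. A quick check, as a sketch (the file name bcm57508-sriovnetwork.json is assumed):

      # Apply the SriovNetwork (file name assumed)
      oc apply -f bcm57508-sriovnetwork.json
      # A net-attach-def named bcm57508dpdknet should be rendered into the target namespace
      oc get network-attachment-definitions -n e2e-69582-bcm57508
      oc get net-attach-def bcm57508dpdknet -n e2e-69582-bcm57508 -o yaml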
      
          3. Create the namespace and the test pod (a startup/verification sketch follows the YAML):
      # more testpod-dpdk.yaml 
      apiVersion: v1
      kind: Pod
      metadata:
        name: testpod-dpdk
        annotations:
          k8s.v1.cni.cncf.io/networks: bcm57508dpdknet, bcm57508dpdknet
      spec:
        containers:
        - name: debug-network-pod
          image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.10.0-5
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsUser: 0
            capabilities:
              add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
          volumeMounts:
          - mountPath: /dev/hugepages
            name: hugepage
          resources:
            limits:
              memory: "1Gi"
              cpu: "2"
              hugepages-1Gi: "4Gi"
            requests:
              memory: "1Gi"
              cpu: "2"
              hugepages-1Gi: "4Gi"
          command: ["sleep", "infinity"]
        volumes:
        - name: hugepage
          emptyDir:
            medium: HugePages
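
      A sketch of bringing the pod up and confirming which VF PCI addresses were allocated to it; the SR-IOV device plugin exposes them through PCIDEVICE_* environment variables (the exact variable name depends on the resource name, so the grep pattern below is an assumption):

      oc create namespace e2e-69582-bcm57508
      oc apply -f testpod-dpdk.yaml -n e2e-69582-bcm57508
      oc wait pod/testpod-dpdk -n e2e-69582-bcm57508 --for=condition=Ready --timeout=120s
      # The allocated VF PCI addresses (used as the -w/-a arguments to testpmd below) appear in the pod environment
      oc exec -n e2e-69582-bcm57508 testpod-dpdk -- env | grep PCIDEVICE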
      
      
         4. Send testpmd traffic in the test pod; it reports 'Fail to start port 0' and 'Fail to start port 1' (see the note after the log):
      
      [root@openshift-qe-028 ~]# oc rsh testpod-dpdk 
      sh-4.4# testpmd -l 4,5,6 --in-memory -w 0000:5f:00.0 -w 0000:5f:00.1 --socket-mem 1024 -n 4 -- --nb-cores=2 --auto-start --stats-period 10 --rxd=1024 --txd=1024
      EAL: Detected 64 lcore(s)
      EAL: Detected 2 NUMA nodes
      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
      EAL: Selected IOVA mode 'VA'
      EAL: No available hugepages reported in hugepages-2048kB
      EAL: No free hugepages reported in hugepages-2048kB
      EAL: No free hugepages reported in hugepages-2048kB
      EAL: No available hugepages reported in hugepages-2048kB
      EAL: Probing VFIO support...
      EAL: VFIO support initialized
      EAL:   using IOMMU type 1 (Type 1)
      EAL: Probe PCI driver: net_bnxt (14e4:1806) device: 0000:5f:00.0 (socket 0)
      bnxt_hwrm_parent_pf_qcfg(): error 4:0:00000000:0199
      EAL: Probe PCI driver: net_bnxt (14e4:1806) device: 0000:5f:00.1 (socket 0)
      bnxt_hwrm_parent_pf_qcfg(): error 4:0:00000000:0199
      EAL: No legacy callbacks, legacy socket not created
      Auto-start selected
      testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0
      testpmd: preferred mempool ops selected: ring_mp_mc
      testpmd: create a new mbuf pool <mb_pool_1>: n=163456, size=2176, socket=1
      testpmd: preferred mempool ops selected: ring_mp_mc
      Configuring Port 0 (socket 0)
      bnxt_hwrm_send_message(): Error(timeout) sending msg 0x0027, seq_id 38
      bnxt_hwrm_port_phy_qcfg(): failed rc:-110
      bnxt_get_hwrm_link_config(): Get link config failed with rc -110
      bnxt_update_phy_setting(): Failed to get link settings
      Fail to start port 0
      Configuring Port 1 (socket 0)
      bnxt_hwrm_send_message(): Error(timeout) sending msg 0x0027, seq_id 38
      bnxt_hwrm_port_phy_qcfg(): failed rc:-110
      bnxt_get_hwrm_link_config(): Get link config failed with rc -110
      bnxt_update_phy_setting(): Failed to get link settings
      Fail to start port 1
      Please stop the ports first
      Done
      No commandline core given, start packet forwarding
      Not all ports were started   
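
      For reference, the EAL warnings above only note that -w/--pci-whitelist is deprecated; the failure itself comes from the bnxt_hwrm timeout errors. An equivalent invocation with the newer -a/--allow option (same cores, PCI addresses, and forwarding options as the failing run; shown only as a sketch) would be:

      testpmd -l 4,5,6 --in-memory -a 0000:5f:00.0 -a 0000:5f:00.1 --socket-mem 1024 -n 4 -- \
          --nb-cores=2 --auto-start --stats-period 10 --rxd=1024 --txd=1024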
      
      
      5. Meanwhile, with another card (for example an X710) and the same configuration, traffic can be sent successfully:
      
      sh-4.4# testpmd -l 4,5,6 --in-memory -w 0000:5e:01.0 -w 0000:5e:01.1 --socket-mem 1024 -n 4 -- --nb-cores=2 --auto-start --stats-period 10 --rxd=1024 --txd=1024
      EAL: Detected 64 lcore(s)
      EAL: Detected 2 NUMA nodes
      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
      EAL: Selected IOVA mode 'VA'
      EAL: No available hugepages reported in hugepages-2048kB
      EAL: No free hugepages reported in hugepages-2048kB
      EAL: No free hugepages reported in hugepages-2048kB
      EAL: No available hugepages reported in hugepages-2048kB
      EAL: Probing VFIO support...
      EAL: VFIO support initialized
      EAL:   using IOMMU type 1 (Type 1)
      EAL: Probe PCI driver: net_iavf (8086:1889) device: 0000:5e:01.0 (socket 0)
      EAL: Probe PCI driver: net_iavf (8086:1889) device: 0000:5e:01.1 (socket 0)
      EAL: No legacy callbacks, legacy socket not created
      Auto-start selected
      testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0
      testpmd: preferred mempool ops selected: ring_mp_mc
      testpmd: create a new mbuf pool <mb_pool_1>: n=163456, size=2176, socket=1
      testpmd: preferred mempool ops selected: ring_mp_mc
      Configuring Port 0 (socket 0)
      iavf_init_rss(): RSS is enabled by PF by default
      iavf_configure_queues(): request RXDID[22] in Queue[0]
      
      
      Port 0: link state change event
      
      
      Port 0: link state change event
      Port 0: FE:2C:A2:34:64:6D
      Configuring Port 1 (socket 0)
      iavf_init_rss(): RSS is enabled by PF by default
      iavf_configure_queues(): request RXDID[22] in Queue[0]
      
      
      Port 1: link state change event
      
      
      Port 1: link state change event
      Port 1: AE:17:A1:83:1B:21
      Checking link statuses...
      Done
      No commandline core given, start packet forwarding
      io packet forwarding - ports=2 - cores=2 - streams=2 - NUMA support enabled, MP allocation mode: native
      Logical Core 5 (socket 1) forwards packets on 1 streams:
        RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
      Logical Core 6 (socket 0) forwards packets on 1 streams:
        RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
      
      
        io packet forwarding packets/burst=32
        nb forwarding cores=2 - nb forwarding ports=2
        port 0: RX queue number: 1 Tx queue number: 1
          Rx offloads=0x0 Tx offloads=0x10000
          RX queue: 0
            RX desc=1024 - RX free threshold=32
            RX threshold registers: pthresh=0 hthresh=0  wthresh=0
            RX Offloads=0x0
          TX queue: 0
            TX desc=1024 - TX free threshold=32
            TX threshold registers: pthresh=0 hthresh=0  wthresh=0
            TX offloads=0x10000 - TX RS bit threshold=32
        port 1: RX queue number: 1 Tx queue number: 1
          Rx offloads=0x0 Tx offloads=0x10000
          RX queue: 0
            RX desc=1024 - RX free threshold=32
            RX threshold registers: pthresh=0 hthresh=0  wthresh=0
            RX Offloads=0x0
          TX queue: 0
            TX desc=1024 - TX free threshold=32
            TX threshold registers: pthresh=0 hthresh=0  wthresh=0
            TX offloads=0x10000 - TX RS bit threshold=32
      
      
      
      Port statistics ====================================
        ######################## NIC statistics for port 0  ########################
        RX-packets: 1215396    RX-missed: 0          RX-bytes:  403947212
        RX-errors: 0
        RX-nombuf:  0         
        TX-packets: 1229992    TX-errors: 0          TX-bytes:  413538192
      
      
        Throughput (since last show)
        Rx-pps:       121288          Rx-bps:    322489552
        Tx-pps:       122744          Tx-bps:    330146472
        ############################################################################
      
      
        ######################## NIC statistics for port 1  ########################
        RX-packets: 1231688    RX-missed: 0          RX-bytes:  409356916
        RX-errors: 0
        RX-nombuf:  0         
        TX-packets: 1217279    TX-errors: 0          TX-bytes:  409267416
      
      
        Throughput (since last show)
        Rx-pps:       122914          Rx-bps:    326808336
        Tx-pps:       121476          Tx-bps:    326736888
        ############################################################################
      ^C

      Actual results:

          Traffic fails to start on the DPDK VF ports ('Fail to start port 0'/'Fail to start port 1').

      Expected results:

          Traffic on the DPDK VF ports starts successfully.

      Additional info:

      # oc version
      Client Version: 4.15.0-0.nightly-2024-03-09-040926
      Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
      Server Version: 4.15.0-0.nightly-2024-03-09-040926
      Kubernetes Version: v1.28.7+6e2789b    
      
      [root@openshift-qe-028 sriov]# oc get csv -n openshift-sriov-network-operator 
      NAME                                          DISPLAY                   VERSION               REPLACES   PHASE
      metallb-operator.v4.15.0-202403081009         MetalLB Operator          4.15.0-202403081009              Succeeded
      sriov-network-operator.v4.15.0-202403050707   SR-IOV Network Operator   4.15.0-202403050707              Succeeded
      [root@openshift-qe-028 sriov]# 


          People

            sscheink@redhat.com Sebastian Scheinkman
            rhn-support-yingwang Ying Wang
            Zhanqi Zhao