RHEL / RHEL-59015

i40e VF ports get a different MAC address every time DPDK probes them

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • rhel-9.2.0
    • dpdk
    • No
    • Important
    • rhel-sst-network-fastdatapath
    • ssg_networking
    • 2
    • False

      As we are closing the FD product, this is a clone of the original report at https://issues.redhat.com/browse/FD-3273.

      """
      Hello team,

      Hope you are doing well!

      One of our customers is facing the issue below on their single-node OCP clusters:

      Whenever testpmd or another DPDK application is started in a pod, the MAC addresses of the attached VFs change. This does not happen for all VFs; the issue is intermittent, and it goes away when the SriovNetworkNodePolicy for the affected VFs is redeployed (which redeploys the VFs).

      Example (VF state before the pod is created):
      vf 0 link/ether 16:8f:9a:b2:a6:a4 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
      vf 1 link/ether 42:89:5a:b2:d0:23 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

      Create the pod using the above VFs, without starting any application:
      $ oc get po
      NAME      READY   STATUS    RESTARTS   AGE
      testpmd   1/1     Running   0          5s

      vf 0 link/ether 16:8f:9a:b2:a6:a4 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off
      vf 1 link/ether 42:89:5a:b2:d0:23 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off

      No MAC address change so far.

      Start DPDK testpmd:

      oc exec testpmd -- testpmd -l 4,40 -a "0000:3b:02.0" -a "0000:3b:02.1" -- --nb-cores=1 --forward-mode=mac --rxd=2048 --txd=2048 -i --eth-peer=0,"50:00:00:03" --eth-peer=1,"50:00:00:04"
      (output trimmed)
      Port 0: link state change event
      Port 0: link state change event
      iavf_set_rx_function(): request RXDID[1] in Queue[0] is legacy, set rx_pkt_burst as legacy for all queues
      Port 0: AA:24:55:24:24:0A
      Configuring Port 1 (socket 0)
      Port 1: link state change event
      Port 1: link state change event
      Port 1: 6A:E1:8D:26:0F:49
      Checking link statuses...
      testpmd> Done

      vf 0 link/ether aa:24:55:24:24:0a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off
      vf 1 link/ether 6a:e1:8d:26:0f:49 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off

      The MAC address change is visible above: both VFs now report the addresses testpmd printed for Port 0 and Port 1.

      Additional info:

      Working VFs, where no MAC address change is seen (for example, vf 3 and vf 4):

      cat dmesg|grep -i 0000:3b:02.3|grep -i mac

      [  192.326447] iavf 0000:3b:02.3: Invalid MAC address 00:00:00:00:00:00, using random
      [  192.332807] iavf 0000:3b:02.3: MAC address: 2a:1a:82:36:cb:f9

      cat dmesg|grep -i 0000:3b:02.4|grep -i mac

      [  192.330453] iavf 0000:3b:02.4: Invalid MAC address 00:00:00:00:00:00, using random
      [  192.332817] iavf 0000:3b:02.4: MAC address: 1e:e1:d8:ab:09:26

      Non-working VFs (vf 0 and vf 1): no log entries about the MAC address being set:

      cat dmesg|grep -i 0000:3b:02.0|grep -i mac

      cat dmesg|grep -i 0000:3b:02.1|grep -i mac

      Note: the customer uses the vfio-pci driver for the above VFs.

      $ oc logs sriov-network-config-daemon-282tq -n openshift-sriov-network-operator | grep -e 0000:3b:02.0 -e 0000:3b:02.1 -e 0000:3b:02.3 -e 0000:3b:02.4

      {Name: Mac: Assigned: Driver:vfio-pci PciAddress:0000:3b:02.0 Vendor:8086 DeviceID:154c Vlan:0 Mtu:0 VfID:0}
      {Name: Mac: Assigned: Driver:vfio-pci PciAddress:0000:3b:02.1 Vendor:8086 DeviceID:154c Vlan:0 Mtu:0 VfID:1}
      {Name: Mac: Assigned: Driver:vfio-pci PciAddress:0000:3b:02.3 Vendor:8086 DeviceID:154c Vlan:0 Mtu:0 VfID:3}
      {Name: Mac: Assigned: Driver:vfio-pci PciAddress:0000:3b:02.4 Vendor:8086 DeviceID:154c Vlan:0 Mtu:0 VfID:4}

      The VFs belong to PF ens1f0:

      cat ethtool_-i_ens1f0

      driver: i40e
      version: 5.14.0-284.45.1.rt14.330.el9_2.
      firmware-version: 9.20 0x8000d960 22.0.9

      Note: the same issue is seen on multiple SNO clusters.

      We also tested with the latest upstream DPDK (23.11), and the behavior is the same when using vf 0 and vf 1.

      Also, the customer does not want to set a fixed MAC address in the pod annotation as a workaround. For now, redeploying the affected VFs seems to help, but the issue is intermittent and takes multiple deployments/redeployments to occur.

      This seems somewhat related to the bug reported in https://issues.redhat.com/browse/FD-2815

      Regards,"""

      """
      Hi team,

      The customer reproduced the issue on one of their OCP clusters after upgrading to OCP 4.14 (which uses RHEL 9.2 packages).

      From the logs provided by the customer, here are our observations:

      testpmd pod creation time: Fri, 30 Aug 2024 14:17:20 UTC

      SR-IOV device plugin pod logs (from the OCP side) when the application pod (testpmd) is created:

      sriov device logs:
      2024-08-30T14:17:20.416187036Z I0830 14:17:20.416090 1 server.go:120] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:3b:02.0],},},}
      2024-08-30T14:17:20.416187036Z I0830 14:17:20.416163 1 netResourcePool.go:48] GetDeviceSpecs(): for devices: [0000:3b:02.0]
      2024-08-30T14:17:20.416274305Z I0830 14:17:20.416256 1 pool_stub.go:104] GetEnvs(): for devices: [0000:3b:02.0]
      2024-08-30T14:17:20.416300894Z I0830 14:17:20.416285 1 pool_stub.go:137] GetMounts(): for devices: [0000:3b:02.0]
      2024-08-30T14:17:20.416324288Z I0830 14:17:20.416292 1 server.go:136] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_A_ONERAN_01: 0000:3b:02.0,PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_A_ONERAN_01_INFO: {"0000:3b:02.0":{"generic":{"deviceID":"0000:3b:02.0"},"vfio":{"dev-mount":"/dev/vfio/156","mount":"/dev/vfio/vfio"}}},},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:rw,},&DeviceSpec{ContainerPath:/dev/vfio/156,HostPath:/dev/vfio/156,Permissions:rw,},},Annotations:map[string]string{},CDIDevices:[]*CDIDevice{},},},}

      2024-08-30T14:17:20.416823150Z I0830 14:17:20.416765 1 server.go:120] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:3b:02.1],},},}
      2024-08-30T14:17:20.416823150Z I0830 14:17:20.416807 1 netResourcePool.go:48] GetDeviceSpecs(): for devices: [0000:3b:02.1]
      2024-08-30T14:17:20.416925425Z I0830 14:17:20.416912 1 pool_stub.go:104] GetEnvs(): for devices: [0000:3b:02.1]
      2024-08-30T14:17:20.416959243Z I0830 14:17:20.416939 1 pool_stub.go:137] GetMounts(): for devices: [0000:3b:02.1]
      2024-08-30T14:17:20.416976735Z I0830 14:17:20.416945 1 server.go:136] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_B_ONERAN_01: 0000:3b:02.1,PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_B_ONERAN_01_INFO: {"0000:3b:02.1":{"generic":{"deviceID":"0000:3b:02.1"},"vfio":{"dev-mount":"/dev/vfio/157","mount":"/dev/vfio/vfio"}}},},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:rw,},&DeviceSpec{ContainerPath:/dev/vfio/157,HostPath:/dev/vfio/157,Permissions:rw,},},Annotations:map[string]string{},CDIDevices:[]*CDIDevice{},},},}

      The DPDK application (testpmd) is started inside the pod shortly after pod creation, i.e. at around Fri, 30 Aug 2024 14:19:11 UTC.

      The MAC addresses of both VFs change after testpmd is run:

      Before:
      5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
      link/ether 3c:fd:fe:db:66:10 brd ff:ff:ff:ff:ff:ff
      vf 0 link/ether 1e:ed:f0:f2:3a:4c brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on
      vf 1 link/ether fe:f1:ad:51:5d:94 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on

      After:
      5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
      link/ether 3c:fd:fe:db:66:10 brd ff:ff:ff:ff:ff:ff
      vf 0 link/ether 02:77:5f:9c:ab:ba brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on
      vf 1 link/ether fe:02:2a:b0:28:d2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on

      From the logs, we observed the following:

      • The issue might have started after the node was rebooted: "reboot system boot 5.14.0-284.66.1. Tue Aug 27 08:48 still running".
      • As before, vf 0 and vf 1 (PF ens1f0) are the affected VFs; their MAC addresses change every time the DPDK/testpmd application starts.
      • The SR-IOV config daemon pod logs (from the OCP side) are the same for vf 0, vf 1, and other VFs such as vf 4.
      • dmesg shows the difference:

      For vf 0:

      $ cat 0120-dmesg.1208.log |grep -i 0000:3b:02.0

      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.0: [8086:154c] type 00 class 0x020000
      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.0: enabling Extended Tags
      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.0: Adding to iommu group 156
      [Tue Aug 27 08:53:34 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002) --> this is logged each time the DPDK app starts.
      [Tue Aug 27 08:53:48 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Tue Aug 27 12:01:28 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Tue Aug 27 12:02:08 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 06:07:43 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 06:08:20 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 07:06:26 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 07:07:04 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 07:21:57 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 07:22:39 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Wed Aug 28 11:46:29 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      (output trimmed)
      [Thu Aug 29 06:58:01 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
      [Fri Aug 30 14:19:11 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002) --> testpmd started at the time mentioned above.

      For vf 1:

      $ cat 0120-dmesg.1208.log |grep -i 0000:3b:02.1

      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.1: enabling Extended Tags
      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.1: Adding to iommu group 157
      [Wed Aug 28 14:25:10 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:25:47 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:35:40 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:36:16 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:53:20 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:53:55 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:58:55 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 14:59:31 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 15:04:58 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Wed Aug 28 15:05:34 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Thu Aug 29 05:38:21 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Thu Aug 29 05:38:59 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Thu Aug 29 06:57:25 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Thu Aug 29 06:58:02 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
      [Fri Aug 30 14:19:11 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002) --> testpmd started at the time mentioned above.

      For a non-affected VF, for example vf 4:

      $ cat 0120-dmesg.1208.log |grep -i 0000:3b:02.4

      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.4: [8086:154c] type 00 class 0x020000
      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.4: enabling Extended Tags
      [Tue Aug 27 08:50:53 2024] pci 0000:3b:02.4: Adding to iommu group 160
      [Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: enabling device (0000 -> 0002)
      [Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: Invalid MAC address 00:00:00:00:00:00, using random
      [Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: Multiqueue Enabled: Queue pair count = 4
      [Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: MAC address: 12:6c:9b:75:01:2e --> iavf sets the MAC address during VF initialization.
      [Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: GRO is enabled
      [Tue Aug 27 08:50:58 2024] iavf 0000:3b:02.4 ens1f0v4: renamed from eth2
      [Tue Aug 27 08:51:01 2024] iavf 0000:3b:02.4: Reset indication received from the PF
      [Tue Aug 27 08:51:01 2024] iavf 0000:3b:02.4: Scheduling reset task
      [Tue Aug 27 08:51:02 2024] iavf 0000:3b:02.4: Removing device

      Here there are no "vfio-pci ... enabling device" log entries, because no DPDK application has used this VF yet. However, we can see that iavf set the MAC address during VF initialization, which did not happen for vf 0 and vf 1.

      I could not reproduce the issue in my lab either, because iavf sets the MAC address during VF initialization (after node reboot):

      sh-5.1# dmesg -T|grep -i 0000:3b:0b.3

      [Sun Sep 1 11:10:53 2024] pci 0000:3b:0b.3: [8086:154c] type 00 class 0x020000
      [Sun Sep 1 11:10:53 2024] pci 0000:3b:0b.3: enabling Extended Tags
      [Sun Sep 1 11:10:53 2024] pci 0000:3b:0b.3: Adding to iommu group 143
      [Sun Sep 1 11:10:53 2024] iavf 0000:3b:0b.3: enabling device (0000 -> 0002)
      [Sun Sep 1 11:10:53 2024] iavf 0000:3b:0b.3: Device is still in reset (-16), retrying
      [Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: Invalid MAC address 00:00:00:00:00:00, using random
      [Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: Multiqueue Enabled: Queue pair count = 4
      [Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: MAC address: 92:c5:d4:9a:e3:ee --> here the MAC address is set by iavf.
      [Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: GRO is enabled
      [Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3 ens1f1v11: renamed from eth14
      [Sun Sep 1 11:10:55 2024] iavf 0000:3b:0b.3 ens1f1v11: NIC Link is Up Speed is 10 Gbps Full Duplex
      [Sun Sep 1 11:10:56 2024] iavf 0000:3b:0b.3: Reset indication received from the PF
      [Sun Sep 1 11:10:56 2024] iavf 0000:3b:0b.3: Scheduling reset task
      [Sun Sep 1 11:10:56 2024] iavf 0000:3b:0b.3: Removing device
      [Mon Sep 2 06:53:53 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002) --> logged each time the testpmd/DPDK application starts.
      [Mon Sep 2 06:56:16 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
      [Mon Sep 2 06:57:45 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
      [Mon Sep 2 07:00:14 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
      [Mon Sep 2 07:00:54 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
      [Mon Sep 2 07:03:19 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)

      Logs: https://drive.google.com/drive/folders/1OMmbG2SNg39qS9zuayp90Nza0ICwv9Lt

      SriovNetworkNodePolicy object created on the OCP side, for example for vf 0:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        creationTimestamp: "2023-12-19T19:58:26Z"
        generation: 1
        name: pci-sriov-tn-a-oneran-01
        namespace: openshift-sriov-network-operator
        resourceVersion: "30161"
        uid: f8598ea0-c5d9-45fb-8d2b-cd7198c03a4d
      spec:
        deviceType: vfio-pci
        isRdma: false
        mtu: 9000
        nicSelector:
          deviceID: 158b
          pfNames:
          - ens1f0#0-0
          vendor: "8086"
        nodeSelector:
          node-role.kubernetes.io/master: ""
        numVfs: 32
        priority: 99
        resourceName: pci_sriov_tn_a_oneran_01

      Theory: if iavf does not set the MAC address after VF creation (when the SriovNetworkNodePolicy is deployed or when the node reboots), then later, when the testpmd pod uses the same VF, the MAC address is changed by vfio-pci every time the DPDK application starts. This is not observed when iavf has set the MAC address. Affected VFs: vf 0 and vf 1 (PF ens1f0).
      """

              rhn-support-dmarchan David Marchand
              rhn-support-dmarchan David Marchand
              ovsdpdk triage
              Jiying Qiu
              Votes: 0
              Watchers: 9