As we are closing the FD product, this is a clone of the original report at https://issues.redhat.com/browse/FD-3273.
"""
Hello team,
Hope you are doing well!
One of our customers is facing the issue below on their single-node OCP clusters:
Whenever testpmd or another DPDK application is started in a pod, the MAC addresses of the attached VFs change. This does not happen for all VFs; it is intermittent, and it goes away after redeploying the sriovnetworknodepolicy for the affected VFs (which redeploys the VFs).
Example:
vf 0 link/ether 16:8f:9a:b2:a6:a4 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 1 link/ether 42:89:5a:b2:d0:23 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
Create the pod using the above VFs without starting any application:
$ oc get po
NAME READY STATUS RESTARTS AGE
testpmd 1/1 Running 0 5s
vf 0 link/ether 16:8f:9a:b2:a6:a4 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off
vf 1 link/ether 42:89:5a:b2:d0:23 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off
No MAC address change so far.
start the dpdk test-pmd:
oc exec testpmd -- testpmd -l 4,40 -a "0000:3b:02.0" -a "0000:3b:02.1" -- --nb-cores=1 --forward-mode=mac --rxd=2048 --txd=2048 -i --eth-peer=0,"50:00:00:03" --eth-peer=1,"50:00:00:04"
cutting down
Port 0: link state change event
Port 0: link state change event
iavf_set_rx_function(): request RXDID[1] in Queue[0] is legacy, set rx_pkt_burst as legacy for all queues
Port 0: AA:24:55:24:24:0A
Configuring Port 1 (socket 0)
Port 1: link state change event
Port 1: link state change event
Port 1: 6A:E1:8D:26:0F:49
Checking link statuses...
testpmd> Done
vf 0 link/ether aa:24:55:24:24:0a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off
vf 1 link/ether 6a:e1:8d:26:0f:49 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off
The change in MAC address can be seen above.
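The before/after comparison above can be automated. The following is a minimal Python sketch, assuming the VF lines have the same shape as the `ip link` excerpts in this report; the function names `parse_vf_macs` and `changed_vf_macs` are hypothetical helpers introduced here, not part of any existing tool.

```python
import re
from typing import Dict, Tuple

# Matches VF lines such as:
#   "vf 0 link/ether 16:8f:9a:b2:a6:a4 brd ff:ff:ff:ff:ff:ff, spoof checking off, ..."
# Only the VF index and the MAC that follows "link/ether" are captured.
VF_LINE = re.compile(
    r"\bvf\s+(\d+)\s+link/ether\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def parse_vf_macs(ip_link_output: str) -> Dict[int, str]:
    """Return {vf_index: mac} for every VF line found in the text."""
    return {
        int(m.group(1)): m.group(2).lower()
        for m in VF_LINE.finditer(ip_link_output)
    }


def changed_vf_macs(before: str, after: str) -> Dict[int, Tuple[str, str]]:
    """Return {vf_index: (old_mac, new_mac)} for VFs whose MAC differs."""
    old, new = parse_vf_macs(before), parse_vf_macs(after)
    return {
        vf: (old[vf], new[vf])
        for vf in sorted(old.keys() & new.keys())
        if old[vf] != new[vf]
    }
```

Feeding it the PF's VF listing captured before and after starting testpmd yields only the VFs whose address changed, which makes it easier to spot affected VFs when a PF has many of them.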
Additional info:
Working VFs, where no MAC address change is seen (for example vf 3 and vf 4):
cat dmesg|grep -i 0000:3b:02.3|grep -i mac
[192.326447] iavf 0000:3b:02.3: Invalid MAC address 00:00:00:00:00:00, using random
[ 192.332807] iavf 0000:3b:02.3: MAC address: 2a:1a:82:36:cb:f9
cat dmesg|grep -i 0000:3b:02.4|grep -i mac
[192.330453] iavf 0000:3b:02.4: Invalid MAC address 00:00:00:00:00:00, using random
[ 192.332817] iavf 0000:3b:02.4: MAC address: 1e:e1:d8:ab:09:26
Non-working VFs (vf 0 and vf 1): no log entries about a MAC address being set:
cat dmesg|grep -i 0000:3b:02.0|grep -i mac
cat dmesg|grep -i 0000:3b:02.1|grep -i mac
Note:
The customer is using the vfio-pci driver for the above VFs.
$ oc logs sriov-network-config-daemon-282tq -n openshift-sriov-network-operator|grep -e 0000:3b:02.0 -e 0000:3b:02.1 -e 0000:3b:02.3 -e 0000:3b:02.4
The vf's belong to pf ens1f0:
cat ethtool_-i_ens1f0
driver: i40e
version: 5.14.0-284.45.1.rt14.330.el9_2.
firmware-version: 9.20 0x8000d960 22.0.9
Note: the same issue is seen on multiple SNO clusters.
We also tested with the latest upstream DPDK (23.11), and the behavior is the same when using the mentioned vf 0 and vf 1.
Also, the customer does not want to set a fixed MAC address in the pod annotation as a workaround. For now, redeploying the affected VFs seems to help, but the issue is intermittent and takes multiple deployments/redeployments to occur.
This seems somewhat related to the bug reported in https://issues.redhat.com/browse/FD-2815
Regards,"""
"""
Hi team,
The customer reproduced the issue on one of their OCP clusters after upgrading to OCP 4.14 (which uses RHEL 9.2 packages).
From the logs provided by the customer, these are the observations:
testpmd pod creation time: Fri, 30 Aug 2024 14:17:20 UTC
sriov device pod logs(from ocp side) when the application pod (testpmd) gets created:
sriov device logs:
2024-08-30T14:17:20.416187036Z I0830 14:17:20.416090 1 server.go:120] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest
,},}
2024-08-30T14:17:20.416187036Z I0830 14:17:20.416163 1 netResourcePool.go:48] GetDeviceSpecs(): for devices: [0000:3b:02.0]
2024-08-30T14:17:20.416274305Z I0830 14:17:20.416256 1 pool_stub.go:104] GetEnvs(): for devices: [0000:3b:02.0]
2024-08-30T14:17:20.416300894Z I0830 14:17:20.416285 1 pool_stub.go:137] GetMounts(): for devices: [0000:3b:02.0]
2024-08-30T14:17:20.416324288Z I0830 14:17:20.416292 1 server.go:136] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_A_ONERAN_01: 0000:3b:02.0,PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_A_ONERAN_01_INFO: {"0000:3b:02.0":{"generic":
,"vfio":
{"dev-mount":"/dev/vfio/156","mount":"/dev/vfio/vfio"}}},},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec
{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:rw,},&DeviceSpec
{ContainerPath:/dev/vfio/156,HostPath:/dev/vfio/156,Permissions:rw,},},Annotations:map[string]string{},CDIDevices:[]*CDIDevice{},},},}
2024-08-30T14:17:20.416823150Z I0830 14:17:20.416765 1 server.go:120] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest
{DevicesIDs:[0000:3b:02.1],},},}
2024-08-30T14:17:20.416823150Z I0830 14:17:20.416807 1 netResourcePool.go:48] GetDeviceSpecs(): for devices: [0000:3b:02.1]
2024-08-30T14:17:20.416925425Z I0830 14:17:20.416912 1 pool_stub.go:104] GetEnvs(): for devices: [0000:3b:02.1]
2024-08-30T14:17:20.416959243Z I0830 14:17:20.416939 1 pool_stub.go:137] GetMounts(): for devices: [0000:3b:02.1]
2024-08-30T14:17:20.416976735Z I0830 14:17:20.416945 1 server.go:136] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_B_ONERAN_01: 0000:3b:02.1,PCIDEVICE_OPENSHIFT_IO_PCI_SRIOV_TN_B_ONERAN_01_INFO: {"0000:3b:02.1":{"generic":
,"vfio":
{"dev-mount":"/dev/vfio/157","mount":"/dev/vfio/vfio"}}},},Mounts:[]*Mount{},Devices:[]*DeviceSpec{&DeviceSpec
{ContainerPath:/dev/vfio/vfio,HostPath:/dev/vfio/vfio,Permissions:rw,},&DeviceSpec
{ContainerPath:/dev/vfio/157,HostPath:/dev/vfio/157,Permissions:rw,},},Annotations:map[string]string{},CDIDevices:[]*CDIDevice{},},},}
The DPDK application (testpmd) is started inside the pod shortly after pod creation, at around Fri, 30 Aug 2024 14:19:11 UTC.
The MAC address changes for both VFs after testpmd is run:
Before:
5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:db:66:10 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 1e:ed:f0:f2:3a:4c brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on
vf 1 link/ether fe:f1:ad:51:5d:94 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on
After:
5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:db:66:10 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 02:77:5f:9c:ab:ba brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on
vf 1 link/ether fe:02:2a:b0:28:d2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on
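One property worth checking on the addresses above is whether they are locally administered, since randomly generated MAC addresses normally have that bit set while vendor-assigned ones (such as the PF address) do not. The sketch below is a small Python check of the two flag bits in the first octet; `mac_flags` is a hypothetical helper name, and a set bit is only consistent with random generation, not proof of it.

```python
from typing import Dict


def mac_flags(mac: str) -> Dict[str, bool]:
    """Decode the two flag bits carried in the first octet of a MAC address.

    Bit 0x01 marks a multicast (group) address.
    Bit 0x02 marks a locally administered address.
    """
    first_octet = int(mac.split(":")[0], 16)
    return {
        "locally_administered": bool(first_octet & 0x02),
        "multicast": bool(first_octet & 0x01),
    }
```

Applied to the listing above, the VF addresses before and after the testpmd run decode as locally administered unicast addresses, while the PF address 3c:fd:fe:db:66:10 decodes as a universally administered one.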
From the logs we observed the following:
The issue might have started after the node was rebooted: reboot system boot 5.14.0-284.66.1. Tue Aug 27 08:48 still running
As before, vf 0 and vf 1 (PF ens1f0) are the affected VFs, whose MAC address changes with every DPDK/testpmd application start.
The sriov config daemon pod logs (from the OCP side) are the same for vf 0, vf 1 and other VFs such as vf 4.
dmesg shows the difference:
for vf 0:
$ cat 0120-dmesg.1208.log |grep -i 0000:3b:02.0
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.0: [8086:154c] type 00 class 0x020000
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.0: enabling Extended Tags
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.0: Adding to iommu group 156
[Tue Aug 27 08:53:34 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002) --> this will be logged each time the dpdk app starts.
[Tue Aug 27 08:53:48 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Tue Aug 27 12:01:28 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Tue Aug 27 12:02:08 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 06:07:43 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 06:08:20 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 07:06:26 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 07:07:04 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 07:21:57 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 07:22:39 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Wed Aug 28 11:46:29 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
-cutting down-
[Thu Aug 29 06:58:01 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002)
[Fri Aug 30 14:19:11 2024] vfio-pci 0000:3b:02.0: enabling device (0000 -> 0002) --> when testpmd starts for the time mentioned
for vf 1:
$ cat 0120-dmesg.1208.log |grep -i 0000:3b:02.1
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.1: enabling Extended Tags
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.1: Adding to iommu group 157
[Wed Aug 28 14:25:10 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:25:47 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:35:40 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:36:16 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:53:20 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:53:55 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:58:55 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 14:59:31 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 15:04:58 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Wed Aug 28 15:05:34 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Thu Aug 29 05:38:21 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Thu Aug 29 05:38:59 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Thu Aug 29 06:57:25 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Thu Aug 29 06:58:02 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002)
[Fri Aug 30 14:19:11 2024] vfio-pci 0000:3b:02.1: enabling device (0000 -> 0002) --> when testpmd starts for the time mentioned.
for a non-affected VF, for example vf 4:
$ cat 0120-dmesg.1208.log |grep -i 0000:3b:02.4
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.4: [8086:154c] type 00 class 0x020000
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.4: enabling Extended Tags
[Tue Aug 27 08:50:53 2024] pci 0000:3b:02.4: Adding to iommu group 160
[Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: enabling device (0000 -> 0002)
[Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: Invalid MAC address 00:00:00:00:00:00, using random
[Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: Multiqueue Enabled: Queue pair count = 4
[Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: MAC address: 12:6c:9b:75:01:2e --> iavf sets the mac address during vf initialization.
[Tue Aug 27 08:50:56 2024] iavf 0000:3b:02.4: GRO is enabled
[Tue Aug 27 08:50:58 2024] iavf 0000:3b:02.4 ens1f0v4: renamed from eth2
[Tue Aug 27 08:51:01 2024] iavf 0000:3b:02.4: Reset indication received from the PF
[Tue Aug 27 08:51:01 2024] iavf 0000:3b:02.4: Scheduling reset task
[Tue Aug 27 08:51:02 2024] iavf 0000:3b:02.4: Removing device
Here there are no "vfio-pci ... enabling device" log entries because no DPDK application has been started on this VF yet. However, we can see that iavf set the MAC address during VF initialization, which did not happen for vf 0 and vf 1.
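The distinction drawn here (VFs for which iavf logged a "MAC address:" line versus VFs for which it did not) can be checked across all VFs at once. The following Python sketch assumes the dmesg line format shown in this report; `iavf_assigned_macs` and `vfs_without_iavf_mac` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List

# Matches "iavf 0000:3b:02.4: MAC address: 12:6c:9b:75:01:2e".
# The "Invalid MAC address 00:00:00:00:00:00, using random" line is not
# matched, because "MAC address:" must directly follow the device prefix.
IAVF_MAC = re.compile(
    r"iavf\s+([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]):\s+"
    r"MAC address:\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def iavf_assigned_macs(dmesg_text: str) -> Dict[str, str]:
    """Return {pci_address: mac} for every VF where iavf logged a MAC."""
    return {
        m.group(1).lower(): m.group(2).lower()
        for m in IAVF_MAC.finditer(dmesg_text)
    }


def vfs_without_iavf_mac(dmesg_text: str, bdfs: Iterable[str]) -> List[str]:
    """Return the PCI addresses from bdfs that have no iavf MAC log line."""
    assigned = iavf_assigned_macs(dmesg_text)
    return [bdf for bdf in bdfs if bdf.lower() not in assigned]
```

Run against the full dmesg with the list of VF PCI addresses for the PF, this would list the VFs that match the pattern seen on vf 0 and vf 1.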
In my lab I also could not reproduce the issue, since there iavf sets the MAC address during VF initialization (after node reboot):
sh-5.1# dmesg -T|grep -i 0000:3b:0b.3
[Sun Sep 1 11:10:53 2024] pci 0000:3b:0b.3: [8086:154c] type 00 class 0x020000
[Sun Sep 1 11:10:53 2024] pci 0000:3b:0b.3: enabling Extended Tags
[Sun Sep 1 11:10:53 2024] pci 0000:3b:0b.3: Adding to iommu group 143
[Sun Sep 1 11:10:53 2024] iavf 0000:3b:0b.3: enabling device (0000 -> 0002)
[Sun Sep 1 11:10:53 2024] iavf 0000:3b:0b.3: Device is still in reset (-16), retrying
[Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: Invalid MAC address 00:00:00:00:00:00, using random
[Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: Multiqueue Enabled: Queue pair count = 4
[Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: MAC address: 92:c5:d4:9a:e3:ee --> here the mac address is set by iavf.
[Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3: GRO is enabled
[Sun Sep 1 11:10:54 2024] iavf 0000:3b:0b.3 ens1f1v11: renamed from eth14
[Sun Sep 1 11:10:55 2024] iavf 0000:3b:0b.3 ens1f1v11: NIC Link is Up Speed is 10 Gbps Full Duplex
[Sun Sep 1 11:10:56 2024] iavf 0000:3b:0b.3: Reset indication received from the PF
[Sun Sep 1 11:10:56 2024] iavf 0000:3b:0b.3: Scheduling reset task
[Sun Sep 1 11:10:56 2024] iavf 0000:3b:0b.3: Removing device
[Mon Sep 2 06:53:53 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002) //each time testpmd/dpdk application start this is logged.
[Mon Sep 2 06:56:16 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
[Mon Sep 2 06:57:45 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
[Mon Sep 2 07:00:14 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
[Mon Sep 2 07:00:54 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
[Mon Sep 2 07:03:19 2024] vfio-pci 0000:3b:0b.3: enabling device (0000 -> 0002)
Logs: https://drive.google.com/drive/folders/1OMmbG2SNg39qS9zuayp90Nza0ICwv9Lt
Example sriovnetworknodepolicy object created on the OCP side, for vf 0:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: "2023-12-19T19:58:26Z"
  generation: 1
  name: pci-sriov-tn-a-oneran-01
  namespace: openshift-sriov-network-operator
  resourceVersion: "30161"
  uid: f8598ea0-c5d9-45fb-8d2b-cd7198c03a4d
spec:
  deviceType: vfio-pci
  isRdma: false
  mtu: 9000
  nicSelector:
    deviceID: 158b
    pfNames:
    - ens1f0#0-0
    vendor: "8086"
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numVfs: 32
  priority: 99
  resourceName: pci_sriov_tn_a_oneran_01
Theory: If iavf does not set the MAC address after VF creation (when the sriovnetworknodepolicy is deployed or when the node reboots), then later, when the testpmd pod uses that VF, the MAC address is changed by vfio-pci each time the DPDK application is started. This is not observed when iavf has set the MAC address. Affected VFs: vf 0 and vf 1 (PF ens1f0).
"""