Bug
Resolution: Cannot Reproduce
Normal
None
None
Description of problem: On reboot, ovs-vswitchd intermittently gets a permission denied error (error 13) when it tries to open the VFIO container.
~~~
2024-06-06T05:35:39.857Z|00013|dpdk|INFO|EAL ARGS: ovs-vswitchd -n 4 --socket-mem 4096,4096 --socket-limit 4096,4096 -l 0.
2024-06-06T05:35:39.861Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
2024-06-06T05:35:39.861Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
2024-06-06T05:35:39.861Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-06-06T05:35:39.862Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2024-06-06T05:35:39.884Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2024-06-06T05:35:39.884Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-06-06T05:35:39.884Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
2024-06-06T05:35:39.885Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:35:39.885Z|00022|dpdk|ERR|EAL: cannot open VFIO container, error 13 (Permission denied)
2024-06-06T05:35:39.885Z|00023|dpdk|INFO|EAL: VFIO support could not be initialized
2024-06-06T05:35:41.005Z|00024|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
2024-06-06T05:35:41.005Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
2024-06-06T05:35:41.005Z|00026|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
2024-06-06T05:35:41.005Z|00027|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
2024-06-06T05:35:41.005Z|00028|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T05:35:41.007Z|00029|dpdk|INFO|DPDK Enabled - initialized
~~~
This causes the DPDK interfaces to fail to attach to the OVS bridges:
~~~
    Bridge br-ex1
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port dpdkbond3
            Interface dpdk1
                type: dpdk
                options:
                error: "Error attaching device '0000:aa:00.1' to DPDK"
            Interface dpdk0
                type: dpdk
                options:
                error: "Error attaching device '0000:aa:00.0' to DPDK"
    Bridge br-ex2
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port br-ex2
            Interface br-ex2
                type: internal
        Port dpdkbond4
            Interface dpdk3
                type: dpdk
                options:
                error: "Error attaching device '0000:ff:00.1' to DPDK"
            Interface dpdk2
                type: dpdk
                options:
                error: "Error attaching device '0000:ff:00.0' to DPDK"
~~~
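To catch this state at boot without reading ovs-vswitchd.log, a health check could scan `ovs-vsctl show` output for interfaces that carry an `error:` line. The sketch below is a hypothetical helper (the name `interface_errors` is ours, not part of OVS) and assumes the text layout shown above, where each `error:` line belongs to the most recent `Interface` line.

~~~python
from __future__ import annotations

import re

# Interface names may be printed bare or in double quotes.
INTERFACE = re.compile(r'^\s*Interface\s+"?([^"\s]+)"?')
ERROR = re.compile(r'^\s*error:\s*"(.*)"\s*$')


def interface_errors(show_output: str) -> dict[str, str]:
    """Map each interface in `ovs-vsctl show` text to its error string, if any."""
    errors: dict[str, str] = {}
    current: str | None = None
    for line in show_output.splitlines():
        if m := INTERFACE.match(line):
            current = m.group(1)  # later error lines belong to this interface
        elif (m := ERROR.match(line)) and current is not None:
            errors[current] = m.group(1)
    return errors
~~~

The input could come from `subprocess.run(["ovs-vsctl", "show"], capture_output=True, text=True, check=True).stdout`; a non-empty result after boot would flag the node as affected.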
After restarting OVS, VFIO initializes and the devices attach:
~~~
2024-06-06T01:28:29.679Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
2024-06-06T01:28:29.679Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
2024-06-06T01:28:29.680Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-06-06T01:28:29.680Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2024-06-06T01:28:29.710Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2024-06-06T01:28:29.710Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-06-06T01:28:29.710Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
2024-06-06T01:28:29.710Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T01:28:29.710Z|00022|dpdk|INFO|EAL: VFIO support initialized
2024-06-06T01:28:31.089Z|00023|dpdk|INFO|EAL: using IOMMU type 1 (Type 1)
2024-06-06T01:28:31.689Z|00024|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.0 (socket 0)
2024-06-06T01:28:32.186Z|00025|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:32.446Z|00026|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.1 (socket 0)
2024-06-06T01:28:32.520Z|00027|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:32.905Z|00028|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.0 (socket 1)
2024-06-06T01:28:33.291Z|00029|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:33.558Z|00030|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.1 (socket 1)
2024-06-06T01:28:33.618Z|00031|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:33.674Z|00032|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T01:28:33.677Z|00033|dpdk|INFO|DPDK Enabled - initialized
~~~
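Across the log samples in this report there are three distinct outcomes: the VFIO container itself cannot be opened, VFIO initializes but the `/dev/vfio/<group>` nodes are denied, or everything probes cleanly. A small classifier like the one below could tag each boot so the failure rate of each mode can be tracked separately. It is a sketch whose regexes are copied from the EAL messages quoted here; the class and function names are hypothetical.

~~~python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Message patterns copied from the EAL log lines quoted in this report.
VFIO_OK = re.compile(r"EAL: VFIO support initialized")
CONTAINER_FAIL = re.compile(r"cannot open VFIO container, error (\d+)")
GROUP_FAIL = re.compile(r"Cannot open /dev/vfio/(\d+): Permission denied")
DEVICE_FAIL = re.compile(r"Requested device (\S+) cannot be used")


@dataclass
class VfioStartup:
    vfio_initialized: bool = False
    container_errno: int | None = None  # 13 == EACCES in the failing boots
    denied_groups: list[str] = field(default_factory=list)
    unusable_devices: list[str] = field(default_factory=list)

    @property
    def failure_mode(self) -> str:
        if self.container_errno is not None:
            return "container"  # /dev/vfio/vfio could not be opened
        if self.denied_groups:
            return "group"  # container OK, /dev/vfio/<group> denied
        if self.unusable_devices:
            return "device"  # devices rejected for some other reason
        return "ok"


def summarize(lines) -> VfioStartup:
    """Classify the EAL VFIO messages from one ovs-vswitchd start-up."""
    s = VfioStartup()
    for line in lines:
        if VFIO_OK.search(line):
            s.vfio_initialized = True
        elif m := CONTAINER_FAIL.search(line):
            s.container_errno = int(m.group(1))
        elif m := GROUP_FAIL.search(line):
            s.denied_groups.append(m.group(1))
        elif m := DEVICE_FAIL.search(line):
            s.unusable_devices.append(m.group(1))
    return s
~~~

Because ovs-vswitchd.log accumulates across restarts, the lines should first be sliced to a single start-up (for example, from the last `EAL ARGS:` line onward) before calling `summarize`.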
Version-Release number of selected component (if applicable):
openvswitch2.15-2.15.0-142.el8fdp.x86_64
rhosp-openvswitch-2.15-4.el8ost.1.noarch
How reproducible:
It occurs in about 1 out of 5 reboots of the compute nodes.
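With a failure rate of roughly 1 in 5, a handful of clean reboots says little about whether a workaround helps. Assuming each reboot fails independently with probability p (an assumption; the real failures may be correlated with boot timing), an ineffective fix still survives N reboots with probability (1 - p)^N. The sketch below computes the smallest N that pushes this below a chosen threshold.

~~~python
import math


def reboots_needed(p_fail: float, confidence: float) -> int:
    """Smallest N with (1 - p_fail) ** N <= 1 - confidence.

    Assumes independent reboots: an unfixed node would show at least one
    failure within N reboots with probability >= confidence.
    """
    if not (0 < p_fail < 1 and 0 < confidence < 1):
        raise ValueError("p_fail and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))
~~~

For p = 0.2 this gives 14 reboots for 95% confidence (0.8^14 is about 0.044) and 21 reboots for 99%.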
Actual results:
Sometimes compute nodes start without OVS-DPDK networking.
Expected results:
Compute nodes start with OVS-DPDK networking.
Additional info:
We tried the following workaround, but it did not work:
https://access.redhat.com/solutions/4093751
https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/358322.html
A bug was opened for OSP 10 but was closed as INSUFFICIENT_DATA:
https://bugzilla.redhat.com/show_bug.cgi?id=1683817
Looking through the logs and comparing them with what happens after OVS is restarted, I think we are hitting a race condition.
Even on the compute nodes that are working, we can see permission errors, but they occur when opening the VFIO group devices (/dev/vfio/<group>), not /dev/vfio/vfio.
~~~
2024-06-06T05:36:02.026Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:36:02.026Z|00022|dpdk|INFO|EAL: VFIO support initialized
2024-06-06T05:36:03.491Z|00023|dpdk|ERR|EAL: Cannot open /dev/vfio/34: Permission denied
2024-06-06T05:36:03.491Z|00024|dpdk|ERR|EAL: Failed to open group 34
2024-06-06T05:36:03.491Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
2024-06-06T05:36:03.491Z|00026|dpdk|ERR|EAL: Cannot open /dev/vfio/35: Permission denied
2024-06-06T05:36:03.491Z|00027|dpdk|ERR|EAL: Failed to open group 35
2024-06-06T05:36:03.491Z|00028|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
2024-06-06T05:36:03.491Z|00029|dpdk|ERR|EAL: Cannot open /dev/vfio/173: Permission denied
2024-06-06T05:36:03.491Z|00030|dpdk|ERR|EAL: Failed to open group 173
2024-06-06T05:36:03.491Z|00031|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
2024-06-06T05:36:03.491Z|00032|dpdk|ERR|EAL: Cannot open /dev/vfio/174: Permission denied
2024-06-06T05:36:03.491Z|00033|dpdk|ERR|EAL: Failed to open group 174
2024-06-06T05:36:03.491Z|00034|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
2024-06-06T05:36:03.491Z|00035|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T05:36:03.494Z|00036|dpdk|INFO|DPDK Enabled - initialized
2024-06-06T05:36:03.498Z|00037|pmd_perf|INFO|DPDK provided TSC frequency: 2190000 KHz
~~~
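One way to test the race hypothesis is to record, just before ovs-vswitchd starts, who owns `/dev/vfio/vfio` and the group nodes and how long it takes until the OVS user can open them read/write. The sketch below is a hypothetical diagnostic. It assumes it runs as the same user and group as ovs-vswitchd (on RHEL this is usually set through `OVS_USER_ID`, for example `openvswitch:hugetlbfs`, which should be verified on these nodes). Running it as root would mask the problem, since root bypasses the mode bits, and `os.access` checks the real (not effective) uid/gid.

~~~python
from __future__ import annotations

import os
import stat
import time


def describe_node(path: str) -> dict:
    """Report ownership, mode and read/write access for one /dev/vfio node."""
    st = os.stat(path)
    return {
        "path": path,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        "rw_ok": os.access(path, os.R_OK | os.W_OK),
    }


def wait_for_rw(path: str, timeout: float = 30.0, interval: float = 0.5) -> float | None:
    """Poll until the current process may open `path` read/write.

    Returns the seconds waited, or None if access was not granted in time.
    """
    start = time.monotonic()
    while True:
        if os.path.exists(path) and os.access(path, os.R_OK | os.W_OK):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    import glob

    for node in sorted(glob.glob("/dev/vfio/*")):
        print(describe_node(node))
~~~

If `wait_for_rw("/dev/vfio/vfio")` regularly returns a non-zero delay on the failing boots, that would support the theory that OVS starts before udev has applied the final ownership and permissions to the VFIO nodes.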