Red Hat OpenStack Services on OpenShift / OSPRH-12451

BZ#2290818 OVS-dpdk intermittently fails to attach dpdk interfaces on reboot


    • Bug
    • Resolution: Cannot Reproduce
    • Normal
    • rhos-16.2.z
    • None
    • os-net-config
    • None
    • False
    • False
    • None
    • Moderate

      Description of problem: On reboot, OVS intermittently gets a "Permission denied" error when it tries to open the VFIO container.

      ~~~
      2024-06-06T05:35:39.857Z|00013|dpdk|INFO|EAL ARGS: ovs-vswitchd -n 4 --socket-mem 4096,4096 --socket-limit 4096,4096 -l 0.
      2024-06-06T05:35:39.861Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
      2024-06-06T05:35:39.861Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
      2024-06-06T05:35:39.861Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
      2024-06-06T05:35:39.862Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
      2024-06-06T05:35:39.884Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
      2024-06-06T05:35:39.884Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
      2024-06-06T05:35:39.884Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
      2024-06-06T05:35:39.885Z|00021|dpdk|INFO|EAL: Probing VFIO support...
      2024-06-06T05:35:39.885Z|00022|dpdk|ERR|EAL: cannot open VFIO container, error 13 (Permission denied)
      2024-06-06T05:35:39.885Z|00023|dpdk|INFO|EAL: VFIO support could not be initialized
      2024-06-06T05:35:41.005Z|00024|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
      2024-06-06T05:35:41.005Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
      2024-06-06T05:35:41.005Z|00026|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
      2024-06-06T05:35:41.005Z|00027|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
      2024-06-06T05:35:41.005Z|00028|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
      2024-06-06T05:35:41.007Z|00029|dpdk|INFO|DPDK Enabled - initialized
      ~~~
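The log lines above follow a `timestamp|sequence|module|level|message` layout, so the failing boot can be detected mechanically. The following is a minimal sketch, assuming the ovs-vswitchd log is available as plain text; `parse_line` and `vfio_container_denied` are hypothetical helper names and not part of OVS or DPDK.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    seq: str
    module: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its five pipe-separated fields."""
    # maxsplit=4 keeps any '|' inside the message text intact.
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None  # not a log line in the expected layout
    return LogEntry(*parts)


def vfio_container_denied(log_text: str) -> bool:
    """Return True if DPDK logged an ERR about opening the VFIO container."""
    for line in log_text.splitlines():
        entry = parse_line(line)
        if (entry is not None
                and entry.module == "dpdk"
                and entry.level == "ERR"
                and "cannot open VFIO container" in entry.message):
            return True
    return False


# Two lines copied from the failing boot shown above.
sample = """\
2024-06-06T05:35:39.885Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:35:39.885Z|00022|dpdk|ERR|EAL: cannot open VFIO container, error 13 (Permission denied)
"""
print(vfio_container_denied(sample))
```

A check like this could be run against the log after each reboot to flag affected compute nodes without inspecting the bridges by hand.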

      The VFIO failure causes the DPDK interfaces to fail to attach to the OVS bridge:

      ~~~
      Bridge br-ex1
          Controller "tcp:127.0.0.1:6633"
              is_connected: true
          fail_mode: secure
          datapath_type: netdev
          Port dpdkbond3
              Interface dpdk1
                  type: dpdk
                  options: {dpdk-devargs="0000:aa:00.1", n_rxq="2"}
                  error: "Error attaching device '0000:aa:00.1' to DPDK"
              Interface dpdk0
                  type: dpdk
                  options: {dpdk-devargs="0000:aa:00.0", n_rxq="2"}
                  error: "Error attaching device '0000:aa:00.0' to DPDK"
      Bridge br-ex2
          Controller "tcp:127.0.0.1:6633"
              is_connected: true
          fail_mode: secure
          datapath_type: netdev
          Port br-ex2
              Interface br-ex2
                  type: internal
          Port dpdkbond4
              Interface dpdk3
                  type: dpdk
                  options: {dpdk-devargs="0000:ff:00.1", n_rxq="2"}
                  error: "Error attaching device '0000:ff:00.1' to DPDK"
              Interface dpdk2
                  type: dpdk
                  options: {dpdk-devargs="0000:ff:00.0", n_rxq="2"}
                  error: "Error attaching device '0000:ff:00.0' to DPDK"
      ~~~
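To list the affected interfaces without reading the whole bridge dump, the `error:` lines can be matched to the `Interface` entry they follow. This is a rough sketch that works on the text of the output above; `failed_interfaces` is a hypothetical helper, and it assumes each `error:` line belongs to the most recent `Interface` line.

```python
from typing import Dict


def failed_interfaces(show_output: str) -> Dict[str, str]:
    """Map interface name -> error text for interfaces that report an error."""
    failures: Dict[str, str] = {}
    current = None  # name of the interface whose attributes we are reading
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Interface "):
            current = line.split(None, 1)[1].strip('"')
        elif line.startswith(("Port ", "Bridge ")):
            current = None  # a new port or bridge ends the interface block
        elif line.startswith("error:") and current is not None:
            failures[current] = line[len("error:"):].strip().strip('"')
    return failures


# Excerpt of the br-ex2 output from this report.
sample = """\
Bridge br-ex2
    Port br-ex2
        Interface br-ex2
            type: internal
    Port dpdkbond4
        Interface dpdk3
            type: dpdk
            error: "Error attaching device '0000:ff:00.1' to DPDK"
"""
print(failed_interfaces(sample))
```

Because every line is stripped before matching, the sketch does not depend on the indentation of the dump.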

      After OVS is restarted, the interfaces attach:

      ~~~
      2024-06-06T01:28:29.679Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
      2024-06-06T01:28:29.679Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
      2024-06-06T01:28:29.680Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
      2024-06-06T01:28:29.680Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
      2024-06-06T01:28:29.710Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
      2024-06-06T01:28:29.710Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
      2024-06-06T01:28:29.710Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
      2024-06-06T01:28:29.710Z|00021|dpdk|INFO|EAL: Probing VFIO support...
      2024-06-06T01:28:29.710Z|00022|dpdk|INFO|EAL: VFIO support initialized
      2024-06-06T01:28:31.089Z|00023|dpdk|INFO|EAL: using IOMMU type 1 (Type 1)
      2024-06-06T01:28:31.689Z|00024|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.0 (socket 0)
      2024-06-06T01:28:32.186Z|00025|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
      2024-06-06T01:28:32.446Z|00026|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.1 (socket 0)
      2024-06-06T01:28:32.520Z|00027|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
      2024-06-06T01:28:32.905Z|00028|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.0 (socket 1)
      2024-06-06T01:28:33.291Z|00029|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
      2024-06-06T01:28:33.558Z|00030|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.1 (socket 1)
      2024-06-06T01:28:33.618Z|00031|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
      2024-06-06T01:28:33.674Z|00032|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
      2024-06-06T01:28:33.677Z|00033|dpdk|INFO|DPDK Enabled - initialized
      ~~~

      Version-Release number of selected component (if applicable):

      openvswitch2.15-2.15.0-142.el8fdp.x86_64
      rhosp-openvswitch-2.15-4.el8ost.1.noarch

      How reproducible:

      It occurs in about 1 out of 5 reboots of the compute nodes.
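For anyone trying to reproduce the problem, the reported rate gives a rough idea of how many reboots are needed. The sketch below assumes that each reboot fails independently with a fixed probability of 1 in 5, which this report does not establish; `reboots_needed` is a hypothetical helper name.

```python
import math


def reboots_needed(p_fail: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p_fail)**n >= confidence."""
    if not (0 < p_fail < 1 and 0 < confidence < 1):
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    # Probability of n clean reboots in a row is (1 - p_fail)**n;
    # solve (1 - p_fail)**n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


# Reported rate: about 1 failure in 5 reboots.
print(reboots_needed(0.2, 0.95))  # 14
```

Under these assumptions, about 14 reboots of a single node give a 95% chance of seeing the failure at least once, so a handful of clean reboots is not strong evidence that a node is unaffected.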

      Actual results:

      Sometimes compute nodes start without networking for ovs-dpdk.

      Expected results:

      Compute nodes start with networking for ovs-dpdk.

      Additional info:

      We tried this workaround, but it did not work:
      https://access.redhat.com/solutions/4093751
      https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/358322.html

      A bug was opened for OSP 10 but was closed with INSUFFICIENT_DATA:
      https://bugzilla.redhat.com/show_bug.cgi?id=1683817

      Looking through the logs and comparing them with what happens after OVS is restarted, I think we are hitting a race condition.

      Even on the compute nodes that are working, we can see permission errors, but they occur when opening the per-group devices (/dev/vfio/<group>), not the container device /dev/vfio/vfio.
      ~~~
      2024-06-06T05:36:02.026Z|00021|dpdk|INFO|EAL: Probing VFIO support...
      2024-06-06T05:36:02.026Z|00022|dpdk|INFO|EAL: VFIO support initialized
      2024-06-06T05:36:03.491Z|00023|dpdk|ERR|EAL: Cannot open /dev/vfio/34: Permission denied
      2024-06-06T05:36:03.491Z|00024|dpdk|ERR|EAL: Failed to open group 34
      2024-06-06T05:36:03.491Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
      2024-06-06T05:36:03.491Z|00026|dpdk|ERR|EAL: Cannot open /dev/vfio/35: Permission denied
      2024-06-06T05:36:03.491Z|00027|dpdk|ERR|EAL: Failed to open group 35
      2024-06-06T05:36:03.491Z|00028|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
      2024-06-06T05:36:03.491Z|00029|dpdk|ERR|EAL: Cannot open /dev/vfio/173: Permission denied
      2024-06-06T05:36:03.491Z|00030|dpdk|ERR|EAL: Failed to open group 173
      2024-06-06T05:36:03.491Z|00031|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
      2024-06-06T05:36:03.491Z|00032|dpdk|ERR|EAL: Cannot open /dev/vfio/174: Permission denied
      2024-06-06T05:36:03.491Z|00033|dpdk|ERR|EAL: Failed to open group 174
      2024-06-06T05:36:03.491Z|00034|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
      2024-06-06T05:36:03.491Z|00035|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
      2024-06-06T05:36:03.494Z|00036|dpdk|INFO|DPDK Enabled - initialized
      2024-06-06T05:36:03.498Z|00037|pmd_perf|INFO|DPDK provided TSC frequency: 2190000 KHz
      ~~~
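Since both failure modes are "Permission denied" errors on nodes under /dev/vfio, it may help to record the mode and accessibility of those nodes at the moment OVS starts. The following is a diagnostic sketch only, not a fix; `vfio_access_report` is a hypothetical helper, and it assumes the script is run as the same user that ovs-vswitchd runs as.

```python
import os
import stat
from typing import List, Tuple


def vfio_access_report(vfio_dir: str = "/dev/vfio") -> List[Tuple[str, str, bool]]:
    """Return (path, mode string, read/write accessible) for each node in vfio_dir."""
    report = []
    for name in sorted(os.listdir(vfio_dir)):
        path = os.path.join(vfio_dir, name)
        mode = stat.filemode(os.stat(path).st_mode)
        # os.access checks against the real UID/GID of the calling process.
        accessible = os.access(path, os.R_OK | os.W_OK)
        report.append((path, mode, accessible))
    return report


if __name__ == "__main__":
    # Only meaningful on a host where the vfio driver has created /dev/vfio.
    if os.path.isdir("/dev/vfio"):
        for path, mode, ok in vfio_access_report():
            print(f"{path} {mode} {'ok' if ok else 'DENIED'}")
```

Running this shortly before and shortly after ovs-vswitchd starts on an affected boot would show whether the permissions on /dev/vfio/vfio and the group nodes change during startup, which is what the suspected race condition would predict.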

              jira-bugzilla-migration RH Bugzilla Integration
              Eran Kuris
              rhos-dfg-nfv