Red Hat OpenStack Services on OpenShift / OSPRH-8734

nova.exception.DeviceBusy after redeploying the dataplane


    • Type: Bug
    • Resolution: Duplicate
    • Component: nova-operator
    • Severity: Moderate

      Sometimes we see this exception after redeploying the dataplane:

      [cloud-admin@compute-1 ~]$ sudo podman ps |grep nova
      36d71236d88c  images.paas.redhat.com/podified-rhos18-rhel9/openstack-nova-compute:current-podified                                                   kolla_start           About an hour ago  Up 6 seconds                              nova_compute <-- constantly respawned

      In the nova_compute logs we can see this:

      Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
      2024-07-17 09:27:42.978 2 DEBUG oslo_service.service [None req-03dee2cd-1d6a-4dae-8632-b9a34bc3590b - - - - - -] logging_exception_prefix       = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2602
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service [None req-10d29fb0-5b27-4084-920e-02e99e45e826 - - - - - -] Error starting thread.: nova.exception.DeviceBusy: device /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor is busy.
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service Traceback (most recent call last):
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 73, in read_sys
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return data.read()          
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service OSError: [Errno 16] Device or resource busy
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service 
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service The above exception was the direct cause of the following exception:
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service    
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service Traceback (most recent call last):              
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 806, in run_service
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     service.start()
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/service.py", line 162, in start
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.manager.init_host(self.service_ref)
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 1608, in init_host
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.driver.init_host(host=self.host)
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 834, in init_host
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.cpu_api.validate_all_dedicated_cpus()
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 186, in validate_all_dedicated_cpus
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     governors.add(pcpu.governor)
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 65, in governor
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return core.get_governor(self.ident)
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/core.py", line 80, in get_governor
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return filesystem.read_sys(
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 43, in wrapper
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return func(*args, **kwargs)
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 77, in read_sys
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service     raise exception.DeviceBusy(file_path=path) from exc
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service nova.exception.DeviceBusy: device /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor is busy.
      2024-07-17 09:27:53.372 2 ERROR oslo_service.service       
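The traceback shows the raw `OSError` (errno 16, EBUSY) from the `data.read()` call in `nova/filesystem.py` being re-raised as `nova.exception.DeviceBusy`. A minimal sketch of that translate-and-chain pattern (illustrative only; `DeviceBusy` and `busy_opener` here are stand-ins, not nova's actual code):

```python
import errno


class DeviceBusy(Exception):
    """Illustrative stand-in for nova.exception.DeviceBusy."""


def read_sys(path, opener=open):
    # Read a sysfs file such as .../cpufreq/scaling_governor and
    # translate EBUSY into DeviceBusy, chaining the original OSError.
    try:
        with opener(path) as data:
            return data.read()
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            raise DeviceBusy(f"device {path} is busy.") from exc
        raise


def busy_opener(path):
    # Simulates the kernel returning EBUSY for the sysfs read.
    raise OSError(errno.EBUSY, "Device or resource busy", path)


try:
    read_sys("/sys/devices/system/cpu/cpu8/cpufreq/scaling_governor",
             opener=busy_opener)
except DeviceBusy as exc:
    print(exc)            # the DeviceBusy message seen in the log above
    print(exc.__cause__)  # the underlying OSError: [Errno 16]
```

The chaining (`raise ... from exc`) is what produces the "The above exception was the direct cause of the following exception" section in the log.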

      After rebooting the compute node, the nova_compute service works normally again.
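Before rebooting, the failing read can be reproduced outside nova to see which dedicated CPUs actually return EBUSY. A hypothetical diagnostic helper (`probe_governors` is our name; the CPU list is an assumption based on the traceback):

```python
import errno


def probe_governors(cpus):
    # Attempt the same sysfs read nova's validate_all_dedicated_cpus
    # performs and collect the CPUs whose governor file is busy.
    busy = []
    for cpu in cpus:
        path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor"
        try:
            with open(path) as f:
                f.read()
        except FileNotFoundError:
            pass  # CPU absent or no cpufreq support
        except OSError as exc:
            if exc.errno == errno.EBUSY:
                busy.append(cpu)
            else:
                raise
    return busy


if __name__ == "__main__":
    # cpu8 comes from the exception above; extend this with the real
    # cpu_dedicated_set of the affected compute node.
    print(probe_governors([8]))
```

A non-empty result would confirm the EBUSY condition independently of nova; whatever is writing to those governor files at that moment (for example, another tuning service) would then be the next thing to look for.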

      This is the SR-IOV and NIC config we have:


              edpm_network_config_template: |
                ---
                {% set mtu_list = [ctlplane_mtu] %}
                {% for network in nodeset_networks %}
                {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
                {%- endfor %}
                {% set min_viable_mtu = mtu_list | max %}
                network_config:
                - type: interface
                  name: nic1
                  use_dhcp: false
                - type: interface
                  name: nic2
                  use_dhcp: true
                - type: linux_bond
                  name: bond_api
                  use_dhcp: false
                  bonding_options: "mode=active-backup"
                  dns_servers: {{ ctlplane_dns_nameservers }}
                  members:
                    - type: interface
                      name: nic3
                      primary: true
                    - type: interface
                      name: nic4
                  addresses:
                  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
                  routes:
                  - default: true
                    next_hop: {{ ctlplane_gateway_ip }}
                - type: vlan
                  vlan_id: {{ lookup('vars', networks_lower['internalapi'] ~ '_vlan_id') }}
                  device: bond_api
                  addresses:
                  - ip_netmask: {{ lookup('vars', networks_lower['internalapi'] ~ '_ip') }}/{{ lookup('vars', networks_lower['internalapi'] ~ '_cidr') }}
                - type: vlan
                  vlan_id: {{ lookup('vars', networks_lower['storage'] ~ '_vlan_id') }}
                  device: bond_api
                  addresses:
                  - ip_netmask: {{ lookup('vars', networks_lower['storage'] ~ '_ip') }}/{{ lookup('vars', networks_lower['storage'] ~ '_cidr') }}
                - type: sriov_pf
                  name: nic6
                  mtu: 9000
                  numvfs: 5
                  use_dhcp: false
                  defroute: false
                  nm_controlled: true
                  hotplug: true
                  promisc: false
                - type: sriov_pf
                  name: nic7
                  mtu: 9000
                  numvfs: 5
                  use_dhcp: false
                  defroute: false
                  nm_controlled: true
                  hotplug: true
                  promisc: false
                - type: ovs_user_bridge
                  name: br-link0
                  use_dhcp: false
                  ovs_extra: "set port br-link0 tag={{ lookup('vars', networks_lower['tenant'] ~ '_vlan_id') }}"
                  addresses:
                  - ip_netmask: {{ lookup('vars', networks_lower['tenant'] ~ '_ip') }}/{{ lookup('vars', networks_lower['tenant'] ~ '_cidr')}}
                  members:
                    - type: ovs_dpdk_bond
                      name: dpdkbond0
                      mtu: 9000
                      rx_queue: 2
                      members:
                      - type: ovs_dpdk_port
                        driver: mlx5_core
                        name: dpdk0
                        members:
                        - type: sriov_vf
                          device: nic6
                          vfid: 0
                      - type: ovs_dpdk_port
                        driver: mlx5_core
                        name: dpdk1
                        members:
                        - type: sriov_vf
                          device: nic7
                          vfid: 0
                - type: ovs_user_bridge
                  name: br-link1
                  use_dhcp: false
                  members:
                    - type: ovs_dpdk_port
                      name: dpdk2
                      mtu: 9000
                      rx_queue: 1
                      members:
                        - type: interface
                          name: nic5
              edpm_neutron_sriov_agent_SRIOV_NIC_physical_device_mappings: sriov1:ens2f0np0,sriov2:ens2f1np1
      data:
        03-sriov-nova.conf: |
          [pci]
          device_spec = {"address": "0000:17:00.0", "physical_network":"sriov1", "trusted":"true"}
          device_spec = {"address": "0000:17:00.1", "physical_network":"sriov2", "trusted":"true"}
      

      Maybe the issue is that we are not excluding the PCI addresses used by the tenant bridge (the VFs consumed by the DPDK bond) from device_spec.
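To check that hypothesis, the VF PCI addresses behind each PF can be read from the `virtfn*` symlinks in sysfs and compared with what device_spec exposes. A sketch under the assumption that the PF names match the physical_device_mappings above (`list_vf_addresses` is a hypothetical helper):

```python
import glob
import os


def list_vf_addresses(pf_ifname):
    # Each virtfnN symlink under /sys/class/net/<pf>/device/ resolves
    # to the PCI directory of that VF, e.g. .../0000:17:00.2.
    device_dir = f"/sys/class/net/{pf_ifname}/device"
    return [os.path.basename(os.path.realpath(link))
            for link in sorted(glob.glob(os.path.join(device_dir, "virtfn*")))]


if __name__ == "__main__":
    for pf in ("ens2f0np0", "ens2f1np1"):  # PFs from the mappings above
        # vfid 0 of each PF is consumed by the DPDK bond (dpdk0/dpdk1),
        # so its address is a candidate for exclusion from device_spec.
        print(pf, list_vf_addresses(pf))
```

Run on the affected compute, this would show whether the VF claimed by the DPDK bond falls under an address range that device_spec makes schedulable.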


              Assignee: Unassigned
              Reporter: Ricardo Diaz Campos (rdiazcam@redhat.com)
              Team: rhos-dfg-compute