-
Bug
-
Resolution: Duplicate
-
Undefined
-
None
-
None
-
None
-
False
-
-
False
-
?
-
None
-
-
-
Moderate
Sometimes we see this exception after redeploying the dataplane:
[cloud-admin@compute-1 ~]$ sudo podman ps | grep nova
36d71236d88c  images.paas.redhat.com/podified-rhos18-rhel9/openstack-nova-compute:current-podified  kolla_start  About an hour ago  Up 6 seconds  nova_compute   <-- constantly respawned
In the nova_compute logs we can see this:
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
2024-07-17 09:27:42.978 2 DEBUG oslo_service.service [None req-03dee2cd-1d6a-4dae-8632-b9a34bc3590b - - - - - -] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2602
2024-07-17 09:27:53.372 2 ERROR oslo_service.service [None req-10d29fb0-5b27-4084-920e-02e99e45e826 - - - - - -] Error starting thread.: nova.exception.DeviceBusy: device /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor is busy.
2024-07-17 09:27:53.372 2 ERROR oslo_service.service Traceback (most recent call last):
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 73, in read_sys
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return data.read()
2024-07-17 09:27:53.372 2 ERROR oslo_service.service OSError: [Errno 16] Device or resource busy
2024-07-17 09:27:53.372 2 ERROR oslo_service.service
2024-07-17 09:27:53.372 2 ERROR oslo_service.service The above exception was the direct cause of the following exception:
2024-07-17 09:27:53.372 2 ERROR oslo_service.service
2024-07-17 09:27:53.372 2 ERROR oslo_service.service Traceback (most recent call last):
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 806, in run_service
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     service.start()
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/service.py", line 162, in start
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.manager.init_host(self.service_ref)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 1608, in init_host
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.driver.init_host(host=self.host)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 834, in init_host
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.cpu_api.validate_all_dedicated_cpus()
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 186, in validate_all_dedicated_cpus
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     governors.add(pcpu.governor)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 65, in governor
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return core.get_governor(self.ident)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/core.py", line 80, in get_governor
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return filesystem.read_sys(
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 43, in wrapper
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return func(*args, **kwargs)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 77, in read_sys
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     raise exception.DeviceBusy(file_path=path) from exc
2024-07-17 09:27:53.372 2 ERROR oslo_service.service nova.exception.DeviceBusy: device /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor is busy.
2024-07-17 09:27:53.372 2 ERROR oslo_service.service
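For reference, this is a minimal sketch (not nova code, just standard sysfs reads) of the access that fails above; according to the traceback, nova's filesystem.read_sys() does essentially the same open/read on the scaling_governor path and wraps EBUSY into DeviceBusy:

import errno

def read_scaling_governor(cpu: int) -> str:
    """Read the cpufreq governor for one CPU, the same sysfs file named in the traceback."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor"
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            # this is the "[Errno 16] Device or resource busy" that nova turns
            # into nova.exception.DeviceBusy in filesystem.read_sys()
            raise RuntimeError(f"device {path} is busy") from exc
        raise

if __name__ == "__main__":
    print(read_scaling_governor(8))  # cpu8 is the core named in the traceback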
After rebooting the compute, the nova_compute service works normally again.
This is the SR-IOV and NIC config we have:
edpm_network_config_template: |
  ---
  {% set mtu_list = [ctlplane_mtu] %}
  {% for network in nodeset_networks %}
  {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
  {%- endfor %}
  {% set min_viable_mtu = mtu_list | max %}
  network_config:
  - type: interface
    name: nic1
    use_dhcp: false
  - type: interface
    name: nic2
    use_dhcp: true
  - type: linux_bond
    name: bond_api
    use_dhcp: false
    bonding_options: "mode=active-backup"
    dns_servers: {{ ctlplane_dns_nameservers }}
    members:
    - type: interface
      name: nic3
      primary: true
    - type: interface
      name: nic4
    addresses:
    - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
    routes:
    - default: true
      next_hop: {{ ctlplane_gateway_ip }}
  - type: vlan
    vlan_id: {{ lookup('vars', networks_lower['internalapi'] ~ '_vlan_id') }}
    device: bond_api
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower['internalapi'] ~ '_ip') }}/{{ lookup('vars', networks_lower['internalapi'] ~ '_cidr') }}
  - type: vlan
    vlan_id: {{ lookup('vars', networks_lower['storage'] ~ '_vlan_id') }}
    device: bond_api
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower['storage'] ~ '_ip') }}/{{ lookup('vars', networks_lower['storage'] ~ '_cidr') }}
  - type: sriov_pf
    name: nic6
    mtu: 9000
    numvfs: 5
    use_dhcp: false
    defroute: false
    nm_controlled: true
    hotplug: true
    promisc: false
  - type: sriov_pf
    name: nic7
    mtu: 9000
    numvfs: 5
    use_dhcp: false
    defroute: false
    nm_controlled: true
    hotplug: true
    promisc: false
  - type: ovs_user_bridge
    name: br-link0
    use_dhcp: false
    ovs_extra: "set port br-link0 tag={{ lookup('vars', networks_lower['tenant'] ~ '_vlan_id') }}"
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower['tenant'] ~ '_ip') }}/{{ lookup('vars', networks_lower['tenant'] ~ '_cidr') }}
    members:
    - type: ovs_dpdk_bond
      name: dpdkbond0
      mtu: 9000
      rx_queue: 2
      members:
      - type: ovs_dpdk_port
        driver: mlx5_core
        name: dpdk0
        members:
        - type: sriov_vf
          device: nic6
          vfid: 0
      - type: ovs_dpdk_port
        driver: mlx5_core
        name: dpdk1
        members:
        - type: sriov_vf
          device: nic7
          vfid: 0
  - type: ovs_user_bridge
    name: br-link1
    use_dhcp: false
    members:
    - type: ovs_dpdk_port
      name: dpdk2
      mtu: 9000
      rx_queue: 1
      members:
      - type: interface
        name: nic5
edpm_neutron_sriov_agent_SRIOV_NIC_physical_device_mappings: sriov1:ens2f0np0,sriov2:ens2f1np1
data:
  03-sriov-nova.conf: |
    [pci]
    device_spec = {"address": "0000:17:00.0", "physical_network":"sriov1", "trusted":"true"}
    device_spec = {"address": "0000:17:00.1", "physical_network":"sriov2", "trusted":"true"}
Maybe the issue is that we are not excluding from device_spec the PCI addresses we use in the tenant bridge; see the sketch below for a way to compare them.
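To check that hypothesis, one way is to list the PCI addresses of the VFs behind each PF via the virtfn* symlinks in sysfs and compare them against the addresses in device_spec above. A minimal sketch, assuming the PF interface names from the physical_device_mappings line (ens2f0np0/ens2f1np1); the helper name is ours, this is not nova code:

import os

def vf_pci_addresses(pf_ifname: str) -> list[str]:
    """Return the PCI addresses of the VFs created on the given PF interface."""
    device_dir = f"/sys/class/net/{pf_ifname}/device"
    addresses = []
    for entry in sorted(os.listdir(device_dir)):
        if entry.startswith("virtfn"):
            # each virtfnN is a symlink to the VF's PCI device, e.g. ../0000:17:00.2
            target = os.readlink(os.path.join(device_dir, entry))
            addresses.append(os.path.basename(target))
    return addresses

if __name__ == "__main__":
    # PF names taken from the physical_device_mappings line above (assumption)
    for pf in ("ens2f0np0", "ens2f1np1"):
        print(pf, vf_pci_addresses(pf))

Any address that shows up both here (because its VF is consumed by dpdkbond0) and in device_spec would be handed to nova even though it is already in use by OVS-DPDK.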
- duplicates
-
OSPRH-8806 Restarting nova-compute fails if power management is enabled
-
- Closed
-