Issue Type: Bug
Resolution: Duplicate
Severity: Moderate
Sometimes we see this exception after redeploying the dataplane:
[cloud-admin@compute-1 ~]$ sudo podman ps | grep nova
36d71236d88c  images.paas.redhat.com/podified-rhos18-rhel9/openstack-nova-compute:current-podified  kolla_start  About an hour ago  Up 6 seconds  nova_compute   <-- constantly respawned
In the nova_compute logs we can see this:
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
2024-07-17 09:27:42.978 2 DEBUG oslo_service.service [None req-03dee2cd-1d6a-4dae-8632-b9a34bc3590b - - - - - -] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2602
2024-07-17 09:27:53.372 2 ERROR oslo_service.service [None req-10d29fb0-5b27-4084-920e-02e99e45e826 - - - - - -] Error starting thread.: nova.exception.DeviceBusy: device /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor is busy.
2024-07-17 09:27:53.372 2 ERROR oslo_service.service Traceback (most recent call last):
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 73, in read_sys
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return data.read()
2024-07-17 09:27:53.372 2 ERROR oslo_service.service OSError: [Errno 16] Device or resource busy
2024-07-17 09:27:53.372 2 ERROR oslo_service.service
2024-07-17 09:27:53.372 2 ERROR oslo_service.service The above exception was the direct cause of the following exception:
2024-07-17 09:27:53.372 2 ERROR oslo_service.service
2024-07-17 09:27:53.372 2 ERROR oslo_service.service Traceback (most recent call last):
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 806, in run_service
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     service.start()
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/service.py", line 162, in start
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.manager.init_host(self.service_ref)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 1608, in init_host
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.driver.init_host(host=self.host)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 834, in init_host
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     self.cpu_api.validate_all_dedicated_cpus()
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 186, in validate_all_dedicated_cpus
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     governors.add(pcpu.governor)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 65, in governor
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return core.get_governor(self.ident)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/core.py", line 80, in get_governor
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return filesystem.read_sys(
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 43, in wrapper
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     return func(*args, **kwargs)
2024-07-17 09:27:53.372 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 77, in read_sys
2024-07-17 09:27:53.372 2 ERROR oslo_service.service     raise exception.DeviceBusy(file_path=path) from exc
2024-07-17 09:27:53.372 2 ERROR oslo_service.service nova.exception.DeviceBusy: device /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor is busy.
2024-07-17 09:27:53.372 2 ERROR oslo_service.service
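The call chain boils down to a plain read of the per-CPU scaling_governor file in sysfs that fails with EBUSY and is re-raised as DeviceBusy, which aborts init_host and makes the container exit and get respawned. A minimal sketch of that failing read (simplified and hypothetical, not the actual nova code; only the path and errno come from the traceback above):

import errno

class DeviceBusy(Exception):
    """Stand-in for nova.exception.DeviceBusy (simplified)."""

def read_scaling_governor(cpu_id):
    """Read the cpufreq governor of one core, as nova does for every
    dedicated CPU in validate_all_dedicated_cpus() during init_host."""
    path = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor" % cpu_id
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError as exc:
        # [Errno 16] Device or resource busy is what the traceback shows;
        # nova wraps it, init_host fails, and kolla restarts the container.
        if exc.errno == errno.EBUSY:
            raise DeviceBusy(path) from exc
        raise

Anything that keeps the cpufreq sysfs file busy at the moment nova_compute starts will therefore keep the service in the respawn loop shown above.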
After rebooting the compute node, the nova_compute service starts and works normally.
This is the SR-IOV and NIC config we have:
edpm_network_config_template: |
  ---
  {% set mtu_list = [ctlplane_mtu] %}
  {% for network in nodeset_networks %}
  {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
  {%- endfor %}
  {% set min_viable_mtu = mtu_list | max %}
  network_config:
  - type: interface
    name: nic1
    use_dhcp: false
  - type: interface
    name: nic2
    use_dhcp: true
  - type: linux_bond
    name: bond_api
    use_dhcp: false
    bonding_options: "mode=active-backup"
    dns_servers: {{ ctlplane_dns_nameservers }}
    members:
    - type: interface
      name: nic3
      primary: true
    - type: interface
      name: nic4
    addresses:
    - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
    routes:
    - default: true
      next_hop: {{ ctlplane_gateway_ip }}
  - type: vlan
    vlan_id: {{ lookup('vars', networks_lower['internalapi'] ~ '_vlan_id') }}
    device: bond_api
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower['internalapi'] ~ '_ip') }}/{{ lookup('vars', networks_lower['internalapi'] ~ '_cidr') }}
  - type: vlan
    vlan_id: {{ lookup('vars', networks_lower['storage'] ~ '_vlan_id') }}
    device: bond_api
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower['storage'] ~ '_ip') }}/{{ lookup('vars', networks_lower['storage'] ~ '_cidr') }}
  - type: sriov_pf
    name: nic6
    mtu: 9000
    numvfs: 5
    use_dhcp: false
    defroute: false
    nm_controlled: true
    hotplug: true
    promisc: false
  - type: sriov_pf
    name: nic7
    mtu: 9000
    numvfs: 5
    use_dhcp: false
    defroute: false
    nm_controlled: true
    hotplug: true
    promisc: false
  - type: ovs_user_bridge
    name: br-link0
    use_dhcp: false
    ovs_extra: "set port br-link0 tag={{ lookup('vars', networks_lower['tenant'] ~ '_vlan_id') }}"
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower['tenant'] ~ '_ip') }}/{{ lookup('vars', networks_lower['tenant'] ~ '_cidr') }}
    members:
    - type: ovs_dpdk_bond
      name: dpdkbond0
      mtu: 9000
      rx_queue: 2
      members:
      - type: ovs_dpdk_port
        driver: mlx5_core
        name: dpdk0
        members:
        - type: sriov_vf
          device: nic6
          vfid: 0
      - type: ovs_dpdk_port
        driver: mlx5_core
        name: dpdk1
        members:
        - type: sriov_vf
          device: nic7
          vfid: 0
  - type: ovs_user_bridge
    name: br-link1
    use_dhcp: false
    members:
    - type: ovs_dpdk_port
      name: dpdk2
      mtu: 9000
      rx_queue: 1
      members:
      - type: interface
        name: nic5

edpm_neutron_sriov_agent_SRIOV_NIC_physical_device_mappings: sriov1:ens2f0np0,sriov2:ens2f1np1

data:
  03-sriov-nova.conf: |
    [pci]
    device_spec = {"address": "0000:17:00.0", "physical_network":"sriov1", "trusted":"true"}
    device_spec = {"address": "0000:17:00.1", "physical_network":"sriov2", "trusted":"true"}
Maybe the issue is that we are not excluding the PCI addresses used by the tenant bridge (the VFs consumed by the OVS-DPDK bond) from device_spec. A rough way to check that overlap is sketched below.
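A hedged helper for that check: it lists the VF PCI addresses under each PF from the physical_device_mappings above, flags vfid 0 (which dpdkbond0 consumes), and compares them against the addresses listed in device_spec. Illustrative sketch only, run on the compute node; the PF names and addresses are taken from the config above, everything else is an assumption, and it does not reproduce nova's own whitelist expansion:

import os

# PFs from edpm_neutron_sriov_agent_SRIOV_NIC_physical_device_mappings
# (assumption: these are the same PFs backing nic6/nic7 in the DPDK bond).
PF_INTERFACES = ["ens2f0np0", "ens2f1np1"]

# Addresses currently whitelisted in 03-sriov-nova.conf (the PFs themselves).
DEVICE_SPEC_ADDRESSES = {"0000:17:00.0", "0000:17:00.1"}

def vf_addresses(pf):
    """Return {vfid: PCI address} for every VF of a PF, read from sysfs."""
    base = "/sys/class/net/%s/device" % pf
    vfs = {}
    for entry in os.listdir(base):
        if entry.startswith("virtfn"):
            vfid = int(entry[len("virtfn"):])
            target = os.readlink(os.path.join(base, entry))
            vfs[vfid] = os.path.basename(target)  # e.g. 0000:17:00.2
    return vfs

if __name__ == "__main__":
    for pf in PF_INTERFACES:
        vfs = vf_addresses(pf)
        print(pf)
        for vfid in sorted(vfs):
            used = "  <-- used by dpdkbond0" if vfid == 0 else ""
            print("  virtfn%d -> %s%s" % (vfid, vfs[vfid], used))
        overlap = set(vfs.values()) & DEVICE_SPEC_ADDRESSES
        print("  VF addresses also listed in device_spec:", overlap or "none")

If the VFs claimed by OVS-DPDK turn out to be covered by the device_spec whitelist, narrowing device_spec (or explicitly excluding those addresses) would be the first thing to try.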
duplicates: OSPRH-8806 "Restarting nova-compute fails if power management is enabled" (Closed)