-
Bug
-
Resolution: Done-Errata
-
Blocker
-
rhos-18.0.0
-
None
-
1
-
False
-
-
False
-
?
-
No Docs Impact
-
openstack-nova-27.4.1-18.0.20240725124737.47428f6.el9osttrunk
-
?
-
?
-
None
-
-
-
Important
nova-compute fails to start on physical hardware if power management is enabled, dedicated CPUs are configured, and nova-compute is restarted (the initial start works).
When the power management strategy is cpu_state, the sequence is:
- nova-compute reads the CPU state and governor values of the dedicated CPUs at startup
- it then offlines all the unallocated dedicated cores (except cpu0)
- nova-compute is later restarted
- on that restart nova tries to read the governor of the offlined CPUs, which the kernel does not support on a physical core, so nova-compute fails to start.
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L853-L859
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/cpu/api.py#L186
On physical hardware:
root@bedrock:/home/gibi# cat /sys/devices/system/cpu/cpu4/online
1
root@bedrock:/home/gibi# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
powersave
root@bedrock:/home/gibi# echo 0 > /sys/devices/system/cpu/cpu4/online
root@bedrock:/home/gibi# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
cat: /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: Device or resource busy
We did not hit this when testing with virtual computes, because there the scaling_governor file does not exist, and that case is handled in nova: https://github.com/openstack/nova/blob/e82854dc8c514e457528b52834d79176fe5a2135/nova/virt/libvirt/cpu/api.py#L63-L69 In a physical environment, however, the file exists but the kernel returns "Device or resource busy" when the CPU is offline.
This is a bug in the power management implementation in nova.
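The difference between the two environments can be illustrated with a minimal sketch of a governor read that tolerates both cases. This is not the nova code and not necessarily how the fix was implemented; the function name read_governor, its parameters, and the choice to return None for an offline CPU are assumptions made for illustration only.

```python
import errno
from pathlib import Path

# Standard sysfs location of the per-CPU directories on Linux.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def _read_text(path):
    """Default reader: return the stripped contents of a sysfs file."""
    return Path(path).read_text().strip()


def read_governor(cpu_id, base=SYSFS_CPU, read=_read_text):
    """Return the scaling governor of a CPU, or None if it cannot be read.

    Hypothetical helper, not part of nova. `read` is injectable so the
    error handling can be exercised without real hardware.
    """
    path = Path(base) / f"cpu{cpu_id}" / "cpufreq" / "scaling_governor"
    try:
        return read(path)
    except FileNotFoundError:
        # Virtual computes: the cpufreq file does not exist at all.
        return None
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            # Physical hardware: the file exists, but the CPU is offline.
            return None
        # Anything else (e.g. permission problems) is a real error.
        raise
```

With a helper of this shape, a restart that encounters an offlined dedicated core would see None for its governor instead of an unhandled error; whether None is the right value to record for such a core is a design decision left open here.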
- is duplicated by
-
OSPRH-8734 nova.exception.DeviceBusy after redeploying the dataplane
- Closed
- links to
-
RHBA-2024:138187 Release of components for RHOSO 18.0.1