  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-8806

Restarting nova-compute fails if power management is enabled

    • Bug
    • Resolution: Done-Errata
    • Blocker
    • rhos-18.0.1
    • rhos-18.0.0
    • openstack-nova
    • None
    • 1
    • False
    • False
    • ?
    • No Docs Impact
    • openstack-nova-27.4.1-18.0.20240725124737.47428f6.el9osttrunk
    • ?
    • ?
    • None
    • Important

      nova-compute fails to start on physical hardware when power management is enabled, dedicated CPUs are configured, and the service is restarted (the initial start works).
      When the power management strategy is cpu_state, the sequence is:

      • at startup, nova-compute reads the CPU state and governor value of each dedicated CPU
      • it then offlines all the unallocated dedicated cores (except cpu0)
      • nova-compute is then restarted
      • on restart, nova tries to read the governor of an offlined CPU; the kernel does not support this on a physical core, so nova fails to start.

      https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L853-L859
      https://github.com/openstack/nova/blob/master/nova/virt/libvirt/cpu/api.py#L186
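The sequence above can be illustrated with a small in-memory model. This is only a sketch, not nova code: `FakeCore` and `startup` are hypothetical names, and the model assumes that reading the governor of an offline core raises `OSError` with `EBUSY`, as the physical-hardware transcript in this report shows.

```python
import errno


class FakeCore:
    """Hypothetical in-memory stand-in for one CPU's sysfs entries."""

    def __init__(self, ident, governor="powersave"):
        self.ident = ident
        self.online = True
        self._governor = governor

    def read_governor(self):
        # Assumption: mimics physical hardware, where scaling_governor
        # exists but cannot be read while the core is offline.
        if not self.online:
            raise OSError(errno.EBUSY, "Device or resource busy")
        return self._governor


def startup(cores):
    """Model of the startup steps described in this report."""
    # Step 1: read the governor of every dedicated core.
    for core in cores:
        core.read_governor()
    # Step 2: offline every unallocated dedicated core except cpu0.
    for core in cores:
        if core.ident != 0:
            core.online = False


cores = [FakeCore(i) for i in range(4)]

startup(cores)  # the initial start succeeds: all cores are still online

try:
    startup(cores)  # the restart reads the governor of an offlined core
    restart_error = None
except OSError as exc:
    restart_error = exc

print("restart failed:", restart_error)
```

In this model the first call succeeds because every core is online when its governor is read, while the second call fails on the first offlined core.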

      On physical hardware:

      root@bedrock:/home/gibi# cat /sys/devices/system/cpu/cpu4/online
      1
      root@bedrock:/home/gibi# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
      powersave
      root@bedrock:/home/gibi# echo 0 > /sys/devices/system/cpu/cpu4/online
      root@bedrock:/home/gibi# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
      cat: /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: Device or resource busy
      root@bedrock:/home/gibi# 
      

      We did not hit this when testing with virtual computes because there the scaling_governor file does not exist, and nova handles that case: https://github.com/openstack/nova/blob/e82854dc8c514e457528b52834d79176fe5a2135/nova/virt/libvirt/cpu/api.py#L63-L69 In a physical environment, however, the file exists but the kernel returns "Device or resource busy" when the CPU is offline.
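One way to tolerate both cases is sketched below. This is not the fix that was merged into nova; `read_governor` and its `sysfs_root` parameter are hypothetical names chosen for illustration, and treating an offline core as "no governor available" is an assumption.

```python
import errno
import os


def read_governor(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the scaling governor of a CPU, or None if it cannot be read.

    Hypothetical helper, not part of nova.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "cpufreq", "scaling_governor")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        # Virtual computes: the file does not exist at all.
        return None
    except OSError as exc:
        # Physical hardware: the file exists, but the kernel returns EBUSY
        # while the core is offline.
        if exc.errno == errno.EBUSY:
            return None
        raise
```

Any other `OSError`, such as a permission error, is re-raised so that unrelated failures are not silently hidden.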

      This is a bug in the power management implementation in nova.

              sbauza@redhat.com Sylvain Bauza
              rh-ee-bgibizer Balazs Gibizer
              rhos-dfg-compute