Red Hat OpenStack Services on OpenShift / OSPRH-25480

C6 Power State Not Enabled with cpu-partitioning-powersave Profile

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • edpm-ansible
    • Priority Bugs
    • 1
    • Critical

      1. Summary

      When deploying OpenStack compute nodes with edpm_tuned_max_power_cstate: "c6" and the cpu-partitioning-powersave tuned profile, the C6 deep sleep state is not actually enabled at the hardware level, preventing any measurable power savings.

      2. Environment

      • RHEL 9.x / RHOSO 18
      • Tuned profile: cpu-partitioning-powersave
      • Hardware: Intel Xeon E5-2630 v4 (Broadwell)
      • Deployment config:
          edpm_tuned_profile: "cpu-partitioning-powersave"
          edpm_tuned_max_power_cstate: "c6"
          edpm_tuned_isolated_cores: "2-19,22-39"

      3. Problem Description

      Although the powersave profile is configured to allow C6, the CPUs never enter the C6 state and power consumption stays the same as with the baseline cpu-partitioning profile.

      3.1 Expected Behavior

      When edpm_tuned_max_power_cstate: "c6" is set:
      1. Tuned should configure max_power_state=cstate.name:C6|140
      2. C6 state should be enabled at kernel level
      3. CPUs should enter C6 during idle periods
      4. Power consumption should decrease compared to baseline

      3.2 Actual Behavior

      1. [PASS] Tuned sets max_power_state=cstate.name:C6|140 correctly
      2. [FAIL] C6 remains disabled: /sys/devices/system/cpu/cpu*/cpuidle/state4/disable = 1
      3. [FAIL] CPUs never enter C6 state
      4. [FAIL] Power consumption identical to baseline mode

      4. Reproduction Steps

      4.1 Deploy with powersave profile

      Configure deployment with:
        edpm_tuned_profile: "cpu-partitioning-powersave"
        edpm_tuned_max_power_cstate: "c6"

      4.2 Verify tuned configuration

      Check active profile:
        ssh cloud-admin@compute-node "tuned-adm active"
        Output: Current active profile: cpu-partitioning-powersave

      Check tuned configuration:
        ssh cloud-admin@compute-node "grep max_power_state /etc/tuned/cpu-partitioning-powersave-variables.conf"
        Output: max_power_state=cstate.name:C6|140
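      The value cstate.name:C6|140 uses tuned's C-state latency syntax. As we understand it (this is not confirmed in the report), tuned resolves it to the exit latency of the idle state named C6, falls back to 140 (microseconds) if no state has that name, and applies the result as a CPU latency limit, assuming the profile feeds this variable to the cpu plugin's force_latency option. A latency limit only caps how deep the idle governor may go; it does not clear a state's disable flag. The sketch below imitates that lookup against cpu0's cpuidle table. resolve_cstate_latency and the SYSFS_ROOT override are hypothetical names, not part of tuned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a tuned-style "cstate.name:NAME|FALLBACK" value to a
# latency in microseconds by looking up NAME in cpu0's cpuidle states.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing on a fake tree.
resolve_cstate_latency() {
    local spec="$1" root="${SYSFS_ROOT:-/sys}" name fallback state
    spec="${spec#cstate.name:}"   # "C6|140"
    name="${spec%%|*}"            # "C6"
    fallback="${spec#*|}"         # "140"
    for state in "$root"/devices/system/cpu/cpu0/cpuidle/state[0-9]*; do
        if [ -r "$state/name" ] && [ "$(cat "$state/name")" = "$name" ]; then
            cat "$state/latency"  # exit latency of the matching state
            return 0
        fi
    done
    echo "$fallback"              # no state with that name: use the fallback
}
```

On the compute node, `resolve_cstate_latency 'cstate.name:C6|140'` prints the latency the kernel reports for C6. Note that nothing in this lookup touches the state's disable file.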

      4.3 Check C6 hardware state

      Command:
        ssh cloud-admin@compute-node "cat /sys/devices/system/cpu/cpu0/cpuidle/state4/disable"
        Output: 1  (DISABLED) <-- BUG: Should be 0
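      The command above reads state4 on cpu0 only and treats it as C6. cpuidle state numbering depends on the idle driver (intel_idle or acpi_idle) and the CPU model, so a check that matches on each state's name file and covers every CPU is more portable. Here is a minimal bash sketch. c6_status and the SYSFS_ROOT override (default /sys, useful for testing against a fake tree) are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print the "disable" flag of the cpuidle state named C6 for every CPU.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing on a fake tree.
c6_status() {
    local root="${SYSFS_ROOT:-/sys}" state cpu
    for state in "$root"/devices/system/cpu/cpu[0-9]*/cpuidle/state[0-9]*; do
        # Skip unmatched globs and states whose name is not C6.
        [ -r "$state/name" ] && [ "$(cat "$state/name")" = "C6" ] || continue
        cpu=${state%/cpuidle/*}
        printf '%s %s disable=%s\n' "${cpu##*/}" "${state##*/}" "$(cat "$state/disable")"
    done
}
```

On an affected node each line would show disable=1. After the workaround in section 5, each line should show disable=0.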

      4.4 Measure power consumption

      Using Redfish API to measure actual power consumption:

      Get baseline with cpu-partitioning profile:
        tuned-adm profile cpu-partitioning
        sleep 60
        Measure: ~147W per node

      Switch to powersave profile:
        tuned-adm profile cpu-partitioning-powersave
        sleep 60
        Measure: ~147W per node (NO DIFFERENCE)

      Power measurement via iDRAC Redfish API:
        curl -k -u root:password https://idrac-ip/redfish/v1/Chassis/System.Embedded.1/Power | jq '.PowerControl[0].PowerConsumedWatts'
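      Instantaneous Redfish readings vary from sample to sample, and section 10.2 notes that the tempest test averages several samples. The following bash sketch takes that approach and assumes curl and jq are installed. It uses the same endpoint and JSON field as the command above. BMC_HOST, REDFISH_USER and REDFISH_PASS are hypothetical placeholders for the iDRAC address and credentials.

```shell
#!/usr/bin/env bash
# Sketch: average several Redfish power readings (requires curl and jq).
# BMC_HOST, REDFISH_USER and REDFISH_PASS are hypothetical placeholders.
BMC_HOST="${BMC_HOST:-idrac-ip}"
REDFISH_USER="${REDFISH_USER:-root}"
REDFISH_PASS="${REDFISH_PASS:-password}"

# One instantaneous reading in watts (-k skips TLS verification, as in the command above).
redfish_power_once() {
    curl -ks -u "$REDFISH_USER:$REDFISH_PASS" \
        "https://$BMC_HOST/redfish/v1/Chassis/System.Embedded.1/Power" \
        | jq '.PowerControl[0].PowerConsumedWatts'
}

# Mean of the numeric lines on stdin; non-numeric lines (e.g. jq's "null") are ignored.
mean() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ { s += $1; n++ } END { if (n) printf "%.1f\n", s / n }'
}

# Take SAMPLES readings, INTERVAL seconds apart, and print their mean.
redfish_power_avg() {
    local samples="${1:-10}" interval="${2:-6}" i
    for ((i = 0; i < samples; i++)); do
        redfish_power_once
        sleep "$interval"
    done | mean
}
```

For example, `redfish_power_avg 10 6` takes ten readings six seconds apart and prints their mean in watts. Running it once per profile gives directly comparable figures.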

      4.5 Check C6 usage statistics

      Check C6 time:
        ssh cloud-admin@compute-node "cat /sys/devices/system/cpu/cpu0/cpuidle/state4/time"
        Output: 0  (never entered C6)

      Check C6 usage count:
        ssh cloud-admin@compute-node "cat /sys/devices/system/cpu/cpu0/cpuidle/state4/usage"
        Output: 0  (never entered C6)
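      The time and usage files are cumulative counters (time in microseconds, usage as the number of entries into the state), so a single read only shows whether C6 was ever used. Reading them twice gives the average C6 residency over an interval: residency % = (time_after - time_before) / (interval_s × 10^6 × number of CPUs) × 100. The bash sketch below matches the state by name rather than by index. c6_totals, c6_residency_pct and SYSFS_ROOT are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: C6 entry count and residency summed over all CPUs.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing on a fake tree.
c6_totals() {
    local root="${SYSFS_ROOT:-/sys}" state total_usage=0 total_time=0
    for state in "$root"/devices/system/cpu/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -r "$state/name" ] && [ "$(cat "$state/name")" = "C6" ] || continue
        total_usage=$((total_usage + $(cat "$state/usage")))
        total_time=$((total_time + $(cat "$state/time")))   # microseconds
    done
    echo "$total_usage $total_time"
}

# Average C6 residency in percent over an interval:
#   (time_after - time_before) / (interval_s * 1e6 * ncpus) * 100
c6_residency_pct() {
    awk -v t0="$1" -v t1="$2" -v secs="$3" -v n="$4" \
        'BEGIN { printf "%.1f\n", (t1 - t0) / (secs * 1000000 * n) * 100 }'
}
```

Example usage: `read -r u0 t0 < <(c6_totals); sleep 60; read -r u1 t1 < <(c6_totals); c6_residency_pct "$t0" "$t1" 60 "$(getconf _NPROCESSORS_ONLN)"`. This uses getconf rather than nproc because, under cpu-partitioning, a login shell may be pinned to the housekeeping CPUs, and nproc then counts only those.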

      5. Workaround

      Manually enable C6 at the kernel level (runtime only; sysfs changes do not persist across reboots):

        ssh cloud-admin@compute-node "for cpu in /sys/devices/system/cpu/cpu*/cpuidle/state4/disable; do echo 0 | sudo tee \$cpu > /dev/null; done"
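      The one-liner above hard-codes state4. The following bash sketch is a more defensive variant. It selects the state named C6, writes only where the flag is still set, and prints how many states it changed, so re-running it is harmless. It must run as root on the real node. enable_c6 and SYSFS_ROOT are hypothetical names, and like any sysfs write the change is lost on reboot.

```shell
#!/usr/bin/env bash
# Sketch: clear the disable flag of the cpuidle state named C6 on every CPU.
# Run as root; SYSFS_ROOT is a hypothetical override (default /sys) for testing.
# Prints the number of states changed (0 on a second run).
enable_c6() {
    local root="${SYSFS_ROOT:-/sys}" state changed=0
    for state in "$root"/devices/system/cpu/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -r "$state/name" ] && [ "$(cat "$state/name")" = "C6" ] || continue
        if [ "$(cat "$state/disable")" != "0" ]; then
            echo 0 > "$state/disable" || return 1   # fails if not root
            changed=$((changed + 1))
        fi
    done
    echo "$changed"
}
```

For example, on the compute node: `sudo bash -c "$(declare -f enable_c6); enable_c6"`.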

      After applying workaround:

      • C6 state becomes enabled (disable=0)
      • CPUs enter C6 during idle periods
      • Power consumption drops to ~130-135W, roughly 8-12% below the ~147W baseline
      • C6 usage statistics show active usage

      6. Verification After Workaround

      Step 1 - Verify C6 is enabled:
        cat /sys/devices/system/cpu/cpu0/cpuidle/state4/disable
        Output: 0 (ENABLED)

      Step 2 - Wait 60 seconds for idle time

      Step 3 - Check C6 usage:
        cat /sys/devices/system/cpu/cpu0/cpuidle/state4/usage
        Output: >0 (CPU entering C6)

      Step 4 - Measure power consumption:
        Result: ~130-135W (reduced from 147W)

      7. Impact

      • Tempest test failure: test_power_saving_tuned_profile fails because no power reduction is measured
      • No power savings: Despite powersave configuration, actual power consumption unchanged
      • Misleading configuration: System reports powersave mode but provides no actual power savings

      8. Root Cause

      The cpu-partitioning-powersave tuned profile sets the max_power_state parameter but never writes to the cpuidle sysfs interface (/sys/devices/system/cpu/cpu*/cpuidle/state4/disable), so C6 stays disabled at the kernel level.

      9. Proposed Fix

      The tuned profile or deployment automation should enable C6 when edpm_tuned_max_power_cstate: "c6" is configured.

      9.1 Option 1: Fix tuned profile

      Add to /usr/lib/tuned/cpu-partitioning-powersave/script.sh:

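        # On profile start only, clear the disable flag of the state named C6 (index varies by driver/CPU)
        if [ "$1" = "start" ]; then
            for dir in /sys/devices/system/cpu/cpu[0-9]*/cpuidle/state[0-9]*; do
                if [ "$(cat "$dir/name")" = "C6" ]; then echo 0 > "$dir/disable"; fi
            done
        fi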

      9.2 Option 2: Fix deployment automation

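      Add an Ansible task that runs when edpm_tuned_max_power_cstate: "c6" is set:

        - name: Enable C6 cstate
          become: true
          ansible.builtin.shell: |
            # Clear the disable flag of the cpuidle state named C6 (index varies by driver/CPU)
            for dir in /sys/devices/system/cpu/cpu[0-9]*/cpuidle/state[0-9]*; do
              if [ "$(cat "$dir/name")" = "C6" ]; then echo 0 > "$dir/disable"; fi
            done
          when: edpm_tuned_max_power_cstate == "c6"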

      10. Additional Information

      10.1 How to switch tuned profiles

      Switch to baseline (performance):
        sudo tuned-adm profile cpu-partitioning

      Switch to powersave:
        sudo tuned-adm profile cpu-partitioning-powersave

      Verify active profile:
        tuned-adm active

      10.2 Test case: test_power_saving_tuned_profile

      The failing tempest test performs these steps:

      1. Measure baseline power consumption with cpu-partitioning profile
      2. Switch to cpu-partitioning-powersave profile
      3. Measure powersave power consumption
      4. Assert powersave < baseline (currently fails due to this bug)

      The test obtains power measurements via:

      • Redfish API: GET /redfish/v1/Chassis/System.Embedded.1/Power
      • Field: PowerControl[0].PowerConsumedWatts
      • Multiple samples averaged over measurement period
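      The comparison in steps 1 to 4 reduces to simple arithmetic on the two averages. In the bash sketch below, the helper names are hypothetical. assert_power_saving encodes only the "powersave < baseline" condition stated above, not any threshold the real test may apply. Using the figures in this report, savings_pct 147 130 gives 11.6 and savings_pct 147 135 gives 8.2 (percent).

```shell
#!/usr/bin/env bash
# Sketch: arithmetic behind the test's final comparison (helper names are hypothetical).

# Percentage saving of the powersave average relative to the baseline average.
savings_pct() {
    awk -v base="$1" -v ps="$2" 'BEGIN { printf "%.1f\n", (base - ps) / base * 100 }'
}

# Succeeds (exit status 0) only if the powersave average is below the baseline.
assert_power_saving() {
    awk -v base="$1" -v ps="$2" 'BEGIN { exit !(ps < base) }'
}
```

With the bug present, both averages are ~147W, so assert_power_saving 147 147 fails, which matches the test failure described in section 7.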

      Attachment: screenshot.png (60 kB), Anthony Harivel

      Anthony Harivel (rh-ee-aharivel)
      Miguel Angel Nieto Jimenez (mnietoji)
      rhos-dfg-nfv