Red Hat OpenStack Services on OpenShift / OSPRH-14688

[OSP tracker] cpu-partitioning-powersave activation (enable tuned profile) fails during bootstrap-edpm-deployment due to a kernel hang

    • Bug
    • Resolution: Done-Errata
    • Major
    • rhos-18.0.7
    • rhos-18.0 FR 2 (Mar 2025)
    • edpm-ansible
    • None
    • 2
    • False
    • None
    • False
    • ?
    • kernel-5.14.0-427.60.1.el9_4
    • rhos-connectivity-nfv
    • Yes
    • .Cannot set `no_turbo` when using cpu-partitioning-powersave profile

      Due to an issue with setting the `no_turbo` parameter in the kernel, tuned hangs and fails when using the cpu-partitioning-powersave profile.

      *Workaround:*

      Downgrade tuned as part of the deployment to an older version by adding the following configuration to `edpm_bootstrap_command`:

      ----
      ...
      edpm_bootstrap_command: |-
        ...
        dnf downgrade tuned-2.24.0
        ...
      ----
    • Release Note Not Required
    • Done
    • Rejected
    • NFV 007
    • 1
    • Important

      To Reproduce

      Steps to reproduce the behavior:

      1. Deploy RHOSO 18 using an NFV job. I used tigon28 and a job based on uni08theta-rhel9-rhoso18.0-nfv-ovs-dpdk-sriov-trunk-patches (downstream), though the issue also reproduces when deploying an environment with upstream containers. It also occurred on other environments.
      2. The edpm deployment fails in the bootstrap-edpm-deployment-openstack-edpm-XXXXX pod (see [1] below).
      3. A kernel trace appears on the compute nodes (see [2] below).
      4. sudo systemctl status tuned reports that the service is deactivating.
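
Step 2 can be checked from the pod log without reading the full Ansible output. The following is a minimal sketch, assuming the log has one "fatal:" entry per line in the format shown in [1]; tuned_timeout_hosts is a hypothetical helper name, not part of edpm-ansible or tuned.

```shell
# Hypothetical helper: read Ansible log text on stdin and print the hosts
# whose tuned-adm call failed with a timeout, one host per line.
tuned_timeout_hosts() {
    grep 'tuned-adm' |
        grep 'Operation timed out' |
        sed -n 's/^.*fatal: \[\([^]]*\)]: FAILED!.*$/\1/p'
}

# Usage (the pod name suffix differs per deployment):
#   oc logs bootstrap-edpm-deployment-openstack-edpm-XXXXX -n openstack | tuned_timeout_hosts
```

Lines that do not mention both tuned-adm and the timeout message are ignored, so unrelated task failures are not reported.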

      Expected behavior

        No tuned-related failures when installing the bootstrap-edpm-deployment-openstack-edpm-XXXXX pod while deploying edpm.

      Device Info (please complete the following information):

        • Environment used: tigon28.lab.eng.tlv2.redhat.com
        • OS on compute nodes: rhel-9.4
        • OpenStack version: 18.0.4-trunk-20250305.1
        • tuned version: tuned-2.25.1-1.1.20250203git889387b0.el9fdp.noarch

      Bug impact

      • Deploying edpm is not possible without changing the tuned profile.

      Known workaround

      • Changing edpm_tuned_profile: cpu-partitioning-powersave to edpm_tuned_profile: cpu-partitioning seems to work.
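
The profile change can be applied to a local copy of the node set definition before redeploying. This is only a sketch under assumptions: switch_tuned_profile is a hypothetical helper, nodeset.yaml is an assumed file name, and the variable is assumed to sit on its own line as edpm_tuned_profile: cpu-partitioning-powersave.

```shell
# Hypothetical helper: print FILE with the powersave profile replaced by
# plain cpu-partitioning. The input file itself is not modified.
switch_tuned_profile() {
    sed 's/^\([[:space:]]*edpm_tuned_profile:[[:space:]]*\)cpu-partitioning-powersave[[:space:]]*$/\1cpu-partitioning/' "$1"
}

# Usage (nodeset.yaml is an assumed file name):
#   switch_tuned_profile nodeset.yaml > nodeset-workaround.yaml
```

Writing to standard output rather than editing in place keeps the original file available for comparison once a fixed kernel is installed.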

      Additional context

      • [1]
        [zuul@controller-0 ~]$ oc logs -f bootstrap-edpm-deployment-openstack-edpm-65n6l -n openstack | tail -n 10
        fatal: [compute-1]: FAILED! => {"changed": false, "cmd": ["/usr/sbin/tuned-adm", "profile", "cpu-partitioning-powersave"], "delta": "0:10:00.933740", "end": "2025-03-11 11:47:48.084486", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2025-03-11 11:37:47.150746", "stderr": "", "stderr_lines": [], "stdout": "Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.", "stdout_lines": ["Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async."]}
        fatal: [compute-0]: FAILED! => {"changed": false, "cmd": ["/usr/sbin/tuned-adm", "profile", "cpu-partitioning-powersave"], "delta": "0:10:00.929367", "end": "2025-03-11 11:47:48.077962", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2025-03-11 11:37:47.148595", "stderr": "", "stderr_lines": [], "stdout": "Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.", "stdout_lines": ["Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async."]}
      • [2] Kernel trace on the compute nodes:
        [ 1475.696162] INFO: task tuned:27202 blocked for more than 122 seconds.
        [ 1475.696481]       Not tainted 5.14.0-427.59.1.el9_4.x86_64 #1
        [ 1475.696803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1475.697148] task:tuned           state:D stack:0     pid:27202 ppid:1      flags:0x00000006
        [ 1475.697496] Call Trace:
        [ 1475.697834]  <TASK>
        [ 1475.698171]  __schedule+0x21b/0x550
        [ 1475.698498]  schedule+0x2d/0x70
        [ 1475.698823]  schedule_preempt_disabled+0x11/0x20
        [ 1475.699155]  rwsem_down_read_slowpath+0x37f/0x4f0
        [ 1475.699486]  down_read+0x45/0xa0
        [ 1475.699814]  show+0x32/0x90
        [ 1475.700137]  sysfs_kf_seq_show+0x98/0x100
        [ 1475.700455]  seq_read_iter+0x11d/0x4b0
        [ 1475.700767]  ? selinux_file_permission+0x108/0x150
        [ 1475.701073]  vfs_read+0x1e6/0x330
        [ 1475.701390]  ksys_read+0x5f/0xe0
        [ 1475.701684]  do_syscall_64+0x59/0x90
        [ 1475.701981]  ? clear_bhb_loop+0x35/0x90
        [ 1475.702291]  ? clear_bhb_loop+0x35/0x90
        [ 1475.702583]  ? clear_bhb_loop+0x35/0x90
        [ 1475.702873]  ? clear_bhb_loop+0x35/0x90
        [ 1475.703161]  entry_SYSCALL_64_after_hwframe+0x77/0xe1
        [ 1475.703442] RIP: 0033:0x7f36bbafd9ec
        [ 1475.703730] RSP: 002b:00007f36ba3d3b30 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
        [ 1475.704023] RAX: ffffffffffffffda RBX: 00007f36ba3d65c0 RCX: 00007f36bbafd9ec
        [ 1475.704332] RDX: 0000000000001001 RSI: 00007f36b4046bc0 RDI: 000000000000000b
        [ 1475.704633] RBP: 0000000000001001 R08: 0000000000000000 R09: 0000000000000000
        [ 1475.704932] R10: 0000000008000000 R11: 0000000000000246 R12: 00007f36b9b5d100
        [ 1475.705236] R13: 00007f36b4046bc0 R14: 000000000000000b R15: 000055e8635feb70
        [ 1475.705535]  </TASK>
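
To spot the hang in [2] on a node, the kernel log can be filtered for hung-task reports. The following is a minimal sketch, assuming the "INFO: task NAME:PID blocked for more than N seconds." message format shown above; hung_tasks is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and print one line per
# hung-task report in the form "<task name> <pid> <seconds blocked>".
hung_tasks() {
    sed -n 's/^.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds\..*$/\1 \2 \3/p'
}

# Usage on a compute node (reading the kernel log may require root):
#   sudo dmesg | hung_tasks
```

A line such as "tuned 27202 122" in the output indicates that the tuned process was reported as blocked, matching the trace in this report.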

              mnietoji@redhat.com Miguel Angel Nieto Jimenez
              romansaf Roman Safronov
              rhos-dfg-nfv