Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.18, 4.19
Component/s: Node Tuning Operator
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
None
Architecture:

aarch64
Latest Status Summary:
2025-07-16: The TuneD fix was merged on July 14, 2025. Verification has to wait for the next FDP, expected around August 15, 2025.

Target Backport Versions:

4.18.z
Target Version:
None
Release Blocker:
None
Sprint:
CNF Compute Sprint 266, CNF Compute Sprint 267, CNF Compute Sprint 268, CNF Compute Sprint 269, CNF Compute Sprint 270, CNF Compute Sprint 271, CNF Compute Sprint 272, CNF Compute Sprint 273, CNF Compute Sprint 274, CNF Compute Sprint 275, CNF Compute Sprint 276, CNF Compute Sprint 277
sprint_count:
12

RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    Applying a performance profile on an ARM (aarch64) cluster causes the TuneD profile on the targeted node to be reported as degraded.

Version-Release number of selected component (if applicable):

    4.18.0-0.nightly-arm64-2025-02-08-033503

How reproducible:

Apply a performance profile

Steps to Reproduce:

    1. Label a worker node with a custom label (e.g. worker-cnf)
    2. Create an MCP referencing that label
    3. Apply a performance profile:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: 1-3
    reserved: "0"
  hugepages:
    defaultHugepagesSize: 2M
    pages:
    - count: 2
      size: 2M
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-cnf
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ''
  numa:
    topologyPolicy: single-numa-node
  workloadHints:
    highPowerConsumption: true
    perPodPowerManagement: false
    realTime: true
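
Steps 1 and 2 can be sketched in shell as follows. This is a minimal sketch, not the reporter's exact commands: the `render_mcp` helper and the `NODE` placeholder are hypothetical, and the MachineConfigPool manifest is an assumption modeled on the usual custom-pool layout, with labels chosen to match the `machineConfigPoolSelector` and `nodeSelector` in the profile above.

```shell
#!/bin/sh
# Hypothetical helper: print a MachineConfigPool manifest for a custom role.
# The manifest layout is an assumption; adjust it to your cluster.
render_mcp() {
  role="$1"
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${role}
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, ${role}]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/${role}: ""
EOF
}

# Usage against a live cluster (NODE is a placeholder for a worker node name):
#   oc label node "$NODE" node-role.kubernetes.io/worker-cnf=""
#   render_mcp worker-cnf | oc apply -f -
```

Rendering the manifest to stdout keeps the helper free of side effects, so the output can be inspected before it is piped to `oc apply`.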

Actual results:

oc get profile
NAME                          TUNED                                    APPLIED   DEGRADED   MESSAGE                  AGE
ip-10-0-1-225.ec2.internal    openshift-control-plane                  True      False      TuneD profile applied.   3d1h
ip-10-0-37-254.ec2.internal   openshift-control-plane                  True      False      TuneD profile applied.   3d1h
ip-10-0-51-111.ec2.internal   openshift-node-performance-performance   True      True       TuneD profile applied.   3d1h
ip-10-0-66-39.ec2.internal    openshift-control-plane                  True      False      TuneD profile applied.   3d1h
ip-10-0-74-241.ec2.internal   openshift-node                           True      False      TuneD profile applied.   3d
ip-10-0-9-208.ec2.internal    openshift-node                           True      False      TuneD profile applied.   3d1h
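
To pick the affected node out of output like the above, the DEGRADED column can be filtered. This is a small sketch that assumes the column layout shown above, where DEGRADED is the fourth whitespace-separated field; the `degraded_profiles` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read `oc get profile` table output on stdin and print
# the node name and TuneD profile of every row whose DEGRADED column is True.
degraded_profiles() {
  # NR > 1 skips the header row; $4 is the DEGRADED column.
  awk 'NR > 1 && $4 == "True" { print $1, $2 }'
}

# Usage against a live cluster:
#   oc get profile | degraded_profiles
```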


The TuneD pod log on the degraded node shows:

2025-02-12 12:36:24,027 INFO     tuned.plugins.plugin_bootloader: cannot find grub.cfg to patch
2025-02-12 12:36:24,028 INFO     tuned.plugins.plugin_systemd: setting 'CPUAffinity' to '0' in the '/etc/systemd/system.conf'
2025-02-12 12:36:25,086 INFO     tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
2025-02-12 12:36:25,116 ERROR    tuned.plugins.plugin_script: script '/usr/lib/tuned/cpu-partitioning/script.sh' error output: 'modinfo: ERROR: Module kvm_intel not found.'
2025-02-12 12:36:25,116 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node-performance-performance' applied
I0212 12:36:25.116677    2683 controller.go:702] tunedRecommendFileRead(): read "openshift-node-performance-performance" from "/etc/tuned/recommend.d/50-openshift.conf"
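
The ERROR line above is the error output of the script, produced when `modinfo` was asked about `kvm_intel`, a module that exists only for x86 hosts. As an illustration of how a script can probe for a module without emitting such output, here is a hedged sketch. It is not the content of `/usr/lib/tuned/cpu-partitioning/script.sh` and not the fix tracked in RHEL-79943; the `module_available` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if modinfo can resolve the module.
# Both stdout and stderr are discarded, so a missing module produces no output.
module_available() {
  modinfo "$1" >/dev/null 2>&1
}

if module_available kvm_intel; then
  echo "kvm_intel is available"
else
  echo "kvm_intel is not available on $(uname -m); skipping"
fi
```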

Expected results:

Additional info:

    The ARM cluster tested here was hosted on AWS.
    The worker-cnf node is a VM; we need to investigate whether this contributes to the failure.

is caused by

RHEL-79943 tuned: 'modinfo: ERROR: Module kvm_intel not found.'

Release Pending

Assignee: Yanir Quinn
Reporter: Ronny Baturov (Inactive)
Need Info From: None
Contributors: None
QA Contact: Roy Shemtov
Doc Contact: None
Votes: 0
Watchers: 6
Created: 2025/02/13 9:17 AM
Updated: 2025/09/14 9:55 AM
