OpenShift Bugs / OCPBUGS-50672

Tuned profile degraded on ARM cluster - Module kvm_intel not found


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • 4.18, 4.19
    • Component: Node Tuning Operator
    • Quality / Stability / Reliability
    • False
    • aarch64
    • 2025-07-16: TuneD fix merged on July 14, 2025; verification must wait for the next FDP (~August 15, 2025).
    • Sprint: CNF Compute Sprint 266, CNF Compute Sprint 267, CNF Compute Sprint 268, CNF Compute Sprint 269, CNF Compute Sprint 270, CNF Compute Sprint 271, CNF Compute Sprint 272, CNF Compute Sprint 273, CNF Compute Sprint 274, CNF Compute Sprint 275, CNF Compute Sprint 276, CNF Compute Sprint 277
    • 12

      Description of problem:

          Applying a performance profile on an ARM cluster causes the TuneD profile on the targeted node to be reported as degraded.

      Version-Release number of selected component (if applicable):

          4.18.0-0.nightly-arm64-2025-02-08-033503

      How reproducible:

      Every time a performance profile is applied on the ARM cluster.

      Steps to Reproduce:

          1. Label a worker node with a custom label (e.g. worker-cnf).
          2. Create a MachineConfigPool (MCP) that selects nodes with that label.
          3. Apply a performance profile:
      
      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: performance
      spec:
        cpu:
          isolated: 1-3
          reserved: "0"
        hugepages:
          defaultHugepagesSize: 2M
          pages:
          - count: 2
            size: 2M
        machineConfigPoolSelector:
          machineconfiguration.openshift.io/role: worker-cnf
        nodeSelector:
          node-role.kubernetes.io/worker-cnf: ''
        numa:
          topologyPolicy: single-numa-node
        workloadHints:
          highPowerConsumption: true
          perPodPowerManagement: false
          realTime: true
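Steps 1 and 2 can be scripted roughly as follows. This is a minimal sketch assuming a logged-in `oc` client; the function name `label_and_create_mcp` and the MCP name `worker-cnf` are hypothetical. The MCP follows the common pattern of selecting both `worker` and `worker-cnf` MachineConfigs so the node keeps the base worker configuration, and it carries the `machineconfiguration.openshift.io/role: worker-cnf` label so that the profile's `machineConfigPoolSelector` matches it.

```shell
#!/bin/sh
# Hypothetical helper for steps 1 and 2: label one worker node and create
# an MCP that targets it. Requires a logged-in `oc` client.
label_and_create_mcp() {
  node="$1"
  # Step 1: add the custom role label that the profile's nodeSelector uses.
  oc label node "$node" node-role.kubernetes.io/worker-cnf="" || return 1
  # Step 2: create an MCP whose label matches the profile's
  # machineConfigPoolSelector and whose nodeSelector matches the node label.
  oc apply -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, worker-cnf]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""
EOF
}
```

For example, run `label_and_create_mcp <node-name>`, then apply the PerformanceProfile above with `oc apply -f <file>`.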

       

          

      Actual results:

      oc get profile
      NAME                          TUNED                                    APPLIED   DEGRADED   MESSAGE                  AGE
      ip-10-0-1-225.ec2.internal    openshift-control-plane                  True      False      TuneD profile applied.   3d1h
      ip-10-0-37-254.ec2.internal   openshift-control-plane                  True      False      TuneD profile applied.   3d1h
      ip-10-0-51-111.ec2.internal   openshift-node-performance-performance   True      True       TuneD profile applied.   3d1h
      ip-10-0-66-39.ec2.internal    openshift-control-plane                  True      False      TuneD profile applied.   3d1h
      ip-10-0-74-241.ec2.internal   openshift-node                           True      False      TuneD profile applied.   3d
      ip-10-0-9-208.ec2.internal    openshift-node                           True      False      TuneD profile applied.   3d1h
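To pick the affected node out of output like the above, a small filter can locate the DEGRADED column from the header and print the names of nodes where it is True. This is a hypothetical helper that assumes the default table layout shown here, where the columns before MESSAGE contain no spaces.

```shell
#!/bin/sh
# Hypothetical filter: print the NAME of every Profile whose DEGRADED
# column is "True". Reads `oc get profile` table output on stdin.
degraded_nodes() {
  awk '
    # Find the DEGRADED column index from the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "DEGRADED") col = i; next }
    # Columns up to DEGRADED have no spaces, so field indexes line up.
    col && $col == "True" { print $1 }
  '
}
```

Assuming the Profile objects live in the `openshift-cluster-node-tuning-operator` namespace, `oc get profile -n openshift-cluster-node-tuning-operator | degraded_nodes` would print only `ip-10-0-51-111.ec2.internal` for the output above.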
      
      
      Checking the logs of the TuneD pod on the degraded node shows:
      
      2025-02-12 12:36:24,027 INFO     tuned.plugins.plugin_bootloader: cannot find grub.cfg to patch
      2025-02-12 12:36:24,028 INFO     tuned.plugins.plugin_systemd: setting 'CPUAffinity' to '0' in the '/etc/systemd/system.conf'
      2025-02-12 12:36:25,086 INFO     tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
      2025-02-12 12:36:25,116 ERROR    tuned.plugins.plugin_script: script '/usr/lib/tuned/cpu-partitioning/script.sh' error output: 'modinfo: ERROR: Module kvm_intel not found.'
      2025-02-12 12:36:25,116 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node-performance-performance' applied
      I0212 12:36:25.116677    2683 controller.go:702] tunedRecommendFileRead(): read "openshift-node-performance-performance" from "/etc/tuned/recommend.d/50-openshift.conf"
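The TuneD log can be narrowed to the failing script call with a filter like the one below. It is a sketch based only on the log format shown above; the function name `tuned_script_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only ERROR lines emitted by TuneD's script
# plugin, such as the cpu-partitioning script.sh failure. Reads logs on stdin.
tuned_script_errors() {
  grep -E '[[:space:]]ERROR[[:space:]]+tuned\.plugins\.plugin_script:'
}
```

Assuming the TuneD pods run in the `openshift-cluster-node-tuning-operator` namespace, the pod for a node can be found with `oc get pods -n openshift-cluster-node-tuning-operator -o wide --field-selector spec.nodeName=<node-name>`, and its log filtered with `oc logs -n openshift-cluster-node-tuning-operator <tuned-pod> | tuned_script_errors`.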

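      Expected results:

          The TuneD profile is applied on the worker-cnf node without being reported as degraded (DEGRADED is False).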

      Additional info:

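          The ARM cluster tested here was hosted on AWS.
      The worker-cnf node is a VM, and we still need to investigate whether this contributes to the failure. Note that kvm_intel is the KVM module for Intel x86 processors, so it is not expected to be present on an aarch64 kernel at all.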
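One way to separate the VM question from the architecture question is to check which architecture the node reports before looking for the module. The sketch below is a hypothetical guard, not the TuneD fix mentioned in the status summary: it only runs `modinfo kvm_intel` on x86 architectures and reports the check as not applicable elsewhere, such as on aarch64. The function names are illustrative.

```shell
#!/bin/sh
# Hypothetical guard: kvm_intel only exists for x86 kernels (Intel VT-x).
kvm_intel_applies() {
  case "$1" in
    x86_64|i?86) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether kvm_intel is present, or skip the lookup on other
# architectures. Defaults to the local architecture from `uname -m`.
check_kvm_intel() {
  arch="${1:-$(uname -m)}"
  if kvm_intel_applies "$arch"; then
    if modinfo kvm_intel >/dev/null 2>&1; then
      echo "kvm_intel: present"
    else
      echo "kvm_intel: missing"
    fi
  else
    echo "kvm_intel: not applicable on $arch"
  fi
}
```

On the affected node, the architecture and module lookup can be inspected directly with `oc debug node/<node-name> -- chroot /host sh -c 'uname -m; modinfo kvm_intel'`.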

              yquinn@redhat.com Yanir Quinn
              rh-ee-rbaturov Ronny Baturov (Inactive)
              Roy Shemtov
              Votes: 0
              Watchers: 6
