-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.18.0, 4.19
-
None
Description of problem:
Applying a performance profile on an ARM cluster causes the TuneD profile on the targeted node to become degraded.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Label a worker node with a worker-cnf label
2. Create an mcp referring to that label
3. Apply the below performance profile

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: "1-3,4-6"
    reserved: "0,7"
  hugepages:
    defaultHugepagesSize: 512M
    pages:
    - count: 1
      node: 0
      size: 512M
    - count: 128
      node: 1
      size: 2M
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-cnf
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ''
  kernelPageSize: 64k
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: true
    perPodPowerManagement: false
    realTime: true
Actual results:
Expected results:
Additional info:
[root@ampere-one-x-04 ~]# oc get profiles -A
NAMESPACE                                NAME                                             TUNED                                    APPLIED   DEGRADED   MESSAGE                                                            AGE
openshift-cluster-node-tuning-operator   ocp-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   openshift-control-plane                  True      False      TuneD profile applied.                                             22h
openshift-cluster-node-tuning-operator   ocp-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   openshift-control-plane                  True      False      TuneD profile applied.                                             22h
openshift-cluster-node-tuning-operator   ocp-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   openshift-control-plane                  True      False      TuneD profile applied.                                             22h
openshift-cluster-node-tuning-operator   ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com     openshift-node-performance-performance   False     True       The TuneD daemon profile not yet applied, or application failed.   22h
openshift-cluster-node-tuning-operator   ocp-worker-1.libvirt.lab.eng.tlv2.redhat.com     openshift-node                           True      False      TuneD profile applied.                                             22h
openshift-cluster-node-tuning-operator   ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com     openshift-node                           True      False      TuneD profile applied.                                             22h

[root@ampere-one-x-04 ~]# oc describe performanceprofile
Name:         performance
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  performance.openshift.io/v2
Kind:         PerformanceProfile
Metadata:
  Creation Timestamp:  2025-03-04T15:28:44Z
  Finalizers:
    foreground-deletion
  Generation:        1
  Resource Version:  74234
  UID:               0d9c1817-c12f-4ea8-9c4b-b37badc232e9
Spec:
  Cpu:
    Isolated:  1-3,4-6
    Reserved:  0,7
  Hugepages:
    Default Hugepages Size:  512M
    Pages:
      Count:  1
      Node:   0
      Size:   512M
      Count:  128
      Node:   1
      Size:   2M
  Kernel Page Size:  64k
  Machine Config Pool Selector:
    machineconfiguration.openshift.io/role:  worker-cnf
  Net:
    User Level Networking:  true
  Node Selector:
    node-role.kubernetes.io/worker-cnf:
  Numa:
    Topology Policy:  single-numa-node
  Real Time Kernel:
    Enabled:  false
  Workload Hints:
    High Power Consumption:    true
    Per Pod Power Management:  false
    Real Time:                 true
Status:
  Conditions:
    Last Heartbeat Time:   2025-03-04T15:28:45Z
    Last Transition Time:  2025-03-04T15:28:45Z
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2025-03-04T15:28:45Z
    Last Transition Time:  2025-03-04T15:28:45Z
    Status:                False
    Type:                  Upgradeable
    Last Heartbeat Time:   2025-03-04T15:28:45Z
    Last Transition Time:  2025-03-04T15:28:45Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2025-03-04T15:28:45Z
    Last Transition Time:  2025-03-04T15:28:45Z
    Message:               Tuned ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com Degraded Reason: TunedError.
                           Tuned ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com Degraded Message: TuneD daemon issued one or more error message(s) during profile application. TuneD stderr: .
    Reason:                TunedProfileDegraded
    Status:                True
    Type:                  Degraded
  Runtime Class:  performance-performance
  Tuned:          openshift-cluster-node-tuning-operator/openshift-node-performance-performance
Events:
  Type    Reason              Age                 From                            Message
  ----    ------              ----                ----                            -------
  Normal  Creation succeeded  112m (x9 over 17h)  performance-profile-controller  Succeeded to create all components

[root@ampere-one-x-04 ~]# oc logs pod/tuned-kjc8j
I0304 15:35:50.346412 3259 controller.go:1666] starting in-cluster ocp-tuned v4.19.0-202502262344.p0.gf166846.assembly.stream.el9-0-g0d9dd16-dirty
I0304 15:35:50.401840 3259 controller.go:671] writing /var/lib/ocp-tuned/image.env
I0304 15:35:50.418669 3259 controller.go:702] tunedRecommendFileRead(): read "openshift-node-performance-performance" from "/etc/tuned/recommend.d/50-openshift.conf"
I0304 15:35:50.419585 3259 controller.go:1728] starting: profile unpacked is "openshift-node-performance-performance" fingerprint "ab0d99d8009d6539b91ed1aeff3e4fa1c629c1cd4e9a32bdc132dcc9737e4fc9"
I0304 15:35:50.419646 3259 controller.go:1424] recover: no pending deferred change
I0304 15:35:50.419666 3259 controller.go:1734] starting: no pending deferred update
I0304 15:36:06.074575 3259 controller.go:382] disabling system tuned...
I0304 15:36:06.121045 3259 controller.go:1546] started events processors
I0304 15:36:06.121492 3259 controller.go:359] set log level 0
I0304 15:36:06.121850 3259 controller.go:1567] monitoring filesystem events on "/etc/tuned/bootcmdline"
I0304 15:36:06.121886 3259 controller.go:1570] started controller
I0304 15:36:06.122603 3259 controller.go:692] tunedRecommendFileWrite(): written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-node-performance-performance
I0304 15:36:06.122634 3259 controller.go:417] profilesExtract(): extracting 6 TuneD profiles (recommended=openshift-node-performance-performance)
I0304 15:36:06.210862 3259 controller.go:462] profilesExtract(): recommended TuneD profile openshift-node-performance-performance content unchanged [openshift]
I0304 15:36:06.211950 3259 controller.go:462] profilesExtract(): recommended TuneD profile openshift-node-performance-performance content unchanged [openshift-node-performance-performance]
I0304 15:36:06.212311 3259 controller.go:478] profilesExtract(): fingerprint of extracted profiles: "ab0d99d8009d6539b91ed1aeff3e4fa1c629c1cd4e9a32bdc132dcc9737e4fc9"
I0304 15:36:06.212389 3259 controller.go:818] tunedReload()
I0304 15:36:06.212493 3259 controller.go:745] starting tuned...
I0304 15:36:06.212547 3259 run.go:121] running cmd...
2025-03-04 15:36:06,335 INFO tuned.daemon.application: TuneD: 2.25.1, kernel: 5.14.0-570.el9.aarch64+64k
2025-03-04 15:36:06,335 INFO tuned.daemon.application: dynamic tuning is globally disabled
2025-03-04 15:36:06,340 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2025-03-04 15:36:06,340 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2025-03-04 15:36:06,341 INFO tuned.daemon.daemon: Using 'openshift-node-performance-performance' profile
2025-03-04 15:36:06,342 INFO tuned.profiles.loader: loading profile: openshift-node-performance-performance
2025-03-04 15:36:06,460 ERROR tuned.daemon.daemon: Cannot set initial profile. No tunings will be enabled: Cannot load profile(s) 'openshift-node-performance-performance': Cannot find profile 'openshift-node-performance--aarch64-performance' in '['/var/lib/ocp-tuned/profiles', '/usr/lib/tuned', '/usr/lib/tuned/profiles']'.
2025-03-04 15:36:06,461 INFO tuned.daemon.controller: starting controller

sh-5.1# systemctl --no-pager | grep hugepages
  dev-hugepages.mount                           loaded active mounted Huge Pages File System
● hugepages-allocation-2048kB-NUMA1.service     loaded failed failed  Hugepages-2048kB allocation on the node 1
  hugepages-allocation-524288kB-NUMA0.service   loaded active exited  Hugepages-524288kB allocation on the node 0

sh-5.1# systemctl status hugepages-allocation-2048kB-NUMA1.service
× hugepages-allocation-2048kB-NUMA1.service - Hugepages-2048kB allocation on the node 1
     Loaded: loaded (/etc/systemd/system/hugepages-allocation-2048kB-NUMA1.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2025-03-04 15:32:33 UTC; 17h ago
   Main PID: 1002 (code=exited, status=1/FAILURE)
        CPU: 6ms

Mar 04 15:32:33 ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com systemd[1]: Starting Hugepages-2048kB allocation on the node 1...
Mar 04 15:32:33 ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com hugepages-allocation.sh[1002]: ERROR: /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages does not exist
Mar 04 15:32:33 ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com systemd[1]: hugepages-allocation-2048kB-NUMA1.service: Main process exited, code=exited, status=1/FAILURE
Mar 04 15:32:33 ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com systemd[1]: hugepages-allocation-2048kB-NUMA1.service: Failed with result 'exit-code'.
Mar 04 15:32:33 ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Hugepages-2048kB allocation on the node 1.

sh-5.1# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/boot/ostree/rhcos-e032e3de5cffeccaf88bc5dc1945da35b4273c5f5b758a6ca1d0d78344b55e7f/vmlinuz-5.14.0-570.el9.aarch64+64k rw ostree=/ostree/boot.0/rhcos/e032e3de5cffeccaf88bc5dc1945da35b4273c5f5b758a6ca1d0d78344b55e7f/0 ignition.platform.id=openstack console=ttyAMA0,115200n8 console=tty0 root=UUID=96763b3b-e217-4879-a03e-56568ca84bf9 rw rootflags=prjquota boot=UUID=d98055a6-2355-40d3-8e87-98eedd0e8c91 systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=0
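
The failed hugepages-allocation-2048kB-NUMA1.service reports that a per-node sysfs counter is missing. The following is a minimal Python sketch for checking this on the node, assuming the size-to-directory mapping that the unit names suggest (2M maps to hugepages-2048kB, 512M to hugepages-524288kB). The helper name `hugepages_counter_path` is hypothetical and is not part of the operator or of hugepages-allocation.sh.

```python
import os

# Multipliers to kB for the size suffixes used in the PerformanceProfile (assumed set).
_UNITS_KB = {"k": 1, "M": 1024, "G": 1024 * 1024}


def hugepages_counter_path(node: int, size: str) -> str:
    """Build the per-NUMA-node nr_hugepages path for a size such as '2M' or '512M'."""
    number, unit = int(size[:-1]), size[-1]
    size_kb = number * _UNITS_KB[unit]
    return (
        f"/sys/devices/system/node/node{node}"
        f"/hugepages/hugepages-{size_kb}kB/nr_hugepages"
    )


# The two (node, size) pairs requested by the profile in this report.
for node, size in [(0, "512M"), (1, "2M")]:
    path = hugepages_counter_path(node, size)
    state = "exists" if os.path.exists(path) else "missing"
    print(f"{path}: {state}")
```

A "missing" result only says that the path is absent; it does not distinguish between a NUMA node that is not present and a page size that the running kernel does not expose.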
bash-5.1# ls /var/lib/ocp-tuned/profiles/
openshift                                            openshift-node-performance-intel-x86-performance
openshift-node-performance-amd-x86-performance       openshift-node-performance-performance
openshift-node-performance-arm-aarch64-performance   openshift-node-performance-rt-performance

bash-5.1# cat /var/lib/ocp-tuned/profiles/openshift-node-performance-performance/tuned.conf
[main]
summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)

# The final result of the include depends on cpu vendor, cpu architecture, and whether the real time kernel is enabled
# The first line will be evaluated based on the CPU vendor and architecture
# This has three possible results:
#   include=openshift-node-performance-amd-x86;
#   include=openshift-node-performance-arm-aarch64;
#   include=openshift-node-performance-intel-x86;
# The second line will be evaluated based on whether the real time kernel is enabled
# This has two possible results:
#   openshift-node,cpu-partitioning
#   openshift-node,cpu-partitioning,openshift-node-performance-rt-<PerformanceProfile name>
include=openshift-node,cpu-partitioning${f:regex_search_ternary:${f:exec:uname:-r}:rt:,openshift-node-performance-rt-performance:};
    openshift-node-performance-${f:lscpu_check:Vendor ID\:\s*GenuineIntel:intel:Vendor ID\:\s*AuthenticAMD:amd:Vendor ID\:\s*ARM:arm}-${f:lscpu_check:Architecture\:\s*x86_64:x86:Architecture\:\s*aarch64:aarch64}-performance

# Inheritance of base profiles legend:
# cpu-partitioning -> network-latency -> latency-performance
# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf
# All values are mapped with a comment where a parent profile contains them.
# Different values will override the original values in parent profiles.

[variables]
#> isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
isolated_cores=1-6
not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

[cpu]
#> latency-performance
#> (override)
force_latency=cstate.id:1|3
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[service]
service.stalld=start,enable

[vm]
#> network-latency
transparent_hugepages=never

[irqbalance]
# Disable the plugin entirely, which was enabled by the parent profile `cpu-partitioning`.
# It can be racy if TuneD restarts for whatever reason.
#> cpu-partitioning
enabled=false

[scheduler]
runtime=0
group.ksoftirqd=0:f:11:*:ksoftirqd.*
group.rcuc=0:f:11:*:rcuc.*
group.ktimers=0:f:11:*:ktimers.*
default_irq_smp_affinity = ignore
irq_process=false

[sysctl]
#> cpu-partitioning #RealTimeHint
kernel.hung_task_timeout_secs=600
#> cpu-partitioning #RealTimeHint
kernel.nmi_watchdog=0
#> RealTimeHint
kernel.sched_rt_runtime_us=-1
#> cpu-partitioning #RealTimeHint
vm.stat_interval=10
# cpu-partitioning and RealTimeHint for RHEL disable it (= 0)
# OCP is too dynamic when partitioning and needs to evacuate
#> scheduled timers when starting a guaranteed workload (= 1)
kernel.timer_migration=1
#> network-latency
net.ipv4.tcp_fastopen=3
# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
#> latency-performance
vm.dirty_ratio=10
# Start background writeback (via writeback threads) at this percentage (system
# default is 10%)
#> latency-performance
vm.dirty_background_ratio=3
# The swappiness parameter controls the tendency of the kernel to move
# processes out of physical memory and onto the swap disk.
# 0 tells the kernel to avoid swapping processes out of physical memory
# for as long as possible
# 100 tells the kernel to aggressively swap processes out of physical memory
# and move them to swap cache
#> latency-performance
vm.swappiness=10
# also configured via a sysctl.d file
# placed here for documentation purposes and commented out due
# to a tuned logging bug complaining about duplicate sysctl:
# https://issues.redhat.com/browse/RHEL-18972
#> rps configuration
# net.core.rps_default_mask=${not_isolated_cpumask}

[selinux]
#> Custom (atomic host)
avc_cache_threshold=8192

[net]
channels=combined 2
nf_conntrack_hashsize=131072

[bootloader]
# !! The names are important for Intel and are referenced in openshift-node-performance-intel-x86
# set empty values to disable RHEL initrd setting in cpu-partitioning
initrd_remove_dir=
initrd_dst_img=
initrd_add_dir=
# overrides cpu-partitioning cmdline
cmdline_cpu_part=+nohz=on rcu_nocbs=${isolated_cores} tuned.non_isolcpus=${not_isolated_cpumask} systemd.cpu_affinity=${not_isolated_cores_expanded}
# No default value but will be composed conditionally based on platform
cmdline_iommu=
cmdline_isolation=+isolcpus=managed_irq,${isolated_cores}
cmdline_realtime_nohzfull=+nohz_full=${isolated_cores}
cmdline_realtime_nosoftlookup=+nosoftlockup
cmdline_realtime_common=+skew_tick=1 rcutree.kthread_prio=11
# No default value but will be composed conditionally based on platform
cmdline_power_performance=
# No default value but will be composed conditionally based on platform
cmdline_idle_poll=

[rtentsk]
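
The include line in the tuned.conf above builds the child profile name from two lscpu_check expansions, and the TuneD error shows an empty vendor segment ('openshift-node-performance--aarch64-performance'). The sketch below emulates that expansion in Python to illustrate how an unmatched Vendor ID could produce the double hyphen. It rests on assumptions: `lscpu_check` here is a simplified stand-in for TuneD's built-in function (first matching regex wins, empty string when nothing matches and no fallback is given), and the sample lscpu text with "Vendor ID: APM" is inferred from the linked issue title, not captured from the affected node.

```python
import re


def lscpu_check(lscpu_output: str, *args: str) -> str:
    """Return the string paired with the first regex that matches lscpu_output.

    Arguments alternate REGEX1, STR1, REGEX2, STR2, ...; an optional trailing
    argument acts as a fallback. With no match and no fallback the result is "".
    """
    for pattern, value in zip(args[0::2], args[1::2]):
        if re.search(pattern, lscpu_output, flags=re.MULTILINE):
            return value
    return args[-1] if len(args) % 2 == 1 else ""


# Regex/value pairs copied from the include line (colon escapes dropped for Python).
VENDOR_ARGS = (
    r"Vendor ID:\s*GenuineIntel", "intel",
    r"Vendor ID:\s*AuthenticAMD", "amd",
    r"Vendor ID:\s*ARM", "arm",
)
ARCH_ARGS = (
    r"Architecture:\s*x86_64", "x86",
    r"Architecture:\s*aarch64", "aarch64",
)


def included_profile(lscpu_output: str, name: str = "performance") -> str:
    """Compose the child profile name the way the include line does."""
    vendor = lscpu_check(lscpu_output, *VENDOR_ARGS)
    arch = lscpu_check(lscpu_output, *ARCH_ARGS)
    return f"openshift-node-performance-{vendor}-{arch}-{name}"


# Assumed lscpu excerpt for an Ampere host; the real output may differ.
ampere_lscpu = "Architecture:        aarch64\nVendor ID:           APM\n"
print(included_profile(ampere_lscpu))
# prints: openshift-node-performance--aarch64-performance
```

With a Vendor ID of ARM the same function yields openshift-node-performance-arm-aarch64-performance, which is one of the directories listed under /var/lib/ocp-tuned/profiles/.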
- clones
-
OCPBUGS-52352 Tuned profile degraded in ARM on Vendor Id not matching Ampere (APM)
-
- Verified
-
- depends on
-
OCPBUGS-52352 Tuned profile degraded in ARM on Vendor Id not matching Ampere (APM)
-
- Verified
-
- links to