Bug
Resolution: Unresolved
4.17.z, 4.18.z, 4.19.z
Description of problem:
OCP 4.17+ | The Node Tuning Operator becomes degraded after a PerformanceProfile is created, reporting the error message "Profiles with bootcmdline conflict".
Version-Release number of selected component (if applicable):
Observed in recent nightlies of OCP 4.17, 4.18, and 4.19.
How reproducible:
Intermittent. The issue does not appear on every deployment.
Steps to Reproduce:
1. Deploy OCP using the IPI installer. In all the observed cases, cluster nodes were virtual machines managed by libvirt.
2. Create the following PerformanceProfile:

    ---
    kind: PerformanceProfile
    apiVersion: "performance.openshift.io/v2"
    metadata:
      name: libvirt-profile
    spec:
      cpu:
        isolated: "6-23"
        reserved: "0-5"
      hugepages:
        pages:
        - size: "1G"
          count: 2
          node: 0
        - size: "2M"
          count: 1000
          node: 0
      numa:
        topologyPolicy: "restricted"
      nodeSelector:
        node-role.kubernetes.io/worker: ""
    ...

3. Check the node-tuning operator status.
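Step 3 can be scripted. The following is a minimal sketch, assuming the `oc` CLI is on `PATH` and already logged in to the cluster. The function names `fetch_cluster_operator` and `degraded_condition` are hypothetical helpers written for this report, not part of any OpenShift tooling. The sketch reads the `node-tuning` ClusterOperator as JSON and returns its `Degraded` condition when that condition's status is `True`.

```python
import json
import subprocess


def degraded_condition(cluster_operator: dict):
    """Return the Degraded condition if its status is "True", else None.

    `cluster_operator` is the parsed JSON of a ClusterOperator object.
    """
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Degraded" and cond.get("status") == "True":
            return cond
    return None


def fetch_cluster_operator(name: str = "node-tuning") -> dict:
    """Fetch a ClusterOperator through the `oc` CLI (assumed to be logged in)."""
    result = subprocess.run(
        ["oc", "get", "clusteroperator", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Against a live cluster, `degraded_condition(fetch_cluster_operator())` returns `None` when the operator is healthy. In the failing runs described in this report, the returned condition would be expected to carry a message of the form "x/6 Profiles with bootcmdline conflict".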
Actual results:
The node-tuning operator is degraded and shows a message like "x/6 Profiles with bootcmdline conflict" (where x was 1 or 2 in the failures we have detected so far).

From the must-gather and the cluster logs, we could see that:
- All cluster nodes were in Ready status
- All Tuned resources were in a correct, non-degraded state, with the message "TuneD profile applied"
- There were no issues with MCP or MC resources

These are the logs we could extract from one of the Tuned resources, in one of the failed cases:

2024-11-06T04:18:02.007091488Z E1106 04:18:02.007064 1 controller.go:788] not all 3 Nodes in MCP worker agree on bootcmdline: skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=6-23 tuned.non_isolcpus=0000003f systemd.cpu_affinity=0,1,2,3,4,5 intel_iommu=on iommu=pt isolcpus=managed_irq,6-23 nohz_full=6-23 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 intel_pstate=active
2024-11-06T04:18:02.029044869Z E1106 04:18:02.028479 1 controller.go:788] not all 3 Nodes in MCP worker agree on bootcmdline: >4096active
2024-11-06T04:18:02.029631427Z I1106 04:18:02.029607 1 status.go:313] 1/6 Profiles with bootcmdline conflict
2024-11-06T04:18:02.046243885Z I1106 04:18:02.046205 1 status.go:313] 1/6 Profiles with bootcmdline conflict
2024-11-06T04:18:02.050944222Z E1106 04:18:02.050899 1 status.go:70] unable to update ClusterOperator: Operation cannot be fulfilled on clusteroperators.config.openshift.io "node-tuning": the object has been modified; please apply your changes to the latest version and try again
2024-11-06T04:18:02.050944222Z E1106 04:18:02.050925 1 controller.go:198] unable to sync(profile/openshift-cluster-node-tuning-operator/dciokd-master-1) requeued (1): failed to sync Profile dciokd-master-1: failed to sync OperatorStatus: Operation cannot be fulfilled on clusteroperators.config.openshift.io "node-tuning": the object has been modified; please apply your changes to the latest version and try again
2024-11-06T04:18:02.053784567Z I1106 04:18:02.052457 1 status.go:313] 1/6 Profiles with bootcmdline conflict
2024-11-06T04:18:02.063353278Z I1106 04:18:02.062462 1 status.go:313] 1/6 Profiles with bootcmdline conflict
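To compare the conflicting values across failed runs, the "agree on bootcmdline" errors can be pulled out of the log mechanically. The sketch below is an illustration only: the regular expression is derived from the log lines quoted in this report, and `bootcmdline_conflicts` and `cpu_list_to_mask` are hypothetical helper names. It also shows that the reserved CPU list `0-5` from the PerformanceProfile converts to the mask `0000003f`, which matches the `tuned.non_isolcpus` value in the first error. This suggests that the first reported command line is consistent with the profile, while the second value (`>4096active`) is not a well-formed command line.

```python
import re

# Pattern derived from the controller.go error lines quoted in this report.
CONFLICT_RE = re.compile(
    r"not all (\d+) Nodes in MCP (\S+) agree on bootcmdline: (.*)$"
)


def bootcmdline_conflicts(log_lines):
    """Return (node_count, mcp_name, cmdline) for each conflict error line."""
    conflicts = []
    for line in log_lines:
        match = CONFLICT_RE.search(line)
        if match:
            conflicts.append(
                (int(match.group(1)), match.group(2), match.group(3).strip())
            )
    return conflicts


def cpu_list_to_mask(cpu_list: str) -> int:
    """Convert a CPU list such as "0-5" or "0,2,4-6" into an integer bitmask."""
    mask = 0
    for part in cpu_list.split(","):
        low, _, high = part.partition("-")
        # A single CPU ("3") has no upper bound, so reuse the lower bound.
        for cpu in range(int(low), int(high or low) + 1):
            mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Shortened copies of two log lines from this report.
    lines = [
        "E1106 04:18:02.007064 1 controller.go:788] not all 3 Nodes in MCP "
        "worker agree on bootcmdline: skew_tick=1 tuned.non_isolcpus=0000003f",
        "E1106 04:18:02.028479 1 controller.go:788] not all 3 Nodes in MCP "
        "worker agree on bootcmdline: >4096active",
    ]
    for count, mcp, cmdline in bootcmdline_conflicts(lines):
        print(f"{mcp} ({count} nodes): {cmdline}")
    # Reserved CPUs "0-5" from the PerformanceProfile, as an 8-digit hex mask.
    print(format(cpu_list_to_mask("0-5"), "08x"))
```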
Expected results:
The node-tuning operator should never become degraded in this scenario.
Additional info:
Deployments were made using Distributed-CI. The cases where we detected this issue are listed below. Each link leads to the Files section of the job, which contains the must-gather of the cluster where the issue appeared.
- OpenShift 4.18 nightly 2024-11-01 05:41 - https://www.distributed-ci.io/jobs/19b50f0b-9d67-4151-80fe-efe766d7c8eb/files
- OpenShift 4.18 nightly 2024-11-05 16:40 - https://www.distributed-ci.io/jobs/cc8a28af-8468-4511-96e7-9e5a6b2ad7a1/files
- OpenShift 4.18 nightly 2024-11-21 13:21 - https://www.distributed-ci.io/jobs/4883723f-73e4-47dc-be7a-04cd61dcf619/files
- OpenShift 4.17 nightly 2024-12-19 07:52 - https://www.distributed-ci.io/jobs/21187a71-543d-4782-83f9-876fc106f2e6/files
- OpenShift 4.19.0 ec.0 - https://www.distributed-ci.io/jobs/b428a278-906e-41f2-93e5-a7e3705472e4/files
- OpenShift 4.19 nightly 2024-12-23 18:24 - https://www.distributed-ci.io/jobs/c529fc65-a5b6-44f8-9cd4-567ddb189974/files
- OpenShift 4.17 nightly 2024-12-29 13:27 - https://www.distributed-ci.io/jobs/bf5b12c3-641d-4d43-b817-2650ebf2ddfc/files
- OpenShift 4.19 nightly 2024-12-31 03:14 - https://www.distributed-ci.io/jobs/e5d20de6-6f4e-48a6-bd9c-4c733d5133cc/files
- OpenShift 4.17 nightly 2024-12-31 04:58 - https://www.distributed-ci.io/jobs/9c022da7-aa23-4694-81b3-f529d9d05977/files