Bug
Resolution: Done
Major
None
4.15.0
None
Important
No
Proposed
False
20/11: The main issue (node not ready) should be solved by the blocking OCP issue. The profile degradation will probably have to wait for the blocking RHEL bug attached here.
Description of problem:
When CNF worker nodes targeted by a PerformanceProfile are deployed, the profile cannot be applied and the worker node never becomes Ready.
Version-Release number of selected component (if applicable):
current master (4.15)
How reproducible:
apiVersion: v1
items:
- apiVersion: performance.openshift.io/v2
  kind: PerformanceProfile
  metadata:
    creationTimestamp: "2023-10-25T18:48:39Z"
    finalizers:
    - foreground-deletion
    generation: 1
    name: cnf-performanceprofile
    resourceVersion: "32192"
    uid: 93d463ac-6412-4a9a-ac81-5a4dba4c9730
  spec:
    additionalKernelArgs:
    - nmi_watchdog=0
    - audit=0
    - mce=off
    - processor.max_cstate=1
    - idle=poll
    - intel_idle.max_cstate=0
    - amd_iommu=on
    cpu:
      isolated: 2-7
      reserved: 0-1
    globallyDisableIrqLoadBalancing: true
    hugepages:
      defaultHugepagesSize: 1G
      pages:
      - count: 4
        node: 0
        size: 1G
    nodeSelector:
      node-role.kubernetes.io/worker: ""
    realTimeKernel:
      enabled: false
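The operator log under "Actual results" lists several kernel arguments that follow directly from the cpu section of this profile: rcu_nocbs=2-7, nohz_full=2-7, isolcpus=managed_irq,2-7, systemd.cpu_affinity=0,1 and tuned.non_isolcpus=00000003. The Python sketch below reproduces that mapping, mainly to show how the hex mask is read (bit N set for CPU N, so CPUs 0 and 1 give 0x3). The function names are hypothetical and this is not the Node Tuning Operator's code. It also assumes, as this profile suggests, that reserved (0-1) and isolated (2-7) together cover every CPU on the node, so the non-isolated mask equals the reserved set.

```python
# Hypothetical helpers mirroring the profile -> kernel-argument mapping that is
# visible in this report's log. NOT the Node Tuning Operator's implementation.


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a Linux CPU list such as "0-1" or "2-7,10" into sorted CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_mask_hex(cpus: list[int], width: int = 8) -> str:
    """Return a zero-padded hex bitmask with bit N set for each CPU N."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, f"0{width}x")


def cpu_kernel_args(isolated: str, reserved: str) -> list[str]:
    """Build the CPU-related kernel arguments seen in the operator log.

    ASSUMPTION: reserved + isolated cover all CPUs, so the non-isolated
    CPUs are exactly the reserved ones.
    """
    reserved_cpus = parse_cpu_list(reserved)
    return [
        f"rcu_nocbs={isolated}",
        f"tuned.non_isolcpus={cpu_mask_hex(reserved_cpus)}",
        "systemd.cpu_affinity=" + ",".join(str(c) for c in reserved_cpus),
        f"isolcpus=managed_irq,{isolated}",
        f"nohz_full={isolated}",
    ]


if __name__ == "__main__":
    # Values taken from spec.cpu of cnf-performanceprofile.
    for arg in cpu_kernel_args(isolated="2-7", reserved="0-1"):
        print(arg)
```

Keeping the CPU list as a plain string for rcu_nocbs, nohz_full and isolcpus matches the log, where those arguments carry the profile's "2-7" unchanged; only the mask and the affinity list need the list expanded.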
Steps to Reproduce:
1. Deploy a cluster with 3 masters and zero workers.
2. Create a PerformanceProfile as above.
3. Scale the workers to 1.
Actual results:
The worker node never becomes Ready. cluster-node-tuning-operator fails to apply the PerformanceProfile and logs the following:
I1025 18:56:02.899138 1 controller.go:820] created MachineConfig 50-nto-worker with kernel parameters: [skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-7 tuned.non_isolcpus=00000003 systemd.cpu_affinity=0,1 intel_iommu=on iommu=pt isolcpus=managed_irq,2-7 nohz_full=2-7 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 default_hugepagesz=1G nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0 amd_iommu=on intel_pstate=disable]
I1025 18:56:02.900768 1 status.go:306] 1/4 Profiles failed to be applied
crio fails with this error:
level=error msg="Container creation error: time=\"2023-10-25T19:10:06Z\" level=error msg=\"runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write \\\"0-7\\\": write /sys/fs/cgroup/cpuset/system.slice/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podef062c82f679a1ef42be9dd35e115e2f.slice/crio-c8300b86016fe34637684039adf9eef3d4f04a411c5d13ba4c464c91589551ee.scope/cpuset.cpus: permission denied\"\n" id=44a49756-5a89-4c16-acea-56961a74f1ab name=/runtime.v1.RuntimeService/CreateContainer
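The crio error is a write to cpuset.cpus under /sys/fs/cgroup/cpuset/, which is the cgroup v1 (legacy) hierarchy. On that hierarchy the kernel requires a child cgroup's cpuset.cpus to be a subset of its parent's and rejects the write with EACCES, reported as "permission denied", when it is not. The Python sketch below models only that single check as an aid to reading the error; it does not establish the root cause of this bug, the function names are hypothetical, and the parent value "0-1" is an invented example rather than something captured from the failing node.

```python
import errno
import os


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a Linux CPU list such as "0-7" or "0-1,4" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_cpuset_write(requested: str, parent_cpus: str) -> int:
    """Model ONLY the cgroup v1 parent-subset check for a cpuset.cpus write.

    Returns 0 if `requested` fits inside `parent_cpus`, else errno.EACCES.
    Other kernel checks (online CPUs, child cpusets, exclusivity) are ignored.
    """
    if parse_cpu_list(requested) <= parse_cpu_list(parent_cpus):
        return 0
    return errno.EACCES


if __name__ == "__main__":
    # "0-7" is the value runc tried to write in the crio error.
    # "0-1" is a HYPOTHETICAL parent cpuset chosen only for illustration.
    err = check_cpuset_write("0-7", "0-1")
    print(err, os.strerror(err) if err else "ok")
```

With a parent whose cpuset already spans 0-7 the same write passes the check, which is why the parent slice's cpuset.cpus is the first value worth comparing against the "0-7" in the error.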
Expected results:
The PerformanceProfile should be applied (as it was previously) and the worker node should become Ready.
Additional info:
Example CI job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sriov-network-operator/844/pull-ci-openshift-sriov-network-operator-master-e2e-openstack-nfv/1717172583696175104
Is blocked by:
OCPBUGS-20492 crun not respecting cpu-quota:disable (or cpu-load-balancing:disable) annotations correctly (Closed)
Is related to:
RHEL-11342 tuned.utils.commands: Writing to file '/sys/block/dm-2/queue/read_ahead_kb' error: '[Errno 2] No such file or directory: '/sys/block/dm-2/queue/read_ahead_kb'' (Closed)