Bug
Resolution: Unresolved
rhel-9.2.0.z
tuned-2.25.0-0.1.rc1.el9
Important
Patch, Upstream
rhel-sst-cs-net-perf-services
ssg_core_services
x86_64
What were you trying to do that didn't work?
See OCPBUGS-36615 for full details, but the summary is:
Tuned's scheduler plugin first lists all processes to be processed, then filters them by cgroup name, and only after that tries to set their CPU affinity.
This leaves a long window in which a race condition can occur.
In our case, a container starts as a process in system.slice and later (roughly 6 CPU ticks afterwards) moves to kubepods.slice.
If tuned lists the process during the first phase, it never notices the later move to the ignored cgroup and still tries to set its affinity.
That is wrong, even though the Kubernetes CPU manager corrects it within a few seconds.
Worse for the customer, when kubepods.slice has a conflicting cpuset configuration, tuned reports a false error that confuses the customer and their monitoring tools.
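The ordering described above can be modeled in a few lines of Python. This is a hypothetical sketch, not tuned's actual scheduler plugin code. `read_cgroup` and `set_affinity` are injected stand-ins for reading `/proc/<pid>/cgroup` and calling `os.sched_setaffinity`. The cgroup paths and PIDs are made up, and each affinity call is used to model elapsed time. The sketch contrasts the "list, filter, then act" ordering with re-reading the cgroup immediately before acting on each process.

```python
import re
from typing import Callable, Dict, Iterable, List

# Hypothetical pattern for the ignored cgroup (kubepods.slice in this bug).
IGNORED_CGROUP = re.compile(r"/kubepods\.slice(/|$)")

ReadCgroup = Callable[[int], str]
SetAffinity = Callable[[int], None]


def is_ignored(cgroup_path: str) -> bool:
    return IGNORED_CGROUP.search(cgroup_path) is not None


def tune_two_phase(pids: Iterable[int], read_cgroup: ReadCgroup,
                   set_affinity: SetAffinity) -> List[int]:
    """List, filter by cgroup, then set affinity (the racy ordering)."""
    snapshot = list(pids)                                # phase 1: list
    candidates = [p for p in snapshot
                  if not is_ignored(read_cgroup(p))]     # phase 2: filter
    for pid in candidates:                               # phase 3: act
        set_affinity(pid)  # a move made after phase 2 is never noticed
    return candidates


def tune_with_recheck(pids: Iterable[int], read_cgroup: ReadCgroup,
                      set_affinity: SetAffinity) -> List[int]:
    """Re-read the cgroup right before acting on each process."""
    touched = []
    for pid in list(pids):
        # Narrows, but does not close, the race window.
        if is_ignored(read_cgroup(pid)):
            continue
        set_affinity(pid)
        touched.append(pid)
    return touched


def simulate(tuner) -> List[int]:
    """Fake system: container PID 100 moves to kubepods.slice while PID 1
    is being tuned (each affinity call stands in for elapsed time)."""
    cgroups: Dict[int, str] = {
        1: "0::/system.slice/sshd.service",
        100: "0::/system.slice/crio-conmon.scope",
    }

    def fake_set_affinity(pid: int) -> None:
        if pid == 1:
            cgroups[100] = "0::/kubepods.slice/pod-x/crio-abc.scope"

    return tuner([1, 100], cgroups.__getitem__, fake_set_affinity)
```

In this model, `simulate(tune_two_phase)` touches both PIDs, including the container that has already moved to kubepods.slice. In contrast, `simulate(tune_with_recheck)` touches only PID 1. Re-checking only shrinks the window, because a process can still move between the re-check and the affinity call. For that reason, the way the resulting failure is reported matters as well.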
What is the impact of this issue to you?
The customer complains about false errors that confuse their system monitoring and automated remediation tools.
Please provide the package NVR for which the bug is seen:
All tuned versions shipped in OCP 4.14 and later.
How reproducible is this bug?:
It takes about 30 minutes to reproduce on OpenShift; see the information in OCPBUGS-36615.
Expected results
No attempt to set the affinity, or at least no error logged when the failure is not a real error.
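The following Python sketch shows one way to meet the "at least no error logged" expectation. It treats a failed affinity call as benign in two cases: when the process has already exited (`ESRCH`), or when the call fails with `EINVAL` and the process now sits in the ignored cgroup. The sched_setaffinity(2) man page documents `EINVAL` for a mask that contains no CPUs permitted by the thread's cpuset cgroup, which corresponds to the conflicting-cpuset case above. The names `set_affinity_quietly` and `read_proc_cgroup`, the returned status strings, and the `kubepods.slice` pattern are all hypothetical and are not part of tuned's API.

```python
import errno
import logging
import os
import re
from typing import Callable, Optional, Set

log = logging.getLogger("affinity-sketch")  # hypothetical logger name

# Hypothetical pattern for the cgroup tuned is configured to ignore.
IGNORED_CGROUP = re.compile(r"/kubepods\.slice(/|$)")


def read_proc_cgroup(pid: int) -> str:
    """Return /proc/<pid>/cgroup, or '' if the process no longer exists."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        return ""


def set_affinity_quietly(
    pid: int,
    cpus: Set[int],
    read_cgroup: Callable[[int], str] = read_proc_cgroup,
    setaffinity: Optional[Callable[[int, Set[int]], None]] = None,
) -> str:
    """Return 'ok', 'ignored' (benign race, logged at DEBUG) or 'error'."""
    setaffinity = setaffinity or os.sched_setaffinity  # Linux-only default
    try:
        setaffinity(pid, cpus)
        return "ok"
    except ProcessLookupError:
        # ESRCH: the process exited between listing and the affinity call.
        log.debug("pid %d exited before its affinity was set", pid)
        return "ignored"
    except OSError as e:
        # EINVAL: no CPU in the mask is allowed by the process's cpuset.
        # If the process now sits in the ignored cgroup, this is the race
        # described in this bug, not a real failure.
        if e.errno == errno.EINVAL and IGNORED_CGROUP.search(read_cgroup(pid)):
            log.debug("pid %d moved to an ignored cgroup; skipping", pid)
            return "ignored"
        log.error("failed to set affinity of pid %d: %s", pid, e)
        return "error"
```

Injecting `read_cgroup` and `setaffinity` keeps the sketch testable without real processes. Logging is downgraded only for these two specific cases. Genuine failures, such as `EINVAL` for a process that is still outside the ignored cgroup, remain visible at ERROR level.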
Links to: RHSA-2025:144994 tuned update