-
Bug
-
Resolution: Done-Errata
-
Undefined
-
rhel-9.2.0.z
-
tuned-2.24.0-0.1.rc1.el9
-
No
-
Important
-
rhel-sst-cs-net-perf-services
-
ssg_core_services
-
23
-
3
-
False
-
-
None
-
None
-
Pass
-
Not Needed
-
None
-
None
What were you trying to do that didn't work?
Applying a cpu-partitioning tuned profile sometimes reports the following error:
tuned.plugins.plugin_scheduler: Failed to set affinity of PID 24360 to '[0, 1, 32, 33]': [Errno 22] Invalid argument
This can be correlated with a log entry from crio for the same PID:
Jun 28 12:49:57 master1.winterfell-mno-2.lab.neat.nsn-rdnet.net crio[4103]: time="2024-06-28 12:49:57.837486513Z" level=warning msg="Found defunct process with PID 24360 (haagent)"
The failure is real and should be logged. However, the ERROR severity is causing trouble with system health monitoring services, because ERROR is typically treated as an unrecoverable condition worth investigating.
The impact on the customer side is high: their cluster monitoring is always red because of this issue, even though the workload is not affected.
Please provide the package NVR for which bug is seen:
tuned as shipped in NTO for OCP 4.14.31
How reproducible:
Always after reboot on the customer cluster.
Expected results
After getting this failure, Tuned should check whether the process (or interrupt) is actually still present and running. The operating system is dynamic and such race conditions can happen (for example, a process exiting quickly). Tuned should ignore, or log at a lower severity, transient failures that it can recognize as such.
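The requested behavior could look roughly like the Python sketch below. It is not the actual tuned fix. It assumes, as the logs above suggest, that EINVAL can be returned for a process that is defunct or already gone, and it uses /proc/<pid>/stat to confirm that. The names process_state and set_affinity_tolerant, the logger name, and the returned strings are hypothetical and exist only for illustration.

```python
import errno
import logging
import os

log = logging.getLogger("affinity_sketch")  # hypothetical logger name


def process_state(pid, proc_root="/proc"):
    """Return the one-letter state from <proc_root>/<pid>/stat, or None if the process is gone."""
    try:
        with open(os.path.join(proc_root, str(pid), "stat")) as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or parentheses, so the state is the first field after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def set_affinity_tolerant(pid, cpus, setter=os.sched_setaffinity, state_of=process_state):
    """Try to set CPU affinity and classify the outcome as 'ok', 'transient' or 'error'."""
    try:
        setter(pid, cpus)
        return "ok"
    except ProcessLookupError:
        # ESRCH: the process exited before the call was made.
        log.debug("PID %d disappeared before affinity could be set", pid)
        return "transient"
    except OSError as e:
        # Assumption: EINVAL on a zombie ('Z'), dead ('X') or vanished process
        # is a race with process exit, not a configuration problem.
        if e.errno == errno.EINVAL and state_of(pid) in (None, "Z", "X"):
            log.warning("Ignoring affinity failure for exiting PID %d: %s", pid, e)
            return "transient"
        log.error("Failed to set affinity of PID %d to '%s': %s", pid, list(cpus), e)
        return "error"
```

With this shape, a call such as set_affinity_tolerant(24360, [0, 1, 32, 33]) would log at WARNING rather than ERROR when PID 24360 is defunct, while a real EINVAL on a running process would still be logged as ERROR.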
- links to
-
RHBA-2024:136843 tuned update