Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: rhel-9.6
Affects Version/s: rhel-9.2.0.z
Component/s: tuned
Labels:
- ecoengcl

Fixed in Build:
tuned-2.25.0-0.1.rc1.el9
Regression:
No
Severity:
Important
Keywords:

Patch, Upstream

Pool Team:

rhel-sst-cs-net-perf-services
Sub-System Group:

ssg_core_services

Internal Target Milestone:
23
Story Points:
5
Blocked:
False
Blocked Reason:

None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
Pass
Errata Link:
https://errata.engineering.redhat.com/advisory/144994
Test Coverage:
None

Experience:
Architecture:

x86_64
RH Private Keywords:

PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

See OCPBUGS-36615 for full details, but the summary is:

Tuned's scheduler plugin first lists all processes to be processed, then filters them by cgroup name, and only after that tries to set the CPU affinity.

This leaves a long window in which a race condition can occur.

In our case, a container starts as a process in system.slice and later (after about 6 CPU ticks) moves to kubepods.slice.

If Tuned lists the process during the first phase, it never notices the later move to the ignored cgroup and still tries to set the process's CPU affinity.

That is wrong, even though the Kubernetes CPU manager will fix it in a few seconds.

What is worse for the customer is when kubepods.slice has a conflicting cpuset configuration. In such a case, Tuned reports a false error, which confuses the customer and their monitoring tools.
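The race and the false error described above can be illustrated with a minimal sketch. This is not Tuned's actual code, and the fix shipped in the fixed build may work differently. The names `read_cgroup` and `set_affinity_checked` are hypothetical, and the cgroup lookup and affinity call are injectable so the logic can be exercised without real processes. The idea is to re-read the process's cgroup immediately before setting the affinity, and to re-read it once more after a failure so that a process that moved to an ignored cgroup in the meantime is skipped instead of being reported as an error.

```python
import os
import re


def read_cgroup(pid, proc_root="/proc"):
    """Return the text of <proc_root>/<pid>/cgroup, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "cgroup")) as f:
            return f.read()
    except OSError:
        # The process has most likely exited already.
        return None


def set_affinity_checked(pid, cpus, ignored_cgroup_re,
                         get_cgroup=read_cgroup, set_affinity=None):
    """Set CPU affinity with a late cgroup check (hypothetical helper).

    Returns "set", "skipped", or "failed".
    """
    if set_affinity is None:
        set_affinity = os.sched_setaffinity  # Linux only
    pattern = re.compile(ignored_cgroup_re)

    def ignored():
        cgroup = get_cgroup(pid)
        # A vanished process is treated like an ignored one: nothing to tune.
        return cgroup is None or pattern.search(cgroup) is not None

    # Check as late as possible to keep the race window small.
    if ignored():
        return "skipped"
    try:
        set_affinity(pid, cpus)
        return "set"
    except OSError:
        # The process may have moved between the check and the call;
        # only report a failure if it is still outside the ignored cgroups.
        return "skipped" if ignored() else "failed"
```

A caller would log an error only for the `"failed"` result, for example `set_affinity_checked(pid, {0, 1}, r"kubepods\.slice")`. The late check narrows the window but cannot close it completely, which is why the result is re-evaluated after a failed call.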

What is the impact of this issue to you?

The customer complains about false errors that confuse their system monitoring and remediation automation tools.

Please provide the package NVR for which the bug is seen:

All Tuned versions as included in OCP 4.14+.

How reproducible is this bug?:

It takes about 30 minutes to reproduce on OpenShift; see the information in OCPBUGS-36615.

Expected results

No attempt to set the affinity, or at least no error logged when the failure is not a real error.

links to

RHSA-2025:144994 tuned update

Assignee: Pavol Zacik

Reporter: Martin Sivak

Developer: Jaroslav Škarvada

QA Contact: Robin Hack

Votes: 0

Watchers: 11

Created: 2025/01/07 10:36 AM

Updated: 2025/03/10 4:05 PM

Target end: 2025/01/27

Next Planned Release Date: 2025/05/13
