OpenShift Bugs / OCPBUGS-59390

Kubelet's GC goes rogue on systems with lots of CPUs

    • Bug
    • Resolution: Not a Bug
    • Affects Version/s: 4.16.z, 4.18.z
    • Component/s: Node / Kubelet
    • Quality / Stability / Reliability
    • Low

      This is a low-priority follow-up to OCPBUGS-54565. On systems with many cores and without a PerformanceProfile, kubelet's Go garbage collector has a tendency to go rogue. TL;DR: the Go GC scales its work with GOMAXPROCS, which defaults to min(CPU core count of the affinity mask, CPU core count of the host), so garbage collection runs create a lot of load on systems with many cores (https://tip.golang.org/doc/gc-guide).

      E.g., we saw kubelet on an AMD system with 384 SMT cores easily exceed 4000% CPU load, and pprof showed evidence that garbage collection was responsible for up to nearly 90% of that.

      kubelet can be reined in with a PerformanceProfile, because that sets kubelet's CPU affinity, or with a systemd drop-in:

      /etc/systemd/system/kubelet.service.d/99-override.conf 
      [Service]
      Environment="GOMAXPROCS=4"
      

      FYI: the high heap allocation rate is believed to come from https://github.com/kubernetes/kubernetes/issues/104459 / https://github.com/prometheus/client_golang/issues/1702, but the problem is then amplified by the GC's default behavior.

      I'm creating this ticket because it may be worth setting GOMAXPROCS for kubelet (and other components?) to a sane default, either in code or via a systemd drop-in file. It might also be good to follow up on the cAdvisor memory allocation issue.
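
      As a sketch of the in-code option, a component could cap its own parallelism at startup; capGOMAXPROCS is a hypothetical helper, not existing kubelet code:

```go
package main

import (
	"fmt"
	"runtime"
)

// capGOMAXPROCS lowers GOMAXPROCS to limit if it is currently higher,
// leaving any lower (e.g. operator-configured) value untouched.
func capGOMAXPROCS(limit int) int {
	if limit >= 1 && runtime.GOMAXPROCS(0) > limit {
		runtime.GOMAXPROCS(limit)
	}
	return runtime.GOMAXPROCS(0)
}

func main() {
	// Cap GC/scheduler parallelism at 4 threads, matching the
	// drop-in example above.
	fmt.Println("effective GOMAXPROCS:", capGOMAXPROCS(4))
}
```

      A related existing approach is go.uber.org/automaxprocs, which sizes GOMAXPROCS from the container's CPU quota rather than a fixed cap.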

              aos-node@redhat.com (Node Team Bot Account)
              akaris@redhat.com (Andreas Karis)
              Min Li
              Votes: 0
              Watchers: 7
