Bug
Resolution: Not a Bug
Affects Versions: 4.16.z, 4.18.z
Quality / Stability / Reliability
Priority: Low
This is a low-priority follow-up to OCPBUGS-54565. On systems with many cores and without a PerformanceProfile, kubelet's Go garbage collector has a tendency to go rogue. TL;DR: the Go GC scales its workers with GOMAXPROCS, which defaults to min(CPU core count of the process's affinity mask, CPU core count of the host), so garbage collection runs create a lot of load on systems with many cores (https://tip.golang.org/doc/gc-guide).
E.g., we saw kubelet on an AMD system with 384 SMT cores easily go above 4000% CPU load, and pprof showed evidence that garbage collection was responsible for up to nearly 90% of that.
kubelet can be reined in with a PerformanceProfile, because that sets kubelet's CPU affinity, or with a systemd drop-in:

/etc/systemd/system/kubelet.service.d/99-override.conf:
[Service]
Environment="GOMAXPROCS=4"
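Applying that drop-in by hand would look roughly like the sketch below. The mktemp path stands in for the real /etc/systemd/system/kubelet.service.d directory so this can be tried off-node; on actual OpenShift nodes, files like this are normally delivered via a MachineConfig rather than edited directly.

```shell
# Sketch: install a systemd drop-in capping kubelet's GOMAXPROCS.
# On a real node, DROPIN_DIR would be /etc/systemd/system/kubelet.service.d
DROPIN_DIR="$(mktemp -d)"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/99-override.conf" <<'EOF'
[Service]
Environment="GOMAXPROCS=4"
EOF

# On a real node, reload unit files and restart kubelet to pick it up:
# systemctl daemon-reload && systemctl restart kubelet
echo "wrote $DROPIN_DIR/99-override.conf"
```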
FYI: the high heap allocation rate is believed to come from https://github.com/kubernetes/kubernetes/issues/104459 | https://github.com/prometheus/client_golang/issues/1702, but the problem is then amplified by the GC's default behavior.
I'm creating this ticket because it may be a good idea to think about setting GOMAXPROCS for kubelet (and possibly other components) to some sane default, either in code or via a service drop-in file. It might also be worth following up on the cAdvisor memory-allocation issue.
Links:
- is related to: OCPBUGS-64621 Containerized OpenShift components showing significant CPU spikes on systems with high core count (status: New)