Bug
Resolution: Unresolved
Undefined
None
4.19.z
Quality / Stability / Reliability
False
Description of problem:
During recent perf and scale testing on a 250-node ARO-HCP cluster, the cluster-api-controller-manager pod was observed consuming unusually high CPU even when idle (no guest workload), with a sharp spike every 10 minutes. This causes noisy CPU throttling and potential performance degradation in other control plane components.
The CPU spikes are frequent and significantly higher than expected for steady-state control plane operations. The logs correlate them with large-scale reconciliation events, and periodic machine health checks are causing them even when no major scaling actions are taking place.
Version-Release number of selected component (if applicable):
4.19.z
How reproducible:
Always at this scale
Steps to Reproduce:
1. Create an ARO-HCP cluster with at least 250 nodes.
2. Watch the container CPU usage of the cluster-api pod in the HCP namespace (metric 'container_cpu_usage_seconds_total').
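Step 2 reads container_cpu_usage_seconds_total, a cumulative counter of CPU-seconds, so spikes only become visible after converting it to a rate. The following Python sketch shows that conversion for raw (timestamp, value) samples, roughly what PromQL's rate() does minus its extrapolation to the window edges. The function names, the sample values, and the 2-core spike threshold are hypothetical and not measured on this cluster.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative counter value in CPU-seconds)
Sample = Tuple[float, float]


def cpu_cores(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Hypothetical helper: convert cumulative container_cpu_usage_seconds_total
    samples into the average number of cores used over each interval."""
    result = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        # A counter reset (e.g. container restart) makes the value drop;
        # in that case count only the usage accumulated since the reset.
        delta = v1 - v0 if v1 >= v0 else v1
        result.append((t1, delta / dt))
    return result


def spikes(usage: List[Tuple[float, float]], threshold_cores: float) -> List[float]:
    """Return the interval end timestamps whose usage exceeds threshold_cores."""
    return [t for t, cores in usage if cores > threshold_cores]


if __name__ == "__main__":
    # Illustrative samples 60 s apart; NOT data from this bug.
    samples = [(0, 100.0), (60, 130.0), (120, 160.0), (180, 760.0), (240, 790.0)]
    usage = cpu_cores(samples)
    print(usage)
    print(spikes(usage, threshold_cores=2.0))  # hypothetical threshold
```

With these illustrative samples, the interval ending at t=180 averages 10 cores while the others average 0.5, so only that interval is reported as a spike.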
Actual results:
Frequent CPU spikes.
Expected results:
CPU usage comparable to ROSA-HCP (in ROSA-HCP, usage is > a core).
Additional info:
Node-level usage screenshots: https://drive.google.com/drive/folders/1-1l5xhGMrTqJvLfdpV79ExTNmPQoWRA3
At 500 nodes, consumption exceeds 14 cores on the shared MC worker.
The logs show frequent health checks, for example:
I1022 16:42:44.351263 1 machinehealthcheck_targets.go:326] "Health checking target" controller="machinehealthcheck" controllerGroup="cluster.x-k8s.io" controllerKind="MachineHealthCheck" MachineHealthCheck="ocm-arohcpprod-2m38p19lqrvda3v1lr0mn0jo0ecv2fke-aro-250/aro-250-np-static-1" namespace="ocm-arohcpprod-2m38p19lqrvda3v1lr0mn0jo0ecv2fke-aro-250" name="aro-250-np-static-1" reconcileID="f8e75ce6-a6b2-4099-8ec7-be6a6ef94370" Cluster="ocm-arohcpprod-2m38p19lqrvda3v1lr0mn0jo0ecv2fke-aro-250/2m38p19lqrvda3v1lr0mn0jo0ecv2fke" Machine="ocm-arohcpprod-2m38p19lqrvda3v1lr0mn0jo0ecv2fke-aro-250/aro-250-np-static-1-l76tf-xzgpq" Node="aro-250-np-static-1-l76tf-xzgpq"
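To check how the health-check log lines line up with the roughly 10-minute CPU spikes, the log can be bucketed per minute. The sketch below is a minimal Python helper assuming the default klog text header shown in the line above (level, MMDD, time, thread id, file:line, then the message); the function name, the per-minute bucketing, and the demo lines are hypothetical. If MachineHealthCheck reconciliation drives the spikes, buckets with a burst of distinct machines should recur at about 10-minute intervals.

```python
import re
from collections import Counter

# Assumed klog text header: <level><MMDD> <hh:mm:ss.uuuuuu> <threadid> <file:line>] <message>
KLOG_RE = re.compile(r'^([IWEF])(\d{4}) (\d{2}:\d{2}):\d{2}\.\d+\s+\d+ ([^\]]+)\] (.*)$')
# Matches the Machine="..." key (not MachineHealthCheck="...").
MACHINE_RE = re.compile(r'Machine="([^"]+)"')


def health_checks_per_minute(lines):
    """Hypothetical helper: count "Health checking target" entries per
    'MMDD HH:MM' bucket and the number of distinct machines in each bucket."""
    counts = Counter()
    machines = {}
    for line in lines:
        m = KLOG_RE.match(line.strip())
        if not m:
            continue  # not a klog-formatted line
        _level, mmdd, hhmm, _source, msg = m.groups()
        if not msg.startswith('"Health checking target"'):
            continue
        bucket = f"{mmdd} {hhmm}"
        counts[bucket] += 1
        mm = MACHINE_RE.search(msg)
        if mm:
            machines.setdefault(bucket, set()).add(mm.group(1))
    return counts, {b: len(s) for b, s in machines.items()}


if __name__ == "__main__":
    # Shortened, illustrative lines; not taken from the real log.
    demo = [
        'I1022 16:42:44.351263 1 machinehealthcheck_targets.go:326] "Health checking target" Machine="ns/m-a" Node="m-a"',
        'I1022 16:42:50.000001 1 machinehealthcheck_targets.go:326] "Health checking target" Machine="ns/m-b" Node="m-b"',
        'I1022 16:52:44.100000 1 machinehealthcheck_targets.go:326] "Health checking target" Machine="ns/m-a" Node="m-a"',
    ]
    counts, distinct = health_checks_per_minute(demo)
    for bucket in sorted(counts):
        print(bucket, counts[bucket], distinct.get(bucket, 0))
```

In practice the input would be the pod's log lines, for example saved with `oc logs` from the cluster-api pod in the HCP namespace.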