Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.20.z
Component: Quality / Stability / Reliability
Description of problem:
During recent ROSA 4.20 Perf & Scale testing, a memory usage regression was observed in the kube-controller-manager component of OpenShift 4.20 compared to versions 4.19 and 4.18, especially at high scale (using the cluster-density-v2 benchmark, which represents a scenario closer to customer workloads). The Average Resident Set Size (RSS) and Max Aggregated RSS for kube-controller-manager have increased substantially and disproportionately between releases. At the largest scale (249 workers), the Average RSS in 4.20 is roughly 29% higher than in 4.19 ((4.62 GiB - 3.58 GiB) / 3.58 GiB ≈ 29.05%), well beyond the normal infrastructure variability threshold of 10%.

Summary of memory regression (average trend):

24 workers
  Version | Average RSS Usage | Max Aggregated RSS Usage
  4.20    | 966 MiB           | 1.31 GiB
  4.19    | 826 MiB           | 1.28 GiB
  4.18    | 647 MiB           | 1.11 GiB
  Increase (4.20 vs 4.19): Average RSS: 16.9%; Max Aggregated RSS: 2.34%

120 workers
  Version | Average RSS Usage | Max Aggregated RSS Usage
  4.20    | 2.75 GiB          | 4.02 GiB
  4.19    | 1.88 GiB          | 3.58 GiB
  4.18    | 1.40 GiB          | 3.83 GiB
  Increase (4.20 vs 4.19): Average RSS: 46.3%; Max Aggregated RSS: 12.34%

249 workers
  Version | Average RSS Usage | Max Aggregated RSS Usage
  4.20    | 4.62 GiB          | 7.06 GiB
  4.19    | 3.58 GiB          | 6.50 GiB
  4.18    | 2.99 GiB          | 5.74 GiB
  Increase (4.20 vs 4.19): Average RSS: 29.0%; Max Aggregated RSS: 8.6%
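The percentage increases quoted above follow from the tabulated values. A minimal sketch of the arithmetic (the helper name `pct_increase` is illustrative, not part of any tooling; input values are the measured averages reported in the tables):

```python
def pct_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100

# Average RSS (4.20 vs 4.19) per worker scale; units as reported above
# (MiB for the 24-worker scale, GiB for the larger scales).
avg_rss = {
    24:  (966, 826),    # MiB
    120: (2.75, 1.88),  # GiB
    249: (4.62, 3.58),  # GiB
}

for workers, (v420, v419) in avg_rss.items():
    print(f"{workers} workers: {pct_increase(v420, v419):.1f}% Average RSS increase")
```

Small deviations from the quoted percentages (e.g. the 120-worker Max Aggregated RSS figure) are expected, since the tabulated GiB values are themselves rounded.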
Version-Release number of selected component (if applicable):
4.20.z
How reproducible: Reproducible at various scales, especially at higher worker counts.
Steps to Reproduce:
1. Deployment: Deploy ROSA Classic clusters for the target versions (4.20, 4.19, 4.18) with varying worker node counts (24, 120, and ≈249 workers).
2. Workload tool setup: Download and extract the OpenShift performance wrapper for kube-burner: https://github.com/kube-burner/kube-burner-ocp
3. Execute workload: Run the cluster-density-v2 workload on each cluster. For the 249-worker scale, the iteration count is set to 2241 (9 × 249 workers ≈ 2241 iterations). Example command (249-worker scale):
   ./kube-burner-ocp cluster-density-v2 --check-health=false --log-level=info --qps=20 --burst=20 --gc=true --churn-duration=20m --service-latency --gc-metrics=true --profile-type=reporting --iterations=2241 --churn=true
4. Observation: After the workload completes, query the monitoring system (Prometheus) for the Average RSS Usage and Max Aggregated RSS Usage of the kube-controller-manager pods across the run duration.
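For the observation step, queries along the following lines could be used against Prometheus. This is a sketch only: the namespace and container labels are assumptions based on the default OpenShift kube-controller-manager deployment, and the `[1h:30s]` subquery window is a placeholder that must be adjusted to the actual workload run duration.

```python
# Assumed namespace/container labels for the control-plane KCM pods.
NAMESPACE = "openshift-kube-controller-manager"

# RSS summed across the kube-controller-manager replicas.
aggregated_rss = (
    f'sum(container_memory_rss{{namespace="{NAMESPACE}", '
    f'container="kube-controller-manager"}})'
)

# Average and max over the run duration (subquery range is an example value;
# replace 1h with the measured workload window).
avg_rss_query = f"avg_over_time(({aggregated_rss})[1h:30s])"
max_rss_query = f"max_over_time(({aggregated_rss})[1h:30s])"

print(avg_rss_query)
print(max_rss_query)
```

These expressions can be pasted into the Prometheus UI or passed to its HTTP query API.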
Actual results:
The kube-controller-manager memory usage is substantially higher in OCP 4.20 compared to OCP 4.19 and 4.18 at all scale points, with the difference being most severe at the largest scale (249 workers), as documented in the table above.
Expected results:
Memory usage (Average RSS and Max Aggregated RSS) for kube-controller-manager should be consistent and stable across major/minor versions. The memory consumption of OCP 4.20 should be equal to or better than OCP 4.19 and 4.18.
Additional info:
Performance metrics: The data provided covers the average trend over the last 6 months.