OpenShift Bugs / OCPBUGS-22948

[Reliability][regression] openshift-kube-scheduler leader pod memory increased in 6 days from 100+ MiB to 13+ GB

    • Severity: Important
    • Release Note Type: Release Note Not Required

      Description of problem:

      In the Reliability test (loaded long run with a stable load), the leader openshift-kube-scheduler pod's memory increased from 100+ MiB to ~13 GB over 6 days. The other two openshift-kube-scheduler pods were fine.
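      A quick way to correlate the growing pod with scheduler leadership is to read the leader-election lease holder and compare it with per-pod memory. A minimal sketch; the lease namespace/name below follow the upstream default (kube-system/kube-scheduler) and are an assumption, so the grep is there to locate the lease if it lives elsewhere:

      # Locate the scheduler leader lease, read its holder, and compare with per-pod memory usage.
      oc get lease -A | grep -i kube-scheduler
      oc get lease kube-scheduler -n kube-system -o jsonpath='{.spec.holderIdentity}{"\n"}'   # assumed default location
      oc adm top pod -n openshift-kube-scheduler --sort-by memory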

      Version-Release number of selected component (if applicable):

      4.15.0-0.nightly-2023-10-31-054858

      How reproducible:

      Hit this for the first time; it was not seen in the 4.14 Reliability test.

      Steps to Reproduce:

      1. Install an AWS compact cluster with 3 masters; the workers are scheduled on the master nodes as well.
      2. Run the reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2. The test runs for a long period and simulates multiple customers using the cluster.
      config: 1 admin, 5 dev-test, 5 dev-prod, 1 dev-cron.
      3. Monitor the performance dashboard (a PromQL sketch for querying the same memory metric directly follows below).

      Performance dashboard: http://dittybopper-dittybopper.apps.qili-comp-etcd.qe-lrc.devcluster.openshift.com/d/go4AGIVSk/openshift-performance?orgId=1&from=1698806511000&to=now&var-datasource=Cluster%20Prometheus&var-_master_node=ip-10-0-52-74.us-east-2.compute.internal&var-_master_node=ip-10-0-54-53.us-east-2.compute.internal&var-_master_node=ip-10-0-75-225.us-east-2.compute.internal&var-_worker_node=ip-10-0-52-74.us-east-2.compute.internal&var-_infra_node=&var-namespace=All&var-block_device=All&var-net_device=All&var-interval=2m
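      If the dashboard is not reachable, the same working-set memory metric can be pulled straight from the in-cluster monitoring stack. A minimal sketch, assuming the default thanos-querier route in openshift-monitoring and permission to create a token for the prometheus-k8s service account (both are assumptions, not part of the original report):

      # Query kube-scheduler working-set memory per pod through the Thanos querier route.
      TOKEN=$(oc create token prometheus-k8s -n openshift-monitoring)
      HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
      curl -sk -H "Authorization: Bearer $TOKEN" \
        --data-urlencode 'query=container_memory_working_set_bytes{namespace="openshift-kube-scheduler",container="kube-scheduler"}' \
        "https://$HOST/api/v1/query" | jq -r '.data.result[] | "\(.metric.pod) \(.value[1])"'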

      Actual results:

      The leader openshift-kube-scheduler pod's memory increased roughly linearly from 100+ MiB to ~13 GB over 6 days, while the other two openshift-kube-scheduler pods stayed at a normal level (~140 MiB).
      The peak CPU usage of the leader openshift-kube-scheduler pod also increased, from <10% to 40%+.

      See the attached screenshot memory-cpu-on-leader-pod.png.

      Expected results:

      Memory usage should stay stable at a reasonable level under a stable workload.

      Additional info:

      oc adm top pod -n openshift-kube-scheduler --sort-by memory 
      NAME                                                                       CPU(cores)   MEMORY(bytes)   
      openshift-kube-scheduler-ip-10-0-54-53.us-east-2.compute.internal          3m           13031Mi         
      openshift-kube-scheduler-ip-10-0-52-74.us-east-2.compute.internal          4m           146Mi           
      openshift-kube-scheduler-ip-10-0-75-225.us-east-2.compute.internal         3m           136Mi           
      openshift-kube-scheduler-guard-ip-10-0-52-74.us-east-2.compute.internal    0m           0Mi             
      openshift-kube-scheduler-guard-ip-10-0-54-53.us-east-2.compute.internal    0m           0Mi             
      openshift-kube-scheduler-guard-ip-10-0-75-225.us-east-2.compute.internal   0m           0Mi
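
      For triage, a heap profile from the suspected leader would show where the memory is held. A rough sketch, assuming profiling is enabled on the scheduler's secure port 10259 (the upstream default) and that the logged-in user's bearer token is accepted for /debug/pprof; the pod name is simply the leader from the output above:

      # Capture a heap profile from the leaking scheduler pod and summarize it with pprof.
      oc -n openshift-kube-scheduler port-forward \
        pod/openshift-kube-scheduler-ip-10-0-54-53.us-east-2.compute.internal 10259:10259 &
      curl -sk -H "Authorization: Bearer $(oc whoami -t)" \
        https://localhost:10259/debug/pprof/heap -o scheduler-heap.pb.gz
      go tool pprof -top scheduler-heap.pb.gz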
      

      Studying materials

        1. anonymous.yaml (0.5 kB, Jan Chaloupka)

      People: Jan Chaloupka (jchaloup@redhat.com), Qiujie Li (rhn-support-qili), Rama Kasturi Narra