Bug
Resolution: Done-Errata
Major
4.15
Important
No
Approved
False
N/A
In Progress
Description of problem:
In the Reliability test (a long-running test under a stable load), the memory usage of the 3 multus pods increased from under 100 MiB to more than 700 MiB over 7 days. The multus pods have a memory request of 65Mi and no memory limit, so if the test runs longer and the memory keeps increasing, this issue can impact the nodes' resources.
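To verify the configured requests and limits, a minimal check (assuming the DaemonSet is named multus, which matches the pod names in the output under Additional info):

% oc get daemonset multus -n openshift-multus -o jsonpath='{.spec.template.spec.containers[?(@.name=="kube-multus")].resources}'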
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-13-174800
How reproducible:
Hit this for the first time; I did not see this in 4.14's Reliability test.
Steps to Reproduce:
1. Install an AWS compact cluster with 3 masters (the workers are colocated on the master nodes).
2. Run the reliability-v2 test: https://github.com/openshift/svt/tree/master/reliability-v2. The test runs for a long time and simulates multiple customers using the cluster. Config: 1 admin, 5 dev-test, 5 dev-prod, 1 dev-cron.
3. Monitor the metric container_memory_rss{container="kube-multus",namespace="openshift-multus"} (a sampling sketch follows the steps).
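A minimal sketch for sampling this metric from the command line, assuming the thanos-querier route in openshift-monitoring is reachable and the logged-in user is allowed to query it (jq is used only to flatten the JSON response):

# Sample kube-multus RSS every 5 minutes via the Thanos querier API.
TOKEN=$(oc whoami -t)
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
while true; do
  curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
    --data-urlencode 'query=container_memory_rss{container="kube-multus",namespace="openshift-multus"}' \
    | jq -r '.data.result[] | "\(.metric.pod) \(.value[1])"'
  sleep 300
done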
Actual results:
The memory usage of the 3 multus pods increased from under 100 MiB to more than 700 MiB in 7 days. After the test load stopped, the memory stopped increasing but did not drop back down.
Expected results:
Memory should not continuously increase.
Additional info:
% oc adm top pod -n openshift-multus --containers=true --sort-by memory -l app=multus
POD            NAME          CPU(cores)   MEMORY(bytes)
multus-xp474   kube-multus   12m          1275Mi
multus-xp474   POD           0m           0Mi
multus-xt64s   kube-multus   21m          971Mi
multus-xt64s   POD           0m           0Mi
multus-d9xcs   kube-multus   6m           757Mi
multus-d9xcs   POD           0m           0Mi
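To cross-check the kubelet-reported values against the container's own cgroup accounting, one option (a sketch: the pod name comes from the output above, and a cgroup v2 node with cgroup namespaces is assumed) is:

% oc exec -n openshift-multus multus-xp474 -c kube-multus -- cat /sys/fs/cgroup/memory.stat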
Monitoring screenshot:
multus-memory-increase-stop.png
Must-gather: must-gather.local.4628887688332215806.tar.gz
Links to:
RHSA-2023:7198 OpenShift Container Platform 4.15 security update