- Bug
- Resolution: Done-Errata
- Critical
- None
- 4.14
- Critical
- No
- SDN Sprint 239
- 1
- Approved
- False
Description of problem:
In the Reliability (loaded long-run) test, the memory usage of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it grew by about 1.6 GB. I did not see this issue in previous releases.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-27-000502
How reproducible:
This is the first time I have encountered this issue.
Steps to Reproduce:
1. Install an AWS OVN cluster with 3 masters and 3 workers; vm_type is m5.xlarge for all nodes.
2. Run the reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2 with config: 1 admin, 15 dev-test, 1 dev-prod. The test long-runs the configured tasks.
3. Monitor the test failures and the performance dashboard (a minimal memory-monitoring sketch is included below).
Test failures Slack notification: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1687944463913769
Performance dashboard: http://dittybopper-dittybopper.apps.qili-414-haproxy.qe-lrc.devcluster.openshift.com/d/IgK5MW94z/openshift-performance?orgId=1&from=1687944452000&to=now&refresh=1h
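For reference, the pod memory growth can also be tracked outside the dashboard with a simple polling loop. This is a minimal sketch, assuming `oc` is logged in to the cluster and `oc adm top` (metrics-server) is available; the 10-minute interval and the log file name are arbitrary choices, not part of the original test setup:

% while true; do
    # record a UTC timestamp, then the current CPU/memory of the ovnkube-node pods
    date -u +"%Y-%m-%dT%H:%M:%SZ" >> ovnkube-node-memory.log
    oc adm top pod -n openshift-ovn-kubernetes | grep ovnkube-node >> ovnkube-node-memory.log
    sleep 600
  done

Diffing successive samples in ovnkube-node-memory.log makes the steady growth visible without the dashboard.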
Actual results:
The memory usage of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it grew by about 1.6 GB.
Expected results:
The memory usage of the ovnkube-node-xxx pods should remain stable and not increase continuously.
Additional info:
% oc adm top pod -n openshift-ovn-kubernetes | grep node
ovnkube-node-4t282   146m   1862Mi
ovnkube-node-9p462   41m    1847Mi
ovnkube-node-b6rqj   46m    2032Mi
ovnkube-node-fp2gn   72m    2107Mi
ovnkube-node-hxf95   11m    2359Mi
ovnkube-node-ql9fx   38m    2089Mi
I took a pprof heap profile of one of the pods and uploaded it as heap-ovnkube-node-4t282.out.
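For anyone picking this up, the heap profile can be inspected with standard Go pprof tooling; a minimal sketch, assuming the file has been downloaded locally (the flags shown are generic pprof options, and the local port is an arbitrary choice):

# show the top in-use allocations recorded in the heap profile
go tool pprof -top -inuse_space heap-ovnkube-node-4t282.out

# or browse the profile interactively (flame graph, source view) in a local web UI
go tool pprof -http=:8080 heap-ovnkube-node-4t282.out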
Must-gather is uploaded to must-gather.local.1315176578017655774.tar.gz
Performance dashboard screenshot: ovnkube-node-memory.png
- clones: OCPBUGS-15544 [Reliability] regression: continuously memory increase on a ovnkube-node pod (Closed)
- links to: RHEA-2023:5006 rpm