- Bug
- Resolution: Done-Errata
- Critical
- None
- 4.14
- Critical
- No
- SDN Sprint 239
- 1
- Approved
- False
Description of problem:
In the Reliability (loaded long-run) test, the memory usage of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it grew by about 1.6 GB. I did not see this issue in previous releases.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-27-000502
How reproducible:
This is the first time I have encountered this issue.
Steps to Reproduce:
1. Install an AWS OVN cluster with 3 masters and 3 workers; vm_type is m5.xlarge for all nodes.
2. Run the reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2 with config: 1 admin, 15 dev-test, 1 dev-prod. The test long-runs the configured tasks.
3. Monitor the test failures and the performance dashboard (a minimal memory-monitoring sketch is included below).
Test failures Slack notification: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1687944463913769
Performance dashboard: http://dittybopper-dittybopper.apps.qili-414-haproxy.qe-lrc.devcluster.openshift.com/d/IgK5MW94z/openshift-performance?orgId=1&from=1687944452000&to=now&refresh=1h
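For reference, the pod memory growth can also be tracked outside the dashboard with a simple polling loop. This is a minimal sketch, assuming `oc` is logged in to the cluster and `oc adm top` (metrics-server) is available; the 10-minute interval and the log file name are arbitrary choices, not part of the original test setup:

% while true; do
    # record a UTC timestamp, then the current CPU/memory of the ovnkube-node pods
    date -u +"%Y-%m-%dT%H:%M:%SZ" >> ovnkube-node-memory.log
    oc adm top pod -n openshift-ovn-kubernetes | grep ovnkube-node >> ovnkube-node-memory.log
    sleep 600
  done

Diffing successive samples in ovnkube-node-memory.log makes the steady growth visible without the dashboard.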
Actual results:
The memory usage of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it grew by about 1.6 GB.
Expected results:
The memory usage of the ovnkube-node-xxx pods should remain stable and not increase continuously.
Additional info:
% oc adm top pod -n openshift-ovn-kubernetes | grep node
ovnkube-node-4t282   146m   1862Mi
ovnkube-node-9p462   41m    1847Mi
ovnkube-node-b6rqj   46m    2032Mi
ovnkube-node-fp2gn   72m    2107Mi
ovnkube-node-hxf95   11m    2359Mi
ovnkube-node-ql9fx   38m    2089Mi
I took a pprof heap profile of one of the pods and uploaded it as heap-ovnkube-node-4t282.out.
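For anyone picking this up, the heap profile can be inspected with standard Go pprof tooling; a minimal sketch, assuming the file has been downloaded locally (the flags shown are generic pprof options, and the local port is an arbitrary choice):

# show the top in-use allocations recorded in the heap profile
go tool pprof -top -inuse_space heap-ovnkube-node-4t282.out

# or browse the profile interactively (flame graph, source view) in a local web UI
go tool pprof -http=:8080 heap-ovnkube-node-4t282.out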
Must-gather is uploaded to must-gather.local.1315176578017655774.tar.gz
Performance dashboard screenshot: ovnkube-node-memory.png
- clones: OCPBUGS-15544 [Reliability] regression: continuously memory increase on a ovnkube-node pod (Closed)
- links to: RHEA-2023:5006 rpm