OpenShift Bugs / OCPBUGS-23450

[4.14 placeholder] High memory usage on one of the ovnkube-master pods on some clusters


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • Affects Version/s: 4.12.z

      Description of problem:

      My customer is seeing high memory usage on one of the ovnkube-master pods, even though there are not many active NetworkPolicies.
      
      $ oc adm top pod
      NAME                   CPU(cores)   MEMORY(bytes)
      ovnkube-master-6wwl8   136m         355Mi
      ovnkube-master-785ft   408m         14613Mi
      ovnkube-master-xlrj7   33m          376Mi
      ovnkube-node-2v9g4     10m          188Mi
      ovnkube-node-4tn6j     12m          196Mi
      ovnkube-node-5ffn6     16m          200Mi
      ovnkube-node-67vrd     9m           184Mi
      ovnkube-node-7sw29     16m          184Mi
      ovnkube-node-9fsnc     14m          193Mi
      ovnkube-node-dngjh     13m          239Mi
      ovnkube-node-dphq9     15m          231Mi
      ovnkube-node-jpc27     5m           217Mi
      ovnkube-node-nfsvf     14m          196Mi
      ovnkube-node-sczr9     14m          199Mi
      ovnkube-node-shbqq     14m          187Mi
      ovnkube-node-tqftp     14m          218Mi
      ovnkube-node-w5747     12m          195Mi
      ovnkube-node-wdgb7     8m           154Mi
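
      To narrow the check to just the OVN-Kubernetes control-plane pods, something like the following can be used. The `openshift-ovn-kubernetes` namespace is where these pods run; the `app=ovnkube-master` label selector is an assumption and may need adjusting for the cluster at hand.
      ```
      # Show only the ovnkube-master pods, sorted by memory usage (label selector assumed)
      oc adm top pod -n openshift-ovn-kubernetes -l app=ovnkube-master --sort-by=memory

      # Break usage down per container for the busy pod (replace the pod name as needed)
      oc adm top pod -n openshift-ovn-kubernetes ovnkube-master-785ft --containers
      ```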

      Version-Release number of selected component (if applicable):

      4.12.26

      How reproducible:

      Only 2 clusters are affected, so it is very hard to say.

      Steps to Reproduce:

      1. Get a pprof trace from the ovnkube-master leader (for 4.14, pick any ovnkube-controller pod).
      You can use the network-tools ovn-pprof-forwarding command (https://github.com/openshift/network-tools) to forward the pprof port.
      Collect data by running
      curl "localhost:<choose port>/debug/pprof/trace?seconds=40" > trace
      2. Create and delete the following NetworkPolicy 3 times:
      ```
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: test-policy
        namespace: default
      spec:
        podSelector: {}
        policyTypes:
        - Ingress
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: default
            podSelector:
              matchLabels:
                app: test
      ```
      3. Collect one more trace, `trace2`.
      4. Now compare the traces. To do so, run
      go tool trace <trace file>
      It will open a browser window; go to "Goroutine analysis" and note the N value for either
      `github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).periodicallyRetryResources` or `github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func4`
      N is the number of goroutines created for that function. When the bug is present, `trace2` shows a higher N than `trace`; when the bug is fixed, both traces should show the same N. (A consolidated sketch of steps 1-4 is included after these steps.)
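
      For convenience, the steps above can be scripted roughly as follows. This is only a sketch: the forwarded pprof port (6060 here) and the manifest path `test-policy.yaml` (holding the NetworkPolicy from step 2) are assumptions and need to be adapted to the environment.
      ```
      #!/usr/bin/env bash
      set -euo pipefail

      # Assumptions: the pprof endpoint of the ovnkube-master leader (or of an
      # ovnkube-controller pod on 4.14) is already forwarded to localhost:6060,
      # e.g. via the network-tools ovn-pprof-forwarding command, and
      # test-policy.yaml contains the NetworkPolicy from step 2.
      PPROF_PORT=6060
      POLICY=test-policy.yaml

      # Step 1: baseline 40-second trace
      curl "localhost:${PPROF_PORT}/debug/pprof/trace?seconds=40" > trace

      # Step 2: create and delete the test NetworkPolicy 3 times
      for i in 1 2 3; do
        oc apply -f "${POLICY}"
        oc delete -f "${POLICY}"
      done

      # Step 3: second 40-second trace
      curl "localhost:${PPROF_PORT}/debug/pprof/trace?seconds=40" > trace2

      # Step 4: open each trace (one at a time; each opens a browser window)
      # and compare the goroutine counts (N) under "Goroutine analysis" for the
      # retry framework functions listed above.
      go tool trace trace
      go tool trace trace2
      ```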

      Actual results:

       

      Expected results:

       

      Additional info:

       

            Assignee: Nadia Pinaeva (npinaeva@redhat.com)
            Reporter: Andy Bartlett (rhn-support-andbartl)
            QA Contact: Arti Sood