Bug
Resolution: Done-Errata
Normal
4.12.z
Description of problem:
My customer is seeing high memory usage on one of the ovnkube-master pods, even though there are not many NetworkPolicies active.

$ oc adm top pod
NAME                   CPU(cores)   MEMORY(bytes)
ovnkube-master-6wwl8   136m         355Mi
ovnkube-master-785ft   408m         14613Mi
ovnkube-master-xlrj7   33m          376Mi
ovnkube-node-2v9g4     10m          188Mi
ovnkube-node-4tn6j     12m          196Mi
ovnkube-node-5ffn6     16m          200Mi
ovnkube-node-67vrd     9m           184Mi
ovnkube-node-7sw29     16m          184Mi
ovnkube-node-9fsnc     14m          193Mi
ovnkube-node-dngjh     13m          239Mi
ovnkube-node-dphq9     15m          231Mi
ovnkube-node-jpc27     5m           217Mi
ovnkube-node-nfsvf     14m          196Mi
ovnkube-node-sczr9     14m          199Mi
ovnkube-node-shbqq     14m          187Mi
ovnkube-node-tqftp     14m          218Mi
ovnkube-node-w5747     12m          195Mi
ovnkube-node-wdgb7     8m           154Mi
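A quick way to check whether the memory growth corresponds to a goroutine leak is to look at the live goroutine count on the busy pod. This is a hedged sketch, not part of the original report: it assumes the pod's pprof endpoint has been forwarded to a local port (for example with network-tools' ovn-pprof-forwarding, as in the repro steps below) and that the standard Go `/debug/pprof/goroutine` handler is exposed alongside the trace handler used there; `<port>` is a placeholder for whatever local port the forwarding reports.
```
# Assumed setup: the pprof port of the suspect ovnkube-master pod is already
# forwarded to localhost:<port>.

# The first line of the debug=1 goroutine dump reports the total count,
# e.g. "goroutine profile: total 12345". A count that keeps climbing along
# with memory usage points at leaked goroutines rather than cache growth.
curl -s "localhost:<port>/debug/pprof/goroutine?debug=1" | head -n 1

# Full per-stack breakdown, to see which functions the goroutines are
# parked in (the repro steps below look at the retry framework functions).
curl -s "localhost:<port>/debug/pprof/goroutine?debug=1" > goroutines.txt
```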
Version-Release number of selected component (if applicable):
4.12.26
How reproducible:
Only 2 clusters are affected, so it is very hard to say.
Steps to Reproduce:
1. Get a pprof trace from the ovnkube-master leader (any chosen ovnkube-controller pod for 4.14). You can use network-tools https://github.com/openshift/network-tools ovn-pprof-forwarding for that. Collect data by running `curl localhost:<choose port>/debug/pprof/trace?seconds=40 > trace`
2. Create and delete the following network policy 3 times:
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-policy
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: default
      podSelector:
        matchLabels:
          app: test
```
3. Collect one more trace, `trace2`.
4. Now compare the traces. To do so, run `go tool trace <trace file>`; it will open a browser window. Go to Goroutine analysis, then note the N value for either `github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).periodicallyRetryResources` or `github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func4`. This shows the number of created goroutines. When the bug is present, `trace2` will show a higher N compared to `trace`; when the bug is fixed, both should show the same N. A scripted sketch of steps 1-3 is shown below.
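For convenience, here is a minimal shell sketch of steps 1-3. It assumes the relevant pod's pprof port has already been forwarded to `localhost:<port>` (e.g. via network-tools' ovn-pprof-forwarding) and that the NetworkPolicy above has been saved to a hypothetical file `netpol.yaml`; step 4 remains interactive in the `go tool trace` UI.
```
# Step 1: baseline 40s execution trace from the forwarded pprof endpoint.
curl -s "localhost:<port>/debug/pprof/trace?seconds=40" > trace

# Step 2: create and delete the test NetworkPolicy three times.
# netpol.yaml is an assumed file name holding the policy shown above.
for i in 1 2 3; do
  oc apply -f netpol.yaml
  oc delete -f netpol.yaml
done

# Step 3: second trace to compare against the baseline.
curl -s "localhost:<port>/debug/pprof/trace?seconds=40" > trace2

# Step 4 (interactive): open each trace with `go tool trace trace` and
# `go tool trace trace2`, then compare the goroutine counts under
# "Goroutine analysis" as described above.
```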
Actual results:
Expected results:
Additional info:
clones:
- OCPBUGS-23014 High memory usage on one of the ovnkube-master pods on some clusters (Closed)

is depended on by:
- OCPBUGS-23014 High memory usage on one of the ovnkube-master pods on some clusters (Closed)

links to:
- RHBA-2023:7470 OpenShift Container Platform 4.14.z bug fix update