- Bug
- Resolution: Unresolved
- Major
- None
- 4.14
- No
- SDN Sprint 242, SDN Sprint 243
- 2
- Rejected
- False
Description of problem:
We see very high memory usage (2.2 GiB) by the ovnkube-node pod during resource deletion while running the cluster-density-v2 scale test on a 120-node OVN-IC environment.
After creating all 800 namespaces with their resources, memory usage was 1.6 GiB. During deletion of the resources (80 namespaces deleted) it reached 2.2 GiB. Memory returned to 1.6 GiB once deletion and recreation of the 80 namespaces had finished.
Memory usage during each step of the cluster-density-v2 test:
1) Memory usage was 1.6 GiB after creating 800 namespaces (each namespace with 10 pods, 5 services, 2 routes and 3 network policies).
2) After creating all these resources, the test deletes 10% of the namespaces (i.e. 80 namespaces) and waits until all of them are deleted. Memory usage spiked to 2.2 GiB at this time.
3) Memory usage was 1.8 GiB during the recreation phase (i.e. recreating the 80 namespaces with their resources).
4) Idle for 10 minutes, i.e. no resource creation or deletion (CPU usage drops back to normal). Memory usage is back to 1.6 GiB.
5) The test repeats steps 2 to 4 (i.e. deletion, creation and idle) for 1 hour. In total there are 4 deletion, creation and idle cycles in the entire test run. In each cycle memory usage spikes to 2.2 GiB and then settles back to 1.6 GiB after some time.
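To put the churn volume in the steps above into numbers, here is a minimal Python sketch that derives object counts from the figures in this report (800 namespaces, 10% churn, 4 cycles). The names `PER_NAMESPACE`, `object_counts` and the other identifiers are illustrative only and are not part of the test tooling.

```python
# Figures taken from the test description in this report.
NAMESPACES = 800
PER_NAMESPACE = {"pods": 10, "services": 5, "routes": 2, "network_policies": 3}
CHURN_PERCENT = 10   # share of namespaces deleted and recreated per cycle
CHURN_CYCLES = 4     # deletion/creation/idle cycles in the 1 hour run


def object_counts(namespaces):
    """Return the number of objects of each kind held by `namespaces` namespaces."""
    return {kind: count * namespaces for kind, count in PER_NAMESPACE.items()}


# Namespaces touched in one deletion phase.
churned_namespaces = NAMESPACES * CHURN_PERCENT // 100

# Objects present at steady state, and objects removed in one deletion phase.
steady_state = object_counts(NAMESPACES)
deleted_per_cycle = object_counts(churned_namespaces)

# Objects removed over the whole run (each cycle deletes the same amount).
deleted_total = {kind: n * CHURN_CYCLES for kind, n in deleted_per_cycle.items()}

if __name__ == "__main__":
    print("namespaces churned per cycle:", churned_namespaces)
    print("steady state:", steady_state)
    print("deleted per cycle:", deleted_per_cycle)
    print("deleted over the run:", deleted_total)
```

Each deletion phase therefore removes 80 namespaces and all objects inside them in one burst, which is the window in which the memory spike is observed.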
The ovnkube-controller and ovnkube-node containers are responsible for this high memory usage in the ovnkube-node pod.
1) The ovnkube-controller container uses 1 GiB of memory during deletion, 740 MiB during recreation of the 80 namespaces and 630 MiB during idle time (i.e. once all resources are recreated).
2) The ovnkube-node container uses 700 MiB during deletion and 620 MiB during recreation, finally falling back to 480 MiB once the resources are created.
3) Because of this, the ovnkube-node pod in OVN-IC uses 2.2 GiB of memory during the deletion phase.
Grafana panel snapshots:
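As a rough cross-check of the breakdown above, the following Python sketch compares the growth of the two containers over their idle level with the growth of the whole pod (2.2 GiB versus 1.6 GiB). It assumes 1 GiB = 1024 MiB and that the per-container figures other than the 1 GiB value are in MiB. The reported numbers are rounded, so the result is approximate, and `growth_over_idle` is an illustrative helper rather than anything from the test tooling.

```python
MIB_PER_GIB = 1024

# Per-container memory in MiB, as reported above (rounded values).
containers = {
    "ovnkube-controller": {"deletion": 1 * MIB_PER_GIB, "idle": 630},
    "ovnkube-node": {"deletion": 700, "idle": 480},
}

# Pod-level memory in MiB, converted from the reported GiB values.
pod_total = {"deletion": 2.2 * MIB_PER_GIB, "idle": 1.6 * MIB_PER_GIB}


def growth_over_idle(usage, phase):
    """Memory added in `phase` relative to the idle level, in MiB."""
    return usage[phase] - usage["idle"]


# Growth attributable to the two containers during deletion.
container_growth = sum(
    growth_over_idle(usage, "deletion") for usage in containers.values()
)

# Growth of the whole pod during deletion.
pod_growth = growth_over_idle(pod_total, "deletion")

if __name__ == "__main__":
    print(f"containers: +{container_growth} MiB, pod: +{pod_growth:.1f} MiB")
```

Under these assumptions the two containers together grow by about 614 MiB during deletion, which matches the roughly 0.6 GiB growth of the pod, so the spike appears to be fully accounted for by ovnkube-controller and ovnkube-node.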
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/zhrnsu1pnyEo7xwzI68HgX8EpsWFTC4g
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/9qna2gm7yXqLkx9XH36OdSJSVlHhevkO
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/n7ToUbQeB98fiE88sz0RbKjmln6Ra8zj
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-30-234232
- relates to
- OCPBUGS-17773 [OVN-IC] high cpu usage (600%) by OVNK during resource deletion at scale
- Closed