- Bug
- Resolution: Done-Errata
- Major
- None
- 4.19
Description of problem:
During CUDN BGP export scenario scale testing, we see:
- High CPU usage of the ovnkube-controller container
- High CPU usage on both master and worker nodes
I am scale testing the BGP route export scenario on an OCP 4.19 bare-metal cluster with 24 worker nodes (no infra nodes), using 4.19.0-ec.3 with an OVNK BGP image built from the PR build 4.19,openshift/ovn-kubernetes#2239 on 03/25/2025.
The test creates 72 CUDNs, 1 namespace per CUDN, and 1 pod per namespace. I waited for 30 minutes after creating all the resources before creating route advertisements.
The test then creates 72 route advertisements, where each RA advertises only one unique CUDN, i.e. RA:CUDN is 1:1.
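For illustration, a minimal sketch of how such a 1:1 RA:CUDN topology could be generated. The spec fields, labels, and subnets below are assumptions for readability, not the exact manifests used in the test, and the single pod created in each namespace is omitted:

```python
# Hypothetical sketch: generate 72 CUDNs (one namespace each) and one RouteAdvertisements
# object per CUDN (RA:CUDN = 1:1). Spec field names and values are illustrative assumptions.
import yaml

manifests = []
for i in range(72):
    name = f"cudn-{i}"
    ns = f"ns-{i}"
    # Namespace selected by the CUDN below
    manifests.append({
        "apiVersion": "v1", "kind": "Namespace",
        "metadata": {"name": ns, "labels": {"cudn": name}},
    })
    # ClusterUserDefinedNetwork spanning exactly one namespace
    manifests.append({
        "apiVersion": "k8s.ovn.org/v1", "kind": "ClusterUserDefinedNetwork",
        "metadata": {"name": name, "labels": {"export": name}},
        "spec": {
            "namespaceSelector": {"matchLabels": {"cudn": name}},
            "network": {"topology": "Layer3",
                        "layer3": {"role": "Primary",
                                   "subnets": [{"cidr": f"10.{100 + i}.0.0/16"}]}},
        },
    })
    # One route advertisement object per CUDN, selecting only that network
    manifests.append({
        "apiVersion": "k8s.ovn.org/v1", "kind": "RouteAdvertisements",
        "metadata": {"name": f"ra-{name}"},
        "spec": {
            "advertisements": ["PodNetwork"],
            "networkSelector": {"matchLabels": {"export": name}},
        },
    })

print(yaml.dump_all(manifests))
```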
Average CPU usage of the "controller" container (in the frr-k8s pods of the openshift-frr-k8s namespace) is 1390%, and max CPU usage is 1813%.
We also see very high CPU usage on each master node, i.e. avg 2935% and max 4625%.
CPU usage on each worker node is avg 2835% and max 4472%.
| Component | Avg CPU usage (%) | Max CPU usage (%) |
|---|---|---|
| ovnkube-controller container | 788 | 2139 |
| frr controller container | 1390 | 1813 |
| Master node | 2909 | 4625 |
| Worker node | 2765 | 4522 |
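For reference, per-container figures like those in the table above can be pulled from the cluster's Prometheus via the cAdvisor CPU metric; the numbers in this report come from the Grafana dashboards linked below, and the endpoint and query details in this sketch are assumptions:

```python
# Hypothetical sketch: query Prometheus for per-pod CPU usage of the frr-k8s "controller"
# container, expressed as % of one core. The route URL and label values are assumptions.
import requests

PROM = "https://prometheus-k8s.example.com"  # placeholder endpoint
QUERY = (
    'sum by (pod) ('
    'rate(container_cpu_usage_seconds_total{namespace="openshift-frr-k8s",'
    'container="controller"}[5m])) * 100'
)

resp = requests.get(f"{PROM}/api/v1/query", params={"query": QUERY}, verify=False)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    pod = result["metric"].get("pod", "<unknown>")
    cpu_pct = float(result["value"][1])
    print(f"{pod}: {cpu_pct:.0f}%")
```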
Test results are available at https://docs.google.com/spreadsheets/d/1WLuTpcrTwFBUcZ-XF2wppOJL9M13_V_HYh2i23UpQUo/edit?usp=sharing
Grafana screenshots at https://storage.scalelab.redhat.com/anilvenkata/bgp/ra72export/
I can provide the live environment to the engineer for troubleshooting.
- is depended on by: CORENET-6015 BGP External Issue tracker (In Progress)
- links to: RHBA-2025:12341 OpenShift Container Platform 4.19.7 bug fix update