-
Bug
-
Resolution: Done-Errata
-
Critical
-
None
Description of problem:
I'm not really convinced this falls under the installation component, but for lack of a better place, I will open it against that for now. Please feel free to move it where it needs to be.
Immediately following the install of CNV on a 500-node bare-metal cluster (3 masters + 497 workers), we see 3511 CNV-related pods brought up, which is quite a lot. Along with that, kube-apiserver memory and CPU consumption increases at the time of the CNV install and stays high. It seems we need to investigate whether these pods are doing any unwanted polling or opening too many watches.
I will update the bug with more info as I find it. I am also happy to provide access to the cluster while it's up.
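One possible way to check for unwanted polling or excessive watches is to tally kube-apiserver audit events by requesting user and verb. The following is a minimal sketch, not something that has been run on this cluster. It assumes the audit log is in the standard JSON-lines format (one audit.k8s.io event per line with `user.username` and `verb` fields); the function name `tally_audit_events` is made up for illustration.

```python
import json
import sys
from collections import Counter


def tally_audit_events(lines):
    """Count audit events per (username, verb) from JSON-lines audit log text."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        # "user" may be missing or null; fall back to a placeholder.
        user = (event.get("user") or {}).get("username", "<unknown>")
        verb = event.get("verb", "<unknown>")
        counts[(user, verb)] += 1
    return counts


if __name__ == "__main__":
    # Print the 20 busiest (user, verb) pairs read from stdin.
    for (user, verb), n in tally_audit_events(sys.stdin).most_common(20):
        print(f"{n:8d}  {verb:10s}  {user}")
```

Assuming the script is saved as tally.py (a hypothetical name), it could be fed with something like `oc adm node-logs <master-node> --path=kube-apiserver/audit.log | python3 tally.py`. A CNV service account dominating the `list`, `get`, or `watch` counts would point at the component to look into first.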
Version-Release number of selected component (if applicable):
2.4.3 (OCP 4.6.4)
How reproducible:
100% following a CNV install on a large-scale environment.
Steps to Reproduce:
1. Install a large OCP cluster (500 nodes)
2. Install CNV
Actual results:
API server on all masters is overloaded
Expected results:
The increase in load on API server should not be that significant
Additional info:
[kni@e16-h18-b03-fc640 ansible]$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.4.3   OpenShift Virtualization   2.4.3     kubevirt-hyperconverged-operator.v2.4.2   Installing
[kni@e16-h18-b03-fc640 ansible]$ oc get ds
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
bridge-marker                  498       498       498     498          498         beta.kubernetes.io/arch=amd64   4h57m
hostpath-provisioner           495       495       495     495          495         <none>                          4h32m
kube-cni-linux-bridge-plugin   498       498       498     498          498         beta.kubernetes.io/arch=amd64   4h57m
kubevirt-node-labeller         495       495       495     495          495         <none>                          4h57m
nmstate-handler                498       498       498     498          498         beta.kubernetes.io/arch=amd64   4h57m
ovs-cni-amd64                  498       498       498     498          498         beta.kubernetes.io/arch=amd64   4h57m
virt-handler                   495       495       495     495          495         <none>                          4h53m
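As a quick sanity check on the pod count, the DaemonSet output above can be summed and compared with the 3511 CNV-related pods mentioned in the description. This is only arithmetic on the numbers already shown; the variable names are illustrative.

```python
# CURRENT pod counts per DaemonSet, copied from the `oc get ds` output above.
daemonset_pods = {
    "bridge-marker": 498,
    "hostpath-provisioner": 495,
    "kube-cni-linux-bridge-plugin": 498,
    "kubevirt-node-labeller": 495,
    "nmstate-handler": 498,
    "ovs-cni-amd64": 498,
    "virt-handler": 495,
}

REPORTED_CNV_PODS = 3511  # figure quoted in the description

daemonset_total = sum(daemonset_pods.values())
remaining = REPORTED_CNV_PODS - daemonset_total

print(f"DaemonSet pods: {daemonset_total}")  # 3477
print(f"Other CNV pods: {remaining}")        # 34
```

So the seven DaemonSets account for 3477 of the 3511 pods, meaning almost all of the new pods are per-node agents whose API traffic scales with the node count.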
In the attached pictures, you can see that around 13:30, when the CNV install first happened, API server CPU and memory usage increased and has remained high since.