Type: Bug
Resolution: Unresolved
Affects Version: 4.17.0
Description of problem:
While running HCPs on Azure we have observed recurring CPU spikes every few minutes from the catalog operator pods, reaching up to 200% (max) from a single pod. When running multiple HCPs, the pods spiked all at once at random, which caused CPU starvation on the worker and affected the kubelet and containerd (AKS) daemons.
Version-Release number of selected component (if applicable):
4.17.0-rc.0
How reproducible:
Always
Steps to Reproduce:
1. Create HCPs on Azure (it may also happen on AWS)
2. Watch catalog operator pod CPU usage for a few hours
3. Use this PromQL query (one way to run it is sketched below):
   sum(irate(container_cpu_usage_seconds_total{container!="POD",name!="",namespace=~"clusters.*",instance=~".*user.*",pod=~".*catalog.*"}[$interval])) by (pod) * 100
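A minimal sketch of how the query above could be run against the management cluster's Prometheus HTTP API; the openshift-monitoring namespace, the prometheus-k8s service, and the 5m window substituted for $interval are assumptions and will differ on an AKS management cluster:

    # Port-forward to a Prometheus instance (namespace/service are assumptions).
    oc -n openshift-monitoring port-forward svc/prometheus-k8s 9090:9090 &

    # Query per-pod catalog CPU usage as a percentage of one core.
    curl -sG 'http://localhost:9090/api/v1/query' \
      --data-urlencode 'query=sum(irate(container_cpu_usage_seconds_total{container!="POD",name!="",namespace=~"clusters.*",instance=~".*user.*",pod=~".*catalog.*"}[5m])) by (pod) * 100'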
Actual results:
CPU spikes occur randomly every few minutes. Per the documentation the default sync interval is 12h, but the spikes are happening far more frequently: https://olm.operatorframework.io/docs/advanced-tasks/configuring-olm/#changing-the-package-server-sync-interval
Expected results:
A sync every 12h, as described in the documentation, and correspondingly reduced CPU usage: https://olm.operatorframework.io/docs/advanced-tasks/configuring-olm/#changing-the-package-server-sync-interval
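One way to inspect the currently configured interval on the catalogs, assuming the relevant knob is the CatalogSource registry poll interval (the linked doc may refer to a different setting); namespaces and catalog names will vary per hosted control plane:

    # List the registry poll interval for every CatalogSource on the cluster.
    oc get catalogsource -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,INTERVAL:.spec.updateStrategy.registryPoll.interval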
Additional info:
This was supposed to be fixed by https://issues.redhat.com/browse/OCPBUGS-17950, but it is still happening in the latest release.
Usage screenshot - here
Is duplicated by: OCPBUGS-36421 redhat-operators pod experiencing unusually high CPU utilization (New)