-
Bug
-
Resolution: Duplicate
-
Normal
-
None
-
4.15
-
None
-
Moderate
-
None
-
Glaceon OLM Sprint 267
-
1
-
False
-
Description of problem:
redhat-operators pod consuming a lot of the master node's CPU.
Version-Release number of selected component (if applicable):
OCP 4.15.35
How reproducible:
Not reproducible
Steps to Reproduce:
1. 2. 3.
Actual results:
registry-server container (redhat-operators pod) consuming 2.61 CPU cores on the master node where it is running (moving the pod to another node has the same result).
# cat sosreport-ent-xkjjh-master-0/sos_commands/crio/crictl_stats | sort -nk3 | tail -2 | column -t
CONTAINER      NAME             CPU %   MEM      DISK     INODES
61c30b2fd9211  registry-server  261.19  54.72MB  8.192kB  14
f8b08095674f1  kube-apiserver   272.09  9.659GB  245.8kB  27
Expected results:
redhat-operators pod should not consume that much CPU.
Additional info:
We collected CPU and memory profiles with pprof and these are the results:
- CPU profile
# go tool pprof -top profile.pb
File: opm
Build ID: c34f21afc19ad938b07c9b76ac976d40e72bab9a
Type: cpu
Time: Feb 12, 2025 at 9:46am (CET)
Duration: 30s, Total samples = 77.25s (257.50%)
Showing nodes accounting for 70.31s, 91.02% of 77.25s total
Dropped 342 nodes (cum <= 0.39s)
      flat  flat%   sum%        cum   cum%
    12.69s 16.43% 16.43%     38.33s 49.62%  runtime.scanobject
    ...
     2.57s  3.33% 52.57%     48.25s 62.46%  runtime.gcDrain
    ...
- Memory profile
# go tool pprof -alloc_space -top heap_profile.pb
File: opm
Build ID: c34f21afc19ad938b07c9b76ac976d40e72bab9a
Type: alloc_space
Time: Feb 12, 2025 at 11:38am (CET)
Showing nodes accounting for 1351884.79MB, 98.63% of 1370616.60MB total
Dropped 334 nodes (cum <= 6853.08MB)
        flat  flat%   sum%          cum   cum%
 694944.35MB 50.70% 50.70%  694944.35MB 50.70%  encoding/json.unquoteBytes
 646676.62MB 47.18% 97.88% 1341620.97MB 97.88%  encoding/json.(*decodeState).literalStore
   8589.47MB  0.63% 98.51%    8589.47MB  0.63%  google.golang.org/protobuf/proto.MarshalOptions.marshal
 ...
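For context on how profiles like these can be gathered: a Go service usually exposes them through net/http/pprof. I have not checked how (or whether) opm wires this up in 4.15, so the sketch below is generic and the listen address is an arbitrary assumption, not opm's actual configuration.

// Illustrative only: standard net/http/pprof wiring in a Go service.
// The blank import registers the /debug/pprof/* handlers on
// http.DefaultServeMux, so a 30s CPU profile can be pulled with
//   go tool pprof http://<host>:6060/debug/pprof/profile?seconds=30
// and a heap profile with
//   go tool pprof http://<host>:6060/debug/pprof/heap
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
)

func main() {
	// Port 6060 is an arbitrary choice for this sketch.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}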
It seems that the opm binary running inside the redhat-operators pod is allocating a lot of memory while decoding JSON values, which keeps the Go garbage collector busy and accounts for most of the CPU usage.
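To illustrate the mechanism (this is a standalone sketch, not opm's code; the csvJson field name is only an assumption about the bundle payload shape): encoding/json allocates a fresh copy of every decoded string (the unquoteBytes/literalStore frames in the heap profile), so repeatedly decoding large catalog payloads keeps the garbage collector scanning and sweeping.

// Standalone sketch: repeated decoding of a large JSON payload drives
// allocations and GC work, mirroring the pattern in the profiles above.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"runtime"
)

// bundle loosely mirrors the shape of a ListBundles entry; the field name
// is illustrative, not a claim about opm's types.
type bundle struct {
	CsvJSON string `json:"csvJson"`
}

func main() {
	// Build a large JSON array with many long string values, similar in
	// spirit to a catalog full of embedded CSV manifests.
	var buf bytes.Buffer
	buf.WriteByte('[')
	for i := 0; i < 1000; i++ {
		if i > 0 {
			buf.WriteByte(',')
		}
		fmt.Fprintf(&buf, `{"csvJson":%q}`, bytes.Repeat([]byte("x"), 64*1024))
	}
	buf.WriteByte(']')

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Decode the same payload repeatedly, as a registry answering many
	// requests would; every pass re-allocates every decoded string, which
	// is what shows up as encoding/json.unquoteBytes in the heap profile.
	for i := 0; i < 10; i++ {
		var bundles []bundle
		if err := json.Unmarshal(buf.Bytes(), &bundles); err != nil {
			panic(err)
		}
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("allocated %.1f MB across decodes, %d GC cycles\n",
		float64(after.TotalAlloc-before.TotalAlloc)/(1<<20),
		after.NumGC-before.NumGC)
}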
I asked the customer to get the bundles from the catalogsource redhat-operators by executing:
oc run -n openshift-marketplace grpcurl -ti --rm --image=quay.io/gmeghnag/grpcurl --command -q -- grpcurl -plaintext -d '{}' redhat-operators.openshift-marketplace.svc:50051 api.Registry/ListBundles > bundles.json
But I don't see anything unusual in that bundles.json (its size is about the same as the one collected from my lab cluster).
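In case it helps with further triage, here is a rough helper to check whether any single bundle dominates the payload. It assumes bundles.json contains the concatenated JSON objects that grpcurl prints for the streaming ListBundles reply, and that each object carries packageName/csvName fields; those field names are assumptions, not verified against the actual payload.

// Rough triage helper: print the ten largest bundle entries in bundles.json.
// Field names (packageName, csvName) are assumptions about the payload shape.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

func main() {
	f, err := os.Open("bundles.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	type entry struct {
		name string
		size int
	}
	var entries []entry

	// grpcurl prints one JSON object per streamed message; json.Decoder can
	// consume those concatenated values one by one.
	dec := json.NewDecoder(f)
	for {
		var b map[string]json.RawMessage
		if err := dec.Decode(&b); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		size := 0
		for _, v := range b {
			size += len(v)
		}
		entries = append(entries, entry{
			name: string(b["packageName"]) + " " + string(b["csvName"]),
			size: size,
		})
	}

	sort.Slice(entries, func(i, j int) bool { return entries[i].size > entries[j].size })
	for i := 0; i < len(entries) && i < 10; i++ {
		fmt.Printf("%10d bytes  %s\n", entries[i].size, entries[i].name)
	}
}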
- duplicates
-
OCPBUGS-36421 redhat-operators pod experiencing unusually high CPU utilization
-
- Closed
-
- relates to
-
OCPBUGS-48697 OLMv0: excessive catalog source snapshots cause severe performance regression [openshift-4.15.z]
-
- Closed
-