OCPBUGS-50952

redhat-operators pod consuming a lot of the master node's CPU


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Affects Version/s: 4.15
    • Component/s: OLM / Registry
    • Severity: Moderate
    • Sprint: Glaceon OLM Sprint 267

      Description of problem:

      The redhat-operators pod is consuming a lot of the master node's CPU.

      Version-Release number of selected component (if applicable):

      OCP 4.15.35

      How reproducible:

      Not reproducible

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      The registry-server container (redhat-operators pod) is consuming 2.61 CPU cores on the master node where it is running (moving the pod to another node has the same result).
      
      # cat sosreport-ent-xkjjh-master-0/sos_commands/crio/crictl_stats | sort -nk3 | tail -2 | column -t
      CONTAINER      NAME             CPU %   MEM      DISK     INODES 
      61c30b2fd9211  registry-server  261.19  54.72MB  8.192kB  14
      f8b08095674f1  kube-apiserver   272.09  9.659GB  245.8kB  27
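
      The same per-container figures can be checked live on the affected node, for example (a sketch; the node name is taken from the sosreport above, and <container-id> is a placeholder for the ID returned by the first command):

      # oc debug node/ent-xkjjh-master-0 -- chroot /host crictl ps --name registry-server -q
      # oc debug node/ent-xkjjh-master-0 -- chroot /host crictl stats <container-id>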

      Expected results:

      redhat-operators pod should not consume that much CPU.    

      Additional info:

      We collected CPU and memory profiles with pprof and these are the results:

      • CPU profile
        # go tool pprof -top profile.pb                 
        
        File: opm
        Build ID: c34f21afc19ad938b07c9b76ac976d40e72bab9a
        Type: cpu
        Time: Feb 12, 2025 at 9:46am (CET)
        Duration: 30s, Total samples = 77.25s (257.50%)
        Showing nodes accounting for 70.31s, 91.02% of 77.25s total
        Dropped 342 nodes (cum <= 0.39s)
              flat  flat%   sum%        cum   cum%
            12.69s 16.43% 16.43%     38.33s 49.62%  runtime.scanobject
        ...
             2.57s  3.33% 52.57%     48.25s 62.46%  runtime.gcDrain
        ...
      • Memory profile
        # go tool pprof -alloc_space -top heap_profile.pb            
        File: opm
        Build ID: c34f21afc19ad938b07c9b76ac976d40e72bab9a
        Type: alloc_space
        Time: Feb 12, 2025 at 11:38am (CET)
        Showing nodes accounting for 1351884.79MB, 98.63% of 1370616.60MB total
        Dropped 334 nodes (cum <= 6853.08MB)
              flat  flat%   sum%        cum   cum%
        694944.35MB 50.70% 50.70% 694944.35MB 50.70%  encoding/json.unquoteBytes
        646676.62MB 47.18% 97.88% 1341620.97MB 97.88%  encoding/json.(*decodeState).literalStore
         8589.47MB  0.63% 98.51%  8589.47MB  0.63%  google.golang.org/protobuf/proto.MarshalOptions.marshal
        ...
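
      For reference, profiles like these can be pulled from the running pod along the following lines, assuming the opm process exposes Go's standard net/http/pprof endpoint (how that endpoint is enabled and which port it listens on depend on how opm is started in this image, so 6060 is a placeholder):

      # oc port-forward -n openshift-marketplace <redhat-operators-pod> 6060:6060
      # go tool pprof -top 'http://localhost:6060/debug/pprof/profile?seconds=30'
      # go tool pprof -alloc_space -top http://localhost:6060/debug/pprof/heap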

         

      It seems that the opm binary running inside the redhat-operators pod is allocating a lot of memory while decoding JSON values, which causes the Go garbage collector to consume most of the CPU.
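
      One way to tie the two profiles together using the data already collected (a sketch; the function names are taken from the outputs above) is to sort the CPU profile by cumulative time and peek at the callers of the JSON-decoding hot spots in the heap profile:

      # go tool pprof -top -cum profile.pb
      # go tool pprof -alloc_space -peek literalStore heap_profile.pb
      # go tool pprof -alloc_space -peek unquoteBytes heap_profile.pb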

      I asked the customer to dump the bundles from the redhat-operators CatalogSource by executing:

      oc run -n openshift-marketplace grpcurl -ti --rm --image=quay.io/gmeghnag/grpcurl --command -q -- grpcurl -plaintext -d '{}' redhat-operators.openshift-marketplace.svc:50051 api.Registry/ListBundles > bundles.json

      However, I don't see anything unusual in that bundles.json (its size is not significantly different from the one collected from my lab cluster).
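
      A quick way to compare the dumps is to count the bundles and look at the largest csvJson payloads, assuming jq is available and that the grpcurl output uses the api.Registry field names csvName and csvJson:

      # jq -r '.csvName' bundles.json | wc -l
      # jq -r '"\(.csvJson|length)\t\(.csvName)"' bundles.json | sort -nr | head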

        Assignee: Jordan Keister (rh-ee-jkeister)
        Reporter: Gabriel Meghnagi (rhn-support-gmeghnag)
        QA Contact: Jian Zhang