Red Hat Advanced Cluster Management / ACM-11261

clusterlifecycle-state-metrics-v2 OOM with many managedclusters


    • SF Ready/Refined Backlog, SF Train-14 2024-02

      Description of problem:

      After deploying 3663 managedclusters, the following pod was crash-looping because it was repeatedly OOM-killed:

      # oc get po -n multicluster-engine clusterlifecycle-state-metrics-v2-549c9cdb5-f4662
      NAME                                                READY   STATUS    RESTARTS       AGE
      clusterlifecycle-state-metrics-v2-549c9cdb5-f4662   1/1     Running   112 (8h ago)   40h
      # oc describe po -n multicluster-engine clusterlifecycle-state-metrics-v2-549c9cdb5-f4662
      ...
          State:          Running
            Started:      Mon, 22 Apr 2024 13:20:16 +0000
          Last State:     Terminated
            Reason:       Error
            Exit Code:    137
            Started:      Mon, 22 Apr 2024 13:13:46 +0000
            Finished:     Mon, 22 Apr 2024 13:15:08 +0000
          Ready:          True
          Restart Count:  112
          Limits:
            cpu:     500m
            memory:  2Gi
          Requests:
            cpu:      25m
            memory:   32Mi
      ...
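The two numbers that matter in the output above are the exit code and the memory settings. By convention, a container exit code above 128 encodes 128 plus the number of the signal that terminated the process, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. The following is a minimal sketch, not part of the original report, that decodes the exit code and compares the memory limit with the request. The helper names `signal_from_exit_code` and `parse_memory` are hypothetical, and the parser only handles the binary suffixes that appear here.

```python
import signal

# Binary suffixes used by Kubernetes memory quantities (subset only).
_BINARY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def signal_from_exit_code(exit_code):
    """Return the signal name implied by an exit code above 128, else None."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # No signal with that number is known on this platform.
        return None


def parse_memory(quantity):
    """Convert a quantity such as '2Gi' or '32Mi' to a byte count."""
    for suffix, factor in _BINARY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # assume a plain byte count


if __name__ == "__main__":
    # Exit code taken from "Last State: Terminated" in the describe output.
    print(signal_from_exit_code(137))
    # Ratio of the 2Gi limit to the 32Mi request.
    print(parse_memory("2Gi") // parse_memory("32Mi"))
```

On Linux the first call yields `SIGKILL`, and the ratio shows the 2Gi limit is 64 times the 32Mi request. The reported `Reason` is `Error` rather than `OOMKilled`, so the exit code on its own is consistent with an OOM kill but does not prove one.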

      Version-Release number of selected component (if applicable):

      ACM - 2.10.2-DOWNSTREAM-2024-04-12-21-01-39

      Hub OCP - 4.15.9

      Managed spoke clusters OCP - 4.14.14

      How reproducible:

      Steps to Reproduce:

      1.  
      2.  
      3. ...

      Actual results:

      Expected results:

      Additional info:

            qhao@redhat.com Qing Hao
            akrzos@redhat.com Alex Krzos
            Hui Chen
            Votes: 0
            Watchers: 4
