  1. OpenShift GitOps
  2. GITOPS-5664

openshift-gitops-application-controller-0 pod crashes with OOMKILLED


    • 2
    • False
    • None
    • False
    • GitOps Scarlet - Sprint 7/3266
    • Important

      Description of Problem

      We have been using a 32 GB memory limit for the ACM 2.11 ZTP scale test. The readout link is here. During the ACM 2.11 test, the openshift-gitops-application-controller-0 pod restarted because it was OOMKILLED, but it still worked with the 35 applications we created to deploy 300 managed clusters per Argo CD application.

      Recently, when we tested ACM 2.12 with OCP 4.17.0, we hit more OOMKILLED crashes and found that, in a couple of runs, the last 100 clusters were not deployed because of the crashing pod.

      The screenshot below shows that the pod crashes even with a 64 GB memory limit, because it sometimes uses more than 64 GB. The pod logs, including the previous log, are attached: openshift-gitops-application-controller-0-argocd-application-controller_previous.log
      Feel free to contact me if you need to check the environment or need more information.
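
      For anyone triaging, one way to confirm that the restarts are OOM kills rather than another failure is to inspect the container's last termination state in the pod status. The following is a minimal sketch, not part of the original report: it assumes the pod JSON was obtained separately (for example with `oc get pod <pod> -o json`), and the sample status and restart count are hypothetical, trimmed values used only for illustration.

```python
import json


def oom_killed_containers(pod: dict) -> list[tuple[str, int]]:
    """Return (container name, restart count) for every container whose
    last termination reason in the pod status is "OOMKilled"."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is absent if the container never terminated
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append((status["name"], status.get("restartCount", 0)))
    return hits


# Hypothetical, trimmed pod status; the restart count is illustrative only.
sample_pod = json.loads("""
{
  "metadata": {"name": "openshift-gitops-application-controller-0"},
  "status": {
    "containerStatuses": [
      {
        "name": "argocd-application-controller",
        "restartCount": 3,
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))
```

      An empty result would suggest the restarts have a different cause and that the previous container log is the better place to look.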

      Additional Info

      • <Any additional info such as logs, must-gather outputs, etc.>

      Problem Reproduction

      • <How do we reproduce the problem?>

      Reproducibility

      • <Always/Intermittent/Only Once>

      Prerequisites/Environment

      • <OpenShift, managed service (e.g., ROSA, ARO), operators, layered product, and other software versions, build details>

      Steps to Reproduce

      • ...

      Expected Results

      • ...

      Actual Results

      • ...

      Problem Analysis

      • <Completed by engineering team as part of the triage/refinement process>

      Root Cause

      • <What is the root cause of the problem? Or, why is it not a bug?>

      Workaround (If Possible)

      • <Are there any workarounds we can provide to the customers?>

      Fix Approaches

      • <If we decide to fix this bug, how will we do it?>

      Acceptance Criteria

      • ...

      Definition of Done

      • Code Complete:
        • All code has been written, reviewed, and approved.
      • Tested:
        • Unit tests have been written and passed.
        • Integration tests have been completed.
        • System tests have been conducted, and all critical bugs have been fixed.
        • Tested on OpenShift either upstream or downstream on a local build.
      • Documentation:
        • User documentation or release notes have been written (if applicable).
      • Build:
        • Code has been successfully built and integrated into the main repository/project.
      • Review:
        • Code has been peer-reviewed and meets coding standards.
        • All acceptance criteria defined in the user story have been met.
        • Tested by reviewer on OpenShift.
      • Deployment:
        • The feature has been deployed on OpenShift cluster for testing.

              jgwest Jonathan West
              rhn-support-txue Ting Xue