-
Bug
-
Resolution: Unresolved
-
Critical
Description of Problem
- I'm running a 3500-SNO ZTP scale test with an ACM 2.13 downstream build and OCP 4.18.0-rc4. The SNOs are deployed with Assisted Installer using Siteconfig v1. Each GitOps application deploys 300 clusters, and 500 clusters are deployed every hour. The first 8 applications work well, but starting with application number 9 the applications are stuck at "out of sync" because the openshift-gitops-application-controller-0 pod crashes with OOMKilled. As a result, only the first 2500 clusters are deployed; the remaining 1172 are not.
We've done a similar test with ACM 2.12 and OCP 4.17, where we sometimes had an issue with the last 100 clusters in the last application (GITOPS-5664), but it now looks much worse with ACM 2.13/OCP 4.18.0:
# oc get pod -n openshift-gitops openshift-gitops-application-controller-0 -w
NAME                                        READY   STATUS             RESTARTS         AGE
openshift-gitops-application-controller-0   1/1     Running            90 (8m8s ago)    18h
openshift-gitops-application-controller-0   0/1     OOMKilled          90 (8m9s ago)    18h
openshift-gitops-application-controller-0   0/1     CrashLoopBackOff   90 (10s ago)     18h
The gitops and audit must-gather is here
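For anyone triaging a captured watch log, here is a minimal sketch of how the pod status rows above could be summarized. It assumes the default column layout of `oc get pod -w` (NAME, READY, STATUS, RESTARTS, AGE); the names `PodRow`, `parse_watch_output`, and `oom_summary` are hypothetical helpers for illustration and are not part of any OpenShift or Argo CD tooling.

```python
from typing import Dict, List, NamedTuple, Union


class PodRow(NamedTuple):
    """One row of `oc get pod -w` output (hypothetical helper type)."""
    name: str
    ready: str
    status: str
    restarts: int


def parse_watch_output(text: str) -> List[PodRow]:
    """Parse rows assumed to use the default NAME/READY/STATUS/RESTARTS/AGE layout."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the column header, and the echoed command line.
        if len(fields) < 4 or fields[0] in ("NAME", "#"):
            continue
        # RESTARTS may be followed by "(8m8s ago)"; only the leading count is used.
        rows.append(PodRow(fields[0], fields[1], fields[2], int(fields[3])))
    return rows


def oom_summary(rows: List[PodRow]) -> Dict[str, Union[bool, int]]:
    """Report whether any row shows OOMKilled and the highest restart count seen."""
    return {
        "oomkilled_seen": any(r.status == "OOMKilled" for r in rows),
        "max_restarts": max((r.restarts for r in rows), default=0),
    }


# Sample input copied from the output reported in this ticket.
SAMPLE = """\
NAME                                        READY   STATUS             RESTARTS         AGE
openshift-gitops-application-controller-0   1/1     Running            90 (8m8s ago)    18h
openshift-gitops-application-controller-0   0/1     OOMKilled          90 (8m9s ago)    18h
openshift-gitops-application-controller-0   0/1     CrashLoopBackOff   90 (10s ago)     18h
"""

if __name__ == "__main__":
    print(oom_summary(parse_watch_output(SAMPLE)))
```

The sketch only reads text that has already been captured, so it can be run against a saved log without access to the cluster.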
Additional Info
- <Any additional info such as logs, must-gather outputs, etc.>
Problem Reproduction
- <How do we reproduce the problem?>
Reproducibility
- <Always/Intermittent/Only Once>
Prerequisites/Environment
- <OpenShift, managed service (e.g., ROSA, ARO), operators, layered product, and other software versions, build details>
Steps to Reproduce
- ...
Expected Results
- ...
Actual Results
- ...
Problem Analysis
- <Completed by engineering team as part of the triage/refinement process>
Root Cause
- <What is the root cause of the problem? Or, why is it not a bug?>
Workaround (If Possible)
- <Are there any workarounds we can provide to the customers?>
Fix Approaches
- <If we decide to fix this bug, how will we do it?>
Acceptance Criteria
- ...
Definition of Done
- Code Complete:
- All code has been written, reviewed, and approved.
- Tested:
- Unit tests have been written and passed.
- Ensure code coverage is not reduced with the changes.
- Integration tests have been automated.
- System tests have been conducted, and all critical bugs have been fixed.
- Tested and merged on OpenShift either upstream or downstream on a local build.
- Documentation:
- User documentation or release notes have been written (if applicable).
- Build:
- Code has been successfully built and integrated into the main repository / project.
- Midstream changes (if applicable) are done, reviewed, approved and merged.
- Review:
- Code has been peer-reviewed and meets coding standards.
- All acceptance criteria defined in the user story have been met.
- Tested by reviewer on OpenShift.
- Deployment:
- The feature has been deployed on OpenShift cluster for testing.