We recently discovered (GITOPS-5665) that the operator consumes significantly more memory on clusters with a large number of secrets and configmaps. For example, on a test cluster with 2000 secrets and 2000 configmaps spread across 100 namespaces, the operator manager pod consumed over 2 GB at peak with only one ArgoCD CR instance.
The operator watches many resources (code). Use this spike to check the feasibility of optimizing these watches. Also explore other potential optimizations, in the reconciliation logic or elsewhere in the code, that would reduce memory consumption.
Here are some resources to get started:
- https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/
- https://github.com/operator-framework/operator-sdk/issues/5382
- https://github.com/kubernetes-sigs/controller-runtime/issues/1884
- https://github.com/kubernetes-sigs/controller-runtime/issues/540
- https://sdk.operatorframework.io/docs/best-practices/common-recommendation/
- https://sdk.operatorframework.io/docs/best-practices/best-practices/
Steps to reproduce the problem:
1. Create 100 namespaces
for i in {1..100}; do oc create ns argo-test$i; done
2. Create a large key/value environment file to use as the data for the config maps and secrets.
rm -f /tmp/environment.txt
touch /tmp/environment.txt
for i in {1..10000}; do echo "key${i}=value${i}" >> /tmp/environment.txt; done
3. Create 20 config maps per namespace (total 20 * 100 = 2000) using the environment file created in step 2.
for i in {1..100}; do for j in {1..20}; do oc create cm argo-cm$j --from-file /tmp/environment.txt -n argo-test$i; done; done;
4. Create 20 secrets per namespace (total 20 * 100 = 2000) using the environment file created in step 2.
for i in {1..100}; do for j in {1..20}; do oc create secret generic argo-secret$j --from-file /tmp/environment.txt -n argo-test$i; done; done;
5. Install OpenShift GitOps 1.14.0. The operator controller manager pod should be OOM-killed repeatedly, as the pod status shows:
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 1/2 Running 5 (85s ago) 5m
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 2/2 Running 5 (96s ago) 5m11s
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 1/2 OOMKilled 5 (109s ago) 5m24s
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 1/2 CrashLoopBackOff 5 (8s ago) 5m31s
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 1/2 Running 6 (2m48s ago) 8m11s
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 2/2 Running 6 (2m58s ago) 8m21s
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 1/2 OOMKilled 6 (3m12s ago) 8m35s
openshift-gitops-operator-controller-manager-b6d7788d9-pf5z6 1/2 CrashLoopBackOff 6 (7s ago) 8m41s
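For a sense of scale, the raw data created in steps 2 through 4 can be estimated without a cluster. The sketch below regenerates the same key/value file in a temporary location (the variable names are illustrative, not from the operator code) and multiplies its size by the number of objects. With `--from-file`, each config map or secret stores the whole file as a single value, so the file size approximates the payload per object. It comes to roughly 174 KiB per object and about 678 MiB across all 4000 objects, before any per-object overhead. Whether the operator actually holds all of this in memory is an assumption this spike would need to confirm.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Generate the same key/value file as step 2, but in a temporary location.
env_file="$(mktemp)"
for i in {1..10000}; do
  echo "key${i}=value${i}"
done > "$env_file"

# Size of one payload in bytes (strip whitespace that some wc builds emit).
bytes_per_object=$(wc -c < "$env_file" | tr -d '[:space:]')

# 20 config maps + 20 secrets in each of 100 namespaces.
object_count=$(( (20 + 20) * 100 ))

# Raw payload across all objects, ignoring metadata and encoding overhead.
total_bytes=$(( bytes_per_object * object_count ))

echo "bytes per object: ${bytes_per_object}"
echo "objects:          ${object_count}"
echo "total MiB (raw):  $(( total_bytes / 1024 / 1024 ))"

rm -f "$env_file"
```

If the watches cache every one of these objects, this raw figure alone would be a sizable fraction of the 2 GB peak reported above, which is why narrowing what is watched or cached looks worth investigating.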
Split from: GITOPS-5665 High memory utilization of `openshift-gitops-operator-controller-manager` pod Post upgrade to Gitops operator v1.14.0 (Closed)