Bug
Resolution: Cannot Reproduce
Major
None
1.14.0
8
False
False
GitOps Tangerine Sprint 17, GitOps Tangerine Sprint 18, GitOps Tangerine Sprint 19
Description of Problem
When deploying many apps (e.g., app of apps), the server pod crashed with the following error:
fatal error: concurrent map iteration and map write
goroutine 87034 [running]:
reflect.mapiternext(0x50aee9?)
/usr/lib/golang/src/runtime/map.go:1392 +0x13
reflect.(*MapIter).Next(0xc002c06460?)
/usr/lib/golang/src/reflect/value.go:2005 +0x74
encoding/json.mapEncoder.encode({0x15?}, 0xc0020108c0, {0x37e3760?, 0xc0001618d0?, 0xc0001618d0?}, {0x7?, 0x0?})
<<json encoding and go traces omitted >>
/remote-source/argo_cd/deps/gomod/pkg/mod/k8s.io/client-go@v0.29.6/kubernetes/typed/core/v1/secret.go:136 +0x166
github.com/argoproj/argo-cd/v2/util/db.(*secretsRepositoryBackend).UpdateRepository(0xc002c06f28, {0x558e3c8, 0xc001ef2990}, 0xc001ea2000)
/remote-source/argo_cd/app/util/db/repository_secrets.go:146 +0x173
github.com/argoproj/argo-cd/v2/util/db.(*db).UpdateRepository(0xc0019a0570, {0x558e3c8, 0xc001ef2990}, 0xc001ea2000)
/remote-source/argo_cd/app/util/db/repository.go:185 +0x17a
github.com/argoproj/argo-cd/v2/server/repository.(*Server).UpdateRepository(0xc0001dc230, {0x558e3c8, 0xc001ef2990}, 0xc001ef20c0)
/remote-source/argo_cd/app/server/repository/repository.go:469 +0x33c
github.com/argoproj/argo-cd/v2/server/repository.(*Server).Update(0x77a6260?, {0x558e3c8?, 0xc001ef2990?}, 0xc002c8c076?)
/remote-source/argo_cd/app/server/repository/repository.go:447 +0x1d
github.com/argoproj/argo-cd/v2/pkg/apiclient/repository._RepositoryService_Update_Handler.func1({0x558e3c8?, 0xc001ef2990?}, {0x3d1c1a0?, 0xc001ef20c0?})
/remote-source/argo_cd/app/pkg/apiclient/repository/repository.pb.go:1246 +0xcb
Another variant of the error log:
2025-06-05T05:32:34.710926779Z fatal error: concurrent map writes
2025-06-05T05:32:34.714105736Z
2025-06-05T05:32:34.714105736Z goroutine 149882 [running]:
2025-06-05T05:32:34.714127142Z github.com/argoproj/argo-cd/v2/util/db.updateSecretString(...)
2025-06-05T05:32:34.714127142Z /remote-source/argo_cd/app/util/db/secrets.go:76
2025-06-05T05:32:34.714133416Z github.com/argoproj/argo-cd/v2/util/db.repositoryToSecret(0xc002f5f9e0, 0xc003517cc0)
2025-06-05T05:32:34.714138530Z /remote-source/argo_cd/app/util/db/repository_secrets.go:369 +0x130
2025-06-05T05:32:34.714143334Z github.com/argoproj/argo-cd/v2/util/db.(*secretsRepositoryBackend).UpdateRepository(0xc002e4cf28, {0x558e3c8, 0xc00372cb10}, 0xc002f5f9e0)
2025-06-05T05:32:34.714153118Z /remote-source/argo_cd/app/util/db/repository_secrets.go:144 +0xe8
2025-06-05T05:32:34.714158498Z github.com/argoproj/argo-cd/v2/util/db.(*db).UpdateRepository(0xc0015f3950, {0x558e3c8, 0xc00372cb10}, 0xc002f5f9e0)
2025-06-05T05:32:34.714168371Z /remote-source/argo_cd/app/util/db/repository.go:185 +0x17a
2025-06-05T05:32:34.714191867Z github.com/argoproj/argo-cd/v2/server/repository.(*Server).UpdateRepository(0xc00045aa80, {0x558e3c8, 0xc00372cb10}, 0xc007417920)
2025-06-05T05:32:34.714197129Z /remote-source/argo_cd/app/server/repository/repository.go:469 +0x33c
2025-06-05T05:32:34.714206997Z github.com/argoproj/argo-cd/v2/server/repository.(*Server).Update(0x77a6260?, {0x558e3c8?, 0xc00372cb10?}, 0xc00b199066?)
2025-06-05T05:32:34.714216690Z /remote-source/argo_cd/app/server/repository/repository.go:447 +0x1d
2025-06-05T05:32:34.714226312Z github.com/argoproj/argo-cd/v2/pkg/apiclient/repository._RepositoryService_Update_Handler.func1({0x558e3c8?, 0xc00372cb10?}, {0x3d1c1a0?, 0xc007417920?})
2025-06-05T05:32:34.714242081Z /remote-source/argo_cd/app/pkg/apiclient/repository/repository.pb.go:1246 +0xcb
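In both cases, the crash originates in the UpdateRepository function in util/db/repository_secrets.go, where the repository Secret is populated and then sent to the Kubernetes API: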
https://github.com/argoproj/argo-cd/blob/master/util/db/repository_secrets.go#L145-L147
145 s.repositoryToSecret(repository, repositorySecret)
146
147 _, err = s.db.kubeclientset.CoreV1().Secrets(s.db.ns).Update(ctx, repositorySecret, metav1.UpdateOptions{})
Additional Info
Problem Reproduction
Reproducibility
- <Always/Intermittent/Only Once>
Prerequisites/Environment
- <OpenShift, managed service (e.g., ROSA, ARO), operators, layered product, and other software versions, build details>
Steps to Reproduce
- ...
Expected Results
- ...
Actual Results
- ...
Problem Analysis
- <Completed by engineering team as part of the triage/refinement process>
Root Cause
- <What is the root cause of the problem? Or, why is it not a bug?>
Workaround (If Possible)
- <Are there any workarounds we can provide to the customers?>
Fix Approaches
- <If we decide to fix this bug, how will we do it?>
Acceptance Criteria
- ...
Definition of Done
- Code Complete:
  - All code has been written, reviewed, and approved.
- Tested:
  - Unit tests have been written and passed.
  - Ensure code coverage is not reduced with the changes.
  - Integration tests have been automated.
  - System tests have been conducted, and all critical bugs have been fixed.
  - Tested and merged on OpenShift either upstream or downstream on a local build.
- Documentation:
  - User documentation or release notes have been written (if applicable).
- Build:
  - Code has been successfully built and integrated into the main repository / project.
  - Midstream changes (if applicable) are done, reviewed, approved and merged.
- Review:
  - Code has been peer-reviewed and meets coding standards.
  - All acceptance criteria defined in the user story have been met.
  - Tested by reviewer on OpenShift.
- Deployment:
  - The feature has been deployed on OpenShift cluster for testing.