OpenShift GitOps / GITOPS-7086

Server crashed with fatal error: concurrent map iteration and map write


Sprints: GitOps Tangerine Sprint 17, GitOps Tangerine Sprint 18, GitOps Tangerine Sprint 19

      Description of Problem

      When many applications are deployed at once (for example, with the app-of-apps pattern), the Argo CD server pod crashed with:

      fatal error: concurrent map iteration and map write
      goroutine 87034 [running]:
      reflect.mapiternext(0x50aee9?)
              /usr/lib/golang/src/runtime/map.go:1392 +0x13
      reflect.(*MapIter).Next(0xc002c06460?)
              /usr/lib/golang/src/reflect/value.go:2005 +0x74
      encoding/json.mapEncoder.encode({0x15?}, 0xc0020108c0, {0x37e3760?, 0xc0001618d0?, 0xc0001618d0?}, {0x7?, 0x0?})
      
      << JSON encoding and Go frames omitted >>
              /remote-source/argo_cd/deps/gomod/pkg/mod/k8s.io/client-go@v0.29.6/kubernetes/typed/core/v1/secret.go:136 +0x166
      github.com/argoproj/argo-cd/v2/util/db.(*secretsRepositoryBackend).UpdateRepository(0xc002c06f28, {0x558e3c8, 0xc001ef2990}, 0xc001ea2000)
              /remote-source/argo_cd/app/util/db/repository_secrets.go:146 +0x173
      github.com/argoproj/argo-cd/v2/util/db.(*db).UpdateRepository(0xc0019a0570, {0x558e3c8, 0xc001ef2990}, 0xc001ea2000)
              /remote-source/argo_cd/app/util/db/repository.go:185 +0x17a
      github.com/argoproj/argo-cd/v2/server/repository.(*Server).UpdateRepository(0xc0001dc230, {0x558e3c8, 0xc001ef2990}, 0xc001ef20c0)
              /remote-source/argo_cd/app/server/repository/repository.go:469 +0x33c
      github.com/argoproj/argo-cd/v2/server/repository.(*Server).Update(0x77a6260?, {0x558e3c8?, 0xc001ef2990?}, 0xc002c8c076?)
              /remote-source/argo_cd/app/server/repository/repository.go:447 +0x1d
      github.com/argoproj/argo-cd/v2/pkg/apiclient/repository._RepositoryService_Update_Handler.func1({0x558e3c8?, 0xc001ef2990?}, {0x3d1c1a0?, 0xc001ef20c0?})
              /remote-source/argo_cd/app/pkg/apiclient/repository/repository.pb.go:1246 +0xcb
      

      Another variant of the error log:

      2025-06-05T05:32:34.710926779Z fatal error: concurrent map writes
      2025-06-05T05:32:34.714105736Z
      2025-06-05T05:32:34.714105736Z goroutine 149882 [running]:
      2025-06-05T05:32:34.714127142Z github.com/argoproj/argo-cd/v2/util/db.updateSecretString(...)
      2025-06-05T05:32:34.714127142Z  /remote-source/argo_cd/app/util/db/secrets.go:76
      2025-06-05T05:32:34.714133416Z github.com/argoproj/argo-cd/v2/util/db.repositoryToSecret(0xc002f5f9e0, 0xc003517cc0)
      2025-06-05T05:32:34.714138530Z  /remote-source/argo_cd/app/util/db/repository_secrets.go:369 +0x130
      2025-06-05T05:32:34.714143334Z github.com/argoproj/argo-cd/v2/util/db.(*secretsRepositoryBackend).UpdateRepository(0xc002e4cf28, {0x558e3c8, 0xc00372cb10}, 0xc002f5f9e0)
      2025-06-05T05:32:34.714153118Z  /remote-source/argo_cd/app/util/db/repository_secrets.go:144 +0xe8
      2025-06-05T05:32:34.714158498Z github.com/argoproj/argo-cd/v2/util/db.(*db).UpdateRepository(0xc0015f3950, {0x558e3c8, 0xc00372cb10}, 0xc002f5f9e0)
      2025-06-05T05:32:34.714168371Z  /remote-source/argo_cd/app/util/db/repository.go:185 +0x17a
      2025-06-05T05:32:34.714191867Z github.com/argoproj/argo-cd/v2/server/repository.(*Server).UpdateRepository(0xc00045aa80, {0x558e3c8, 0xc00372cb10}, 0xc007417920)
      2025-06-05T05:32:34.714197129Z  /remote-source/argo_cd/app/server/repository/repository.go:469 +0x33c
      2025-06-05T05:32:34.714206997Z github.com/argoproj/argo-cd/v2/server/repository.(*Server).Update(0x77a6260?, {0x558e3c8?, 0xc00372cb10?}, 0xc00b199066?)
      2025-06-05T05:32:34.714216690Z  /remote-source/argo_cd/app/server/repository/repository.go:447 +0x1d
      2025-06-05T05:32:34.714226312Z github.com/argoproj/argo-cd/v2/pkg/apiclient/repository._RepositoryService_Update_Handler.func1({0x558e3c8?, 0xc00372cb10?}, {0x3d1c1a0?, 0xc007417920?})
      2025-06-05T05:32:34.714242081Z  /remote-source/argo_cd/app/pkg/apiclient/repository/repository.pb.go:1246 +0xcb
      

      In both cases, the crash originates in the UpdateRepository function in repository_secrets.go:
      https://github.com/argoproj/argo-cd/blob/master/util/db/repository_secrets.go#L145-L147

      145	s.repositoryToSecret(repository, repositorySecret)
      146
      147	_, err = s.db.kubeclientset.CoreV1().Secrets(s.db.ns).Update(ctx, repositorySecret, metav1.UpdateOptions{})
      
      

      Additional Info

      Problem Reproduction

      Reproducibility

      • <Always/Intermittent/Only Once>

      Prerequisites/Environment

      • <OpenShift, managed service (e.g., ROSA, ARO), operators, layered product, and other software versions, build details>

      Steps to Reproduce

      • ...

      Expected Results

      • ...

      Actual Results

      • ...

      Problem Analysis

      • <Completed by engineering team as part of the triage/refinement process>

      Root Cause

      • <What is the root cause of the problem? Or, why is it not a bug?>

      Workaround (If Possible)

      • <Are there any workarounds we can provide to the customers?>

      Fix Approaches

      • <If we decide to fix this bug, how will we do it?>

      Acceptance Criteria

      • ...

      Definition of Done

      • Code Complete:
        • All code has been written, reviewed, and approved.
      • Tested:
        • Unit tests have been written and passed.
        • Code coverage has not been reduced by the changes.
        • Integration tests have been automated.
        • System tests have been conducted, and all critical bugs have been fixed.
        • Tested and merged on OpenShift either upstream or downstream on a local build.
      • Documentation:
        • User documentation or release notes have been written (if applicable).
      • Build:
        • Code has been successfully built and integrated into the main repository / project.
        • Midstream changes (if applicable) are done, reviewed, approved and merged.
      • Review:
        • Code has been peer-reviewed and meets coding standards.
        • All acceptance criteria defined in the user story have been met.
        • Tested by reviewer on OpenShift.
      • Deployment:
        • The feature has been deployed on an OpenShift cluster for testing.

              cfang@redhat.com Cheng Fang