  1. Red Hat Advanced Cluster Management
  2. ACM-26418

multicluster-global-hub-agent crashed on "fatal error: concurrent map read and map write"


    • GH Train-34
    • Important
    • Approved
    • None

      Description of problem:

      During a large-scale test combining ACM, MCGH, and AAP configured with EDA (provisioning and managing 3500+ SNOs, applying the DU profile via ACM Policy and TALM, and finally running a day-2 Ansible playbook initiated via Event-Driven Ansible), the multicluster-global-hub agent crashed twice. The crashes dropped events for 144 SNOs, which prevents Ansible from running the playbook against those clusters.

       

      # oc get po -n multicluster-global-hub multicluster-global-hub-agent-66f9765d79-6llg6
      NAME                                             READY   STATUS    RESTARTS      AGE
      multicluster-global-hub-agent-66f9765d79-6llg6   1/1     Running   2 (10h ago)   14h

      In the logs, the error appears to be:

       

      2025-11-18T04:54:52.605Z	INFO	generic/periodic_syncer.go:98	resynced 3001 objects for event type: managedcluster
      2025-11-18T04:54:52.605Z	INFO	generic/periodic_syncer.go:118	resyncing event type: event.managedcluster
      2025-11-18T04:55:20.112Z	INFO	controller/controller.go:217	Starting workers	{"controller": "event", "controllerGroup": "", "controllerKind": "Event", "worker count": 1}
      2025-11-18T04:55:20.112Z	INFO	controller/controller.go:217	Starting workers	{"controller": "clusterversion", "controllerGroup": "config.openshift.io", "controllerKind": "ClusterVersion", "worker count": 1}
      2025-11-18T04:55:20.112Z	INFO	controller/controller.go:217	Starting workers	{"controller": "policy", "controllerGroup": "policy.open-cluster-management.io", "controllerKind": "Policy", "worker count": 1}
      fatal error: concurrent map read and map write
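
For context on the error above: the Go runtime raises "fatal error: concurrent map read and map write" when it detects one goroutine reading a plain map while another writes to it without synchronization. It is a fatal runtime error, not a panic, so recover() cannot intercept it and the process exits, which is consistent with the pod restarts shown earlier. This report does not identify which map in the agent is affected, so the following is only a minimal sketch of the usual remedy (guarding the map with a sync.RWMutex). The eventCache type and all of its methods are hypothetical names chosen for illustration, not code from multicluster-global-hub.

```go
package main

import (
	"fmt"
	"sync"
)

// eventCache is a hypothetical stand-in for any map shared between
// goroutines (for example, a periodic syncer and a controller worker).
// It is NOT taken from the multicluster-global-hub source.
type eventCache struct {
	mu    sync.RWMutex
	items map[string]int
}

func newEventCache() *eventCache {
	return &eventCache{items: make(map[string]int)}
}

// Set takes the write lock, so no reader can observe the map mid-update.
func (c *eventCache) Set(key string, version int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = version
}

// Get takes the read lock; several readers may hold it at the same time.
func (c *eventCache) Get(key string) (int, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

// Len returns the number of entries, also under the read lock.
func (c *eventCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.items)
}

func main() {
	cache := newEventCache()
	var wg sync.WaitGroup

	// Four writers and four readers hit the same cache concurrently.
	// With a bare map and no lock, this access pattern can abort the
	// process with the fatal error quoted in the log above.
	for w := 0; w < 4; w++ {
		wg.Add(2)
		go func(id int) { // writer
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				cache.Set(fmt.Sprintf("cluster-%d-%d", id, i), i)
			}
		}(w)
		go func(id int) { // reader
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				cache.Get(fmt.Sprintf("cluster-%d-%d", id, i))
			}
		}(w)
	}

	wg.Wait()
	fmt.Println("entries:", cache.Len())
}
```

Because the crash was seen in only one of 7 large runs, running the affected code paths under Go's race detector (building or testing with the -race flag) may surface the unsynchronized access more reliably than waiting for the fatal error to recur.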

       

       

       

      Version-Release number of selected component (if applicable):

      OCP -  4.20.2

      Deployed OCP - 4.20.2

      ACM - 2.15.0-DOWNSTREAM-2025-10-29-01-15-32

      Global Hub - multicluster-global-hub-operator-rh.v1.6.0 ( quay.io/redhat-user-workloads/acm-multicluster-glo-tenant/multicluster-global-hub-operator-catalog-v419-globalhub-1-6@sha256:688203524b81a296df82d8f9b0f73964911fd1810d04c38d7a575de0420a47e2 )

      AAP - aap-operator.v2.6.0-0.1762261209

      How reproducible:

      Rare: this occurred in 1 of 7 large-scale tests.

      Steps to Reproduce:

      1.  
      2.  
      3. ...

      Actual results:

      Expected results:

      Additional info:

              clyang82 Chunlin Yang
              akrzos@redhat.com Alex Krzos
              Yaheng Liu Yaheng Liu