  1. Red Hat Advanced Cluster Management
  2. ACM-27019

Fatal concurrent map read/write exception in klusterlet agent container

    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • ACM 2.13.4
    • ACM Architecture
    • None
    • Incidents & Support
    • False
    • None
    • False
    • contract-priority
    • Moderate
    • None

       

      Description of problem:

      When installing a single-node OpenShift (SNO) managed cluster, the klusterlet agent container restarts after throwing the following fatal error:

      I1124 08:05:56.342822       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"open-cluster-management-agent", Name:"klusterlet-agent", UID:"777f027c-7779-42f8-80c0-abbc777ce35", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deployment Updated' Updated open-cluster-management-agent-addon/config-policy-controller
      fatal error: concurrent map read and map write

      goroutine 922 [running]:
      github.com/openshift/library-go/pkg/operator/resource/resourceapply.(*resourceCache).SafeToSkipApply(0xc00060a6f8, {0x3503ca8?, 0xc001baba40?}, {0x3503ca8, 0xc001babcc0})
              github.com/openshift/library-go@v0.0.0-20241107160307-0064ad7bd060/pkg/operator/resource/resourceapply/resource_cache.go:148 +0x13a
      github.com/openshift/library-go/pkg/operator/resource/resourceapply.ApplySecretImproved({0x352da48, 0xc0007144b0}, {0x7f5ac46bd310, 0xc000d8c9c0}, {0x353fb20, 0xc0008730a0}, 0xc001baba40, {0x3504b88, 0xc00060a6f8})
              github.com/openshift/library-go@v0.0.0-20241107160307-0064ad7bd060/pkg/operator/resource/resourceapply/core.go:368 +0x123
      github.com/openshift/library-go/pkg/operator/resource/resourceapply.ApplyDirectly({0x352da48, 0xc0007144b0}, 0xc001bb4cc0, {0x353fb20, 0xc0008730a0}, {0x3504b88, 0xc00060a6f8}, 0xc001bb4d40, {0xc001bb4d30, 0x1, ...})
              github.com/openshift/library-go@v0.0.0-20241107160307-0064ad7bd060/pkg/operator/resource/resourceapply/generic.go:143 +0xe7e
      open-cluster-management.io/ocm/pkg/work/spoke/apply.(*UpdateApply).Apply(0xc001604800, {0x352da48, 0xc0007144b0}, {{0x0, 0x0}, {0xc001221e4a, 0x2}, {0xc0019af180, 0x7}}, 0xc00060af68, ...)
              open-cluster-management.io/ocm/pkg/work/spoke/apply/update_apply.go:58 +0x1dc
      open-cluster-management.io/ocm/pkg/work/spoke/controllers/manifestcontroller.(*manifestworkReconciler).applyOneManifest(_, {_, _}, _, {{{_, _, _}, {_, _}}}, {{{0xc001c70248, ...}}, ...}, ...)
              open-cluster-management.io/ocm/pkg/work/spoke/controllers/manifestcontroller/manifestwork_reconciler.go:206 +0x913 

      The issue appears to be the same as the one described in this PR on GitHub.
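
For context, the Go runtime aborts the whole process with "concurrent map read and map write" when a plain map is read by one goroutine while another writes to it without synchronization. The trace above points at a cache lookup in SafeToSkipApply. Below is a minimal sketch of one common way to guard such a cache, using sync.RWMutex. The type safeCache and its methods are hypothetical names chosen for illustration; this is not library-go's resourceCache and not necessarily how the referenced PR addresses the problem.

```go
package main

import (
	"fmt"
	"sync"
)

// safeCache is a hypothetical stand-in for a cache that remembers the
// last applied content hash per object key. It is NOT library-go's type.
type safeCache struct {
	mu   sync.RWMutex
	seen map[string]string // object key -> last applied content hash
}

func newSafeCache() *safeCache {
	return &safeCache{seen: make(map[string]string)}
}

// Update records the hash most recently applied for key.
// The write lock excludes all concurrent readers and writers.
func (c *safeCache) Update(key, hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seen[key] = hash
}

// SafeToSkip reports whether key was last applied with the same hash.
// The read lock allows concurrent readers but blocks while a write is in progress.
func (c *safeCache) SafeToSkip(key, hash string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	prev, ok := c.seen[key]
	return ok && prev == hash
}

func main() {
	cache := newSafeCache()
	var wg sync.WaitGroup

	// Several goroutines read and write the same keys at once, which is
	// the access pattern that crashes an unguarded map.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("secret-%d", i%2)
			for j := 0; j < 1000; j++ {
				cache.Update(key, "h1")
				cache.SafeToSkip(key, "h1")
			}
		}(i)
	}
	wg.Wait()

	fmt.Println(cache.SafeToSkip("secret-0", "h1"))
}
```

Removing the lock calls from this sketch and running it with `go run -race` is one way to observe the data race that the runtime detects in the trace above.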

       

      Version-Release number of selected component (if applicable):

      Managed cluster: 4.20.4

      ACM: 2.13.3

      MCE: 2.8.3

      How reproducible:

      Deploy a spoke cluster

      Steps to Reproduce:

      1.  
      2.  
      3. ...

      Actual results:

      klusterlet agent crashes

      Expected results:

      klusterlet agent doesn't crash

      Additional info:

      Attached to the case:

      • acm mg
      • klusterlet logs
      • open-cluster-management-agent namespace inspect

      Assignee: Unassigned
      Reporter: Luca Davidde (rhn-support-ldavidde)