OpenShift Bugs / OCPBUGS-70365

vsphere-problem-detector-operator crash due to concurrent map writes


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.18.z
    • Component/s: Storage

      Description of problem:

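       The vsphere-problem-detector-operator pod crashed and restarted with "fatal error: concurrent map writes". This appears to be a race condition in the SetCbtData function when multiple nodes are scanned for Changed Block Tracking (CBT) status concurrently. The stack trace shows the failing write in (*ClusterInfo).SetCbtData (pkg/util/cluster_info.go:151), called from (*CollectNodeCBT).CheckNode via runSingleNodeSingleCheck, and the log shows CBT checks for cb-w1 and cb-w3 running within a few microseconds of each other.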

      Version-Release number of selected component (if applicable):

          4.18.28

      Relevant Output:

      oc get pods vsphere-problem-detector-operator-5cdfbbd7d8-gmwdw
      NAME                                                 READY   STATUS    RESTARTS   AGE
      vsphere-problem-detector-operator-5cdfbbd7d8-gmwdw   1/1     Running   1          28d

      oc logs vsphere-problem-detector-operator-5cdfbbd7d8-gmwdw -p

      2026-01-02T07:54:52.698943047Z I0102 07:54:52.698882       1 node_cbt.go:52] Property not found for node cb-w1.cbdr.iiabank.com.jo
      
      2026-01-02T07:54:52.698943047Z fatal error: concurrent map writes
      2026-01-02T07:54:52.698943047Z I0102 07:54:52.698889       1 vsphere_check.go:321] CollectNodeCBT:cb-w3.cbdr.iiabank.com.jo passed
      2026-01-02T07:54:52.702159386Z
      2026-01-02T07:54:52.702159386Z goroutine 1610214 [running]:
      2026-01-02T07:54:52.702206787Z github.com/openshift/vsphere-problem-detector/pkg/util.(*ClusterInfo).SetCbtData(0xc000f808e0?, {0x2e99d0e?, 0x1?})
      2026-01-02T07:54:52.702206787Z  github.com/openshift/vsphere-problem-detector/pkg/util/cluster_info.go:151 +0xa5
      2026-01-02T07:54:52.702206787Z github.com/openshift/vsphere-problem-detector/pkg/check.(*CollectNodeCBT).CheckNode(0xc00127f040?, 0xc0014ae780, 0xc000d5f508, 0xc000c42008)
      2026-01-02T07:54:52.702206787Z  github.com/openshift/vsphere-problem-detector/pkg/check/node_cbt.go:53 +0x3d8
      2026-01-02T07:54:52.702206787Z github.com/openshift/vsphere-problem-detector/pkg/operator.runSingleNodeSingleCheck(0xc0014ae780, 0xc0016f1840, 0xc000d5f508, 0xc000c42008, {0x336dc50, 0x4daf080})

      Actual results:

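          The Go runtime detects the concurrent map writes and terminates the process with a fatal error (which, unlike a panic, cannot be recovered), so the container restarts, as shown by the RESTARTS count of 1 above.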

      Expected results:

      The pod should not crash; concurrent per-node CBT checks should update the shared map safely.
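      An alternative that avoids a shared lock, sketched under the assumption that per-node CBT results only need to be merged after all nodes have been checked: each worker goroutine sends its result over a channel and a single goroutine owns the map. nodeResult, collectCBT, and the check callback are hypothetical names, not part of vsphere-problem-detector.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeResult is a hypothetical per-node CBT check outcome.
type nodeResult struct {
	node    string
	enabled bool
}

// collectCBT runs one check per node concurrently, but only the calling
// goroutine writes the result map: workers send results over a channel and
// never touch the map themselves.
func collectCBT(nodes []string, check func(string) bool) map[string]bool {
	results := make(chan nodeResult)
	var wg sync.WaitGroup
	for _, n := range nodes {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			results <- nodeResult{node: node, enabled: check(node)}
		}(n)
	}
	// Close the channel once every worker has sent its result,
	// which ends the range loop below.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[string]bool, len(nodes))
	for r := range results {
		out[r.node] = r.enabled // single writer, so no lock is needed
	}
	return out
}

func main() {
	nodes := []string{"worker-1", "worker-2", "worker-3"} // hypothetical node names
	got := collectCBT(nodes, func(node string) bool { return node != "worker-1" })
	fmt.Println(len(got), got["worker-1"], got["worker-2"]) // prints 3 false true
}
```

      The trade-off versus a mutex is that results become visible only after collection finishes, which may not fit if other code reads ClusterInfo while the node checks are still running.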

      Additional info:

          A must-gather will be uploaded and the details shared soon.

      People: Richard Hrmo (rh-ee-rhrmo), Divyam Pateriya (rhn-support-dpateriy), Shang Gao
      Votes: 0
      Watchers: 4