Bug
Resolution: Done-Errata
Undefined
4.15
None
Description of problem:

Found in QE's CI (vsphere-agent profile): the storage CO is not available and the vsphere-problem-detector-operator pod is in CrashLoopBackOff with a panic. (Find the must-gather here: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-agent-disconnected-ha-f14/1734850632575094784/artifacts/vsphere-agent-disconnected-ha-f14/gather-must-gather/)

The storage CO reports "unable to find VM by UUID":

- lastTransitionTime: "2023-12-13T09:15:27Z"
  message: "VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable: unable to find VM ci-op-782gwsbd-b3d4e-master-2 by UUID \nVSphereProblemDetectorDeploymentControllerAvailable: Waiting for Deployment"
  reason: VSphereCSIDriverOperatorCR_VMwareVSphereController_vcenter_api_error::VSphereProblemDetectorDeploymentController_Deploying
  status: "False"
  type: Available

(However, the "unable to find VM by UUID" message does not appear in the vsphere-problem-detector-operator log in the must-gather.)

The vsphere-problem-detector-operator log:

2023-12-13T10:10:56.620216117Z I1213 10:10:56.620159 1 vsphere_check.go:149] Connected to vcenter.devqe.ibmc.devcluster.openshift.com as ci_user_01@devqe.ibmc.devcluster.openshift.com
2023-12-13T10:10:56.625161719Z I1213 10:10:56.625108 1 vsphere_check.go:271] CountVolumeTypes passed
2023-12-13T10:10:56.625291631Z I1213 10:10:56.625258 1 zones.go:124] Checking tags for multi-zone support.
2023-12-13T10:10:56.625449771Z I1213 10:10:56.625433 1 zones.go:202] No FailureDomains configured. Skipping check.
2023-12-13T10:10:56.625497726Z I1213 10:10:56.625487 1 vsphere_check.go:271] CheckZoneTags passed
2023-12-13T10:10:56.625531795Z I1213 10:10:56.625522 1 info.go:44] vCenter version is 8.0.2, apiVersion is 8.0.2.0 and build is 22617221
2023-12-13T10:10:56.625562833Z I1213 10:10:56.625555 1 vsphere_check.go:271] ClusterInfo passed
2023-12-13T10:10:56.625603236Z I1213 10:10:56.625594 1 datastore.go:312] checking datastore /DEVQEdatacenter/datastore/vsanDatastore for permissions
2023-12-13T10:10:56.669205822Z panic: runtime error: invalid memory address or nil pointer dereference
2023-12-13T10:10:56.669338411Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x23096cb]
2023-12-13T10:10:56.669565413Z
2023-12-13T10:10:56.669591144Z goroutine 550 [running]:
2023-12-13T10:10:56.669838383Z github.com/openshift/vsphere-problem-detector/pkg/operator.getVM(0xc0005da6c0, 0xc0002d3b80)
2023-12-13T10:10:56.669991749Z github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:319 +0x3eb
2023-12-13T10:10:56.670212441Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*vSphereChecker).enqueueSingleNodeChecks.func1()
2023-12-13T10:10:56.670289644Z github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:238 +0x55
2023-12-13T10:10:56.670490453Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker.func1(0xc000c88760?, 0x0?)
2023-12-13T10:10:56.670702592Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:40 +0x55
2023-12-13T10:10:56.671142070Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker(0xc000c78660, 0xc000c887a0?)
2023-12-13T10:10:56.671331852Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:41 +0xe7
2023-12-13T10:10:56.671529761Z github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool.func1()
2023-12-13T10:10:56.671589925Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:28 +0x25
2023-12-13T10:10:56.671776328Z created by github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool
2023-12-13T10:10:56.671847478Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:27 +0x73
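The panic fires in getVM during the per-node checks, right after the datastore permission check, which is consistent with a vCenter lookup returning no VM object. As an illustration only (a minimal sketch using govmomi's SearchIndex API, not the operator's actual getVM), the code below shows how FindByUuid can return a nil reference together with a nil error when the VM cannot be found, for example when the session lacks read privileges on it, so dereferencing the result without a nil check produces exactly this kind of SIGSEGV:

// Hypothetical sketch of the suspected failure mode; not copied from the
// vsphere-problem-detector repository. Assumes a connected vim25.Client and
// the node's VM UUID.
package main

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/mo"
)

func getVMByUUID(ctx context.Context, c *vim25.Client, uuid string) (*mo.VirtualMachine, error) {
	si := object.NewSearchIndex(c)

	// FindByUuid returns (nil, nil) when no VM matches the UUID -- for
	// example when the account cannot see the VM. Checking err alone is
	// therefore not enough.
	ref, err := si.FindByUuid(ctx, nil, uuid, true, nil)
	if err != nil {
		return nil, fmt.Errorf("FindByUuid failed: %w", err)
	}
	if ref == nil {
		// Without this guard, the lines below dereference a nil reference,
		// matching the SIGSEGV reported at vsphere_check.go:319.
		return nil, fmt.Errorf("unable to find VM by UUID %s", uuid)
	}

	vm := object.NewVirtualMachine(c, ref.Reference())
	var props mo.VirtualMachine
	if err := vm.Properties(ctx, vm.Reference(), nil, &props); err != nil {
		return nil, fmt.Errorf("failed to read VM properties: %w", err)
	}
	return &props, nil
}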
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-11-033133
How reproducible:
Steps to Reproduce:
1. See description
Actual results:
The vsphere-problem-detector-operator pod panics and goes into CrashLoopBackOff.
Expected results:
The vsphere-problem-detector-operator should report the error instead of panicking.
Additional info:
I suspect this is a privileges issue, but the pod should not panic regardless.
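Besides the nil check in getVM, the stack trace shows the panic escaping the CheckThreadPool worker and killing the whole pod. As a hedged sketch of a general defense-in-depth pattern (the names below are hypothetical and this is not the repository's actual fix), a recover() wrapper around each check would turn a single bad vCenter response into a failed check rather than a CrashLoopBackOff:

// Illustrative only: a panic-recovery wrapper for per-node checks, similar in
// spirit to the worker in pkg/operator/pool.go but not taken from it.
package main

import (
	"fmt"

	"k8s.io/klog/v2"
)

// runCheckSafely runs one check and converts a panic into an error, so the
// operator can mark the check failed and keep running.
func runCheckSafely(name string, check func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			klog.Errorf("check %s panicked: %v", name, r)
			err = fmt.Errorf("check %s panicked: %v", name, r)
		}
	}()
	return check()
}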
- blocks: OCPBUGS-25648 vsphere-problem-detector-operator pod CrashLoopBackOff with panic (Closed)
- is cloned by: OCPBUGS-25648 vsphere-problem-detector-operator pod CrashLoopBackOff with panic (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update