OCPBUGS-25372

vsphere-problem-detector-operator pod CrashLoopBackOff with panic


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Target Version: 4.16.0
    • Affects Version: 4.15
    • Component: Storage / Operators
    • Labels: None
    • Severity: Critical
    • Release Blocker: Proposed

    Description

      Description of problem:

      Found in QE's CI (vsphere-agent profile): the storage clusteroperator is not Available, and the vsphere-problem-detector-operator pod is in CrashLoopBackOff with a panic.
      (Must-gather here: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-agent-disconnected-ha-f14/1734850632575094784/artifacts/vsphere-agent-disconnected-ha-f14/gather-must-gather/)
      
      
      The storage CO reports "unable to find VM by UUID":
        - lastTransitionTime: "2023-12-13T09:15:27Z"
          message: "VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable:
            unable to find VM ci-op-782gwsbd-b3d4e-master-2 by UUID \nVSphereProblemDetectorDeploymentControllerAvailable:
            Waiting for Deployment"
          reason: VSphereCSIDriverOperatorCR_VMwareVSphereController_vcenter_api_error::VSphereProblemDetectorDeploymentController_Deploying
          status: "False"
          type: Available
      (However, the "unable to find VM by UUID" message does not appear in the vsphere-problem-detector-operator log in the must-gather.)
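
      For context, the operator talks to vCenter through govmomi (the standard Go vSphere client), and govmomi's SearchIndex.FindByUuid reports a lookup miss as a nil reference with a nil error rather than as an error. Note also that vCenter hides inventory objects the session user lacks Read permission on, so a privileges gap surfaces as "not found" too. A minimal standalone sketch of that behavior; the vCenter URL, credentials, and UUID below are placeholders, not values from this cluster:

      package main

      import (
          "context"
          "fmt"
          "net/url"

          "github.com/vmware/govmomi"
          "github.com/vmware/govmomi/find"
          "github.com/vmware/govmomi/object"
      )

      func main() {
          ctx := context.Background()

          // Placeholder vCenter endpoint and credentials.
          u, err := url.Parse("https://user:pass@vcenter.example.com/sdk")
          if err != nil {
              panic(err)
          }
          c, err := govmomi.NewClient(ctx, u, true)
          if err != nil {
              panic(err)
          }
          defer c.Logout(ctx)

          dc, err := find.NewFinder(c.Client).Datacenter(ctx, "/DEVQEdatacenter")
          if err != nil {
              panic(err)
          }

          // A miss returns (nil, nil): "not found" is NOT surfaced as an error.
          si := object.NewSearchIndex(c.Client)
          ref, err := si.FindByUuid(ctx, dc, "00000000-0000-0000-0000-000000000000", true, nil)
          if err != nil {
              panic(err)
          }
          if ref == nil {
              fmt.Println("no VM with that UUID is visible to this user")
              return
          }
          fmt.Println("found:", ref.Reference())
      }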
      
      
      The vsphere-problem-detector-operator log:
      2023-12-13T10:10:56.620216117Z I1213 10:10:56.620159       1 vsphere_check.go:149] Connected to vcenter.devqe.ibmc.devcluster.openshift.com as ci_user_01@devqe.ibmc.devcluster.openshift.com
      2023-12-13T10:10:56.625161719Z I1213 10:10:56.625108       1 vsphere_check.go:271] CountVolumeTypes passed
      2023-12-13T10:10:56.625291631Z I1213 10:10:56.625258       1 zones.go:124] Checking tags for multi-zone support.
      2023-12-13T10:10:56.625449771Z I1213 10:10:56.625433       1 zones.go:202] No FailureDomains configured.  Skipping check.
      2023-12-13T10:10:56.625497726Z I1213 10:10:56.625487       1 vsphere_check.go:271] CheckZoneTags passed
      2023-12-13T10:10:56.625531795Z I1213 10:10:56.625522       1 info.go:44] vCenter version is 8.0.2, apiVersion is 8.0.2.0 and build is 22617221
      2023-12-13T10:10:56.625562833Z I1213 10:10:56.625555       1 vsphere_check.go:271] ClusterInfo passed
      2023-12-13T10:10:56.625603236Z I1213 10:10:56.625594       1 datastore.go:312] checking datastore /DEVQEdatacenter/datastore/vsanDatastore for permissions
      2023-12-13T10:10:56.669205822Z panic: runtime error: invalid memory address or nil pointer dereference
      2023-12-13T10:10:56.669338411Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x23096cb]
      2023-12-13T10:10:56.669565413Z 
      2023-12-13T10:10:56.669591144Z goroutine 550 [running]:
      2023-12-13T10:10:56.669838383Z github.com/openshift/vsphere-problem-detector/pkg/operator.getVM(0xc0005da6c0, 0xc0002d3b80)
      2023-12-13T10:10:56.669991749Z     github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:319 +0x3eb
      2023-12-13T10:10:56.670212441Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*vSphereChecker).enqueueSingleNodeChecks.func1()
      2023-12-13T10:10:56.670289644Z     github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:238 +0x55
      2023-12-13T10:10:56.670490453Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker.func1(0xc000c88760?, 0x0?)
      2023-12-13T10:10:56.670702592Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:40 +0x55
      2023-12-13T10:10:56.671142070Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker(0xc000c78660, 0xc000c887a0?)
      2023-12-13T10:10:56.671331852Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:41 +0xe7
      2023-12-13T10:10:56.671529761Z github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool.func1()
      2023-12-13T10:10:56.671589925Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:28 +0x25
      2023-12-13T10:10:56.671776328Z created by github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool
      2023-12-13T10:10:56.671847478Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:27 +0x73
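
      The trace ends in getVM at vsphere_check.go:319 with a nil pointer dereference, which is consistent with a lookup miss being used without a nil check. A hedged sketch of the missing guard follows; apart from the govmomi calls, the identifiers are illustrative and not the operator's actual code:

      package operator

      import (
          "context"
          "fmt"

          "github.com/vmware/govmomi/object"
          "github.com/vmware/govmomi/vim25"
          v1 "k8s.io/api/core/v1"
      )

      // getVM resolves a node's VM by its SystemUUID. Sketch only.
      func getVM(ctx context.Context, c *vim25.Client, dc *object.Datacenter, node *v1.Node) (*object.VirtualMachine, error) {
          si := object.NewSearchIndex(c)
          ref, err := si.FindByUuid(ctx, dc, node.Status.NodeInfo.SystemUUID, true, nil)
          if err != nil {
              return nil, fmt.Errorf("failed to find VM %s by UUID: %w", node.Name, err)
          }
          // This guard is what the panic suggests is missing: a miss yields
          // ref == nil with err == nil, and dereferencing it crashes.
          if ref == nil {
              return nil, fmt.Errorf("unable to find VM %s by UUID", node.Name)
          }
          return object.NewVirtualMachine(c, ref.Reference()), nil
      }

      Returning an error here would let the controller report the condition through the clusteroperator status instead of crashing the pod.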
      
      
      
      
      

      Version-Release number of selected component (if applicable):

      4.15.0-0.nightly-2023-12-11-033133

      How reproducible:

       

      Steps to Reproduce:

          1. See description

      Actual results:

         The vsphere-problem-detector-operator pod panics and goes into CrashLoopBackOff.

      Expected results:

         The vsphere-problem-detector-operator should handle the failed VM lookup gracefully instead of panicking.

      Additional info:

         This is likely a privileges issue (the vCenter user may not be able to see the VM), but the pod should not panic either way.
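
         Separately from the root-cause fix, the check pool visible in the stack (pool.go) could isolate panics so one failing check degrades that check rather than crash-looping the pod. A minimal sketch of the pattern, assuming a task-channel worker; the names are illustrative, not the actual CheckThreadPool code:

         package operator

         import "k8s.io/klog/v2"

         // runTask runs one check and converts a panic into a logged error so
         // a single bad check cannot take down the whole operator process.
         func runTask(task func()) {
             defer func() {
                 if r := recover(); r != nil {
                     klog.Errorf("check panicked (continuing): %v", r)
                 }
             }()
             task()
         }

         func worker(tasks <-chan func()) {
             for task := range tasks {
                 runTask(task)
             }
         }

         Recovering like this only masks the symptom, though; the nil check on the VM lookup is still the real fix.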

       


People

              Assignee: Hemant Kumar (hekumar@redhat.com)
              Reporter: Wei Duan (wduan@redhat.com)
              QA Contact: Wei Duan