OpenShift Bugs / OCPBUGS-44370

Pods stuck with "FailedPrecondition" and "does not appear staged to" error


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.14.z
    • Component: Storage
    • Severity: Moderate

      This is similar to OCPBUGS-13551 / OCPBUGS-33798, but those issues should be resolved with OCP 4.14.37, so opening another bug.

      Description of problem:

      Customer performed a disaster recovery test. As part of the test, one node pool was forcefully shut down in vCenter.
      
      After the shutdown nodes were restarted, some StatefulSet pods tried to start again but got stuck with a "MountVolume.SetUp failed" error. 3 out of the 5 StatefulSet replicas were killed by the node shutdown. 1 of those 3 pods started again after the node restart without any problems, but the other 2 remained stuck in ContainerCreating state with the "MountVolume.SetUp failed" error.
      
      Event of one of the pods stuck in ContainerCreating state:
      
      MountVolume.SetUp failed for volume "pvc-dea66279-a255-4879-9a84-2e39f05593ea" : rpc error: code = FailedPrecondition desc = volume ID: "[VSAN-EXAMPLE-OPENSHIFT] 9f581864-2871-a9ee-a6ba-9e66fb700256/_002c/962e41a3d16040a39d198677ef61f0f1.vmdk" does not appear staged to "/var/lib/kubelet/plugins/kubernetes.io/csi/csi.vsphere.vmware.com/80848e86228674b6011b910e5bf771a188cb7d242cef8cae93e3efddd155bc17/globalmount"
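      
      For triage, one way to confirm the staging mismatch is to compare the VolumeAttachment state with what is actually mounted on the node. A rough sketch (pod, namespace, and node names below are placeholders, not from the case):
      
          # Check whether the volume is still reported as attached to the node
          oc get volumeattachment | grep pvc-dea66279-a255-4879-9a84-2e39f05593ea
          
          # Review the mount events on a stuck pod
          oc describe pod <stuck-pod> -n <namespace>
          
          # On the affected node, check whether the global staging mount actually exists
          oc debug node/<node-name> -- chroot /host ls /var/lib/kubelet/plugins/kubernetes.io/csi/csi.vsphere.vmware.com/80848e86228674b6011b910e5bf771a188cb7d242cef8cae93e3efddd155bc17/globalmount
          oc debug node/<node-name> -- chroot /host sh -c 'mount | grep csi.vsphere'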
      
      We were able to fix the mount error with a pod restart. We waited at least 20 minutes to see whether the error would resolve itself; since nothing happened, we deleted the stuck pods.
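      
      For reference, the workaround amounts to deleting the stuck pods (placeholder names below); the StatefulSet controller re-creates them, which re-triggers volume staging and publishing:
      
          # Delete a stuck pod so the StatefulSet controller re-creates it
          oc delete pod <stuck-pod> -n <namespace>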

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.14.37

      How reproducible:

      On the customer side, only in one disaster recovery test

      Steps to Reproduce:

          1. Run OCP 4.14 on vSphere and forcefully stop multiple nodes where Pods of a StatefulSet are running (see the sketch after this list)
          2. Restart the nodes
          3. Observe Pod status
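      
      A sketch of the forced shutdown using govc, assuming CLI access to the vCenter (VM names are placeholders):
      
          # Forcefully power off the worker VMs backing the StatefulSet pods
          govc vm.power -off -force <worker-vm-1> <worker-vm-2>
          
          # Power the VMs back on after the shutdown
          govc vm.power -on <worker-vm-1> <worker-vm-2>
          
          # Watch pod status; some pods may remain in ContainerCreating
          oc get pods -n <namespace> -w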
          

      Actual results:

      Not all Pods start; some fail with a "MountVolume.SetUp failed" error

      Expected results:

      All Pods start as expected

      Additional info:

      - must-gather and csi-driver logs are available in the Support Case

              Assignee: Jonathan Dobson (jdobson@redhat.com)
              Reporter: Simon Krenger (rhn-support-skrenger)
              QA Contact: Wei Duan