OpenShift Request For Enhancement: RFE-1338

Enhance how pods with RWO PVs react when the client node crashes

      1. Proposed title of this feature request

      Enhance how pods with RWO PVs react when the client node crashes

      2. What is the nature and description of the request?

      In the scenario of running a pod that uses an RWO volume, if the host where the pod is running crashes (power outage, NIC down, ...) and the kubelet service on the broken node is not reachable, the pod is left in the Terminating state and new pods cannot start because the VolumeAttachment persists. We know this is the expected behaviour in Kubernetes to avoid data corruption, but it means a manual procedure has to be executed to release the VolumeAttachment.
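
      For illustration, this is roughly what the stuck state looks like from the CLI. It is only a sketch: the node name, VolumeAttachment name and CSI driver name below are hypothetical.
      $ oc get nodes
      NAME       STATUS     ROLES    AGE    VERSION
      worker-1   NotReady   worker   120d   v1.19.0
      $ oc get volumeattachment
      NAME               ATTACHER                             PV                                         NODE       ATTACHED   AGE
      csi-0a1b2c3d4e5f   openshift-storage.rbd.csi.ceph.com   pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d   worker-1   true       25m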

      We would like to raise a Request For Enhancement for an automatic procedure, based on the self-fencing capabilities of Ceph, that removes the current need to release the VolumeAttachment manually (hedged sketches of what this could look like follow the procedure below).

      The current manual procedure is:

      -After 5 minutes, OpenShift detects the node is NotReady and moves the workloads to another node.
      -After 20 minutes, the old pods are in the Terminating state and the new pods are in the ContainerCreating state.
      -As the new pods are unable to start because they cannot attach or mount their volumes, we need to perform the following procedure manually:
      -Initial scenario:
      $ oc get pods
      NAME                                            STATUS              AGE
      rbd-write-workload-generator-6c4d87b4c4-kbrlw   ContainerCreating   20s
      rbd-write-workload-generator-6c4d87b4c4-mjxwb   Terminating         15m
      -Examine the error being reported (the Multi-Attach message shows up in the pod's events):
      $ oc describe pod rbd-write-workload-generator-6c4d87b4c4-kbrlw
      ...
      Multi-Attach error for volume "pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d"
      Volume is already used by pod(s) rbd-write-workload-generator-6c4d87b4c4-mjxwb
      ...
      -Set environment variables (note that pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d is the name of the PersistentVolume, not of the PVC):
      $ OLD_POD_NAME="rbd-write-workload-generator-6c4d87b4c4-mjxwb"
      $ NEW_POD_NAME="rbd-write-workload-generator-6c4d87b4c4-kbrlw"
      $ PV_NAME="pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d"
      -Delete the pod in Terminating state:
      $ oc delete pod ${OLD_POD_NAME} --force --grace-period=0
      -Get the VolumeAttachment linked to the PersistentVolume mounted by the pod:
      $ VOL_ATTACHMENT_NAME=$(oc get volumeattachment -o jsonpath="{.items[?(@.spec.source.persistentVolumeName=='${PV_NAME}')].metadata.name}")
      -Delete the VolumeAttachment object:
      $ oc delete volumeattachment ${VOL_ATTACHMENT_NAME}
      -Delete the pod in ContainerCreating status to force recreation:
      $ oc delete pod ${NEW_POD_NAME}
      -Wait until the new pod is created:
      $ oc get pods
      NAME                                            STATUS              AGE
      rbd-write-workload-generator-6c4d87b4c4-vlrx8   Running             34s
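
      As a stop-gap, the manual steps above could be wrapped in a single script. This is only a sketch that reuses the same commands and the same hypothetical pod and PV names; it does not remove the need for an automated, fencing-based solution:
      #!/bin/bash
      # Sketch: force-release an RWO volume that is stuck on a crashed node.
      # Usage: ./release-rwo-volume.sh <old-pod> <new-pod> <pv-name> [namespace]
      OLD_POD_NAME="$1"; NEW_POD_NAME="$2"; PV_NAME="$3"; NS="${4:-default}"
      # Force-delete the pod stuck in Terminating on the dead node
      oc -n "${NS}" delete pod "${OLD_POD_NAME}" --force --grace-period=0
      # Find and delete the VolumeAttachment that still pins the PV to the dead node
      VOL_ATTACHMENT_NAME=$(oc get volumeattachment -o jsonpath="{.items[?(@.spec.source.persistentVolumeName=='${PV_NAME}')].metadata.name}")
      oc delete volumeattachment "${VOL_ATTACHMENT_NAME}"
      # Delete the replacement pod stuck in ContainerCreating so it is recreated and attaches cleanly
      oc -n "${NS}" delete pod "${NEW_POD_NAME}"

      Regarding the Ceph self-fencing mentioned above, the idea is that the Ceph cluster blocklists the crashed client so that its RBD watch/lock is released and the image can safely be attached elsewhere. A hedged sketch of what that could look like (pool name, image name and client address are hypothetical; on Ceph releases older than Octopus the subcommand is "blacklist" instead of "blocklist"):
      $ rbd status replicapool/csi-vol-0a1b2c3d4e5f       # shows the watcher still held by the dead node
      $ ceph osd blocklist add 10.0.0.23:0/1234567890     # fence the dead client so its watch/lock is dropped
      $ ceph osd blocklist ls                              # confirm the blocklist entry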

      3. Why does the customer need this? (List the business requirements here)

      We are fine with a downtime of around 5 minutes until OpenShift detects that the pod is down and creates a new one. However, having to run a manual procedure to recover from this situation is not desirable; it should be self-managed directly by OCP/K8S.
