OpenShift Request For Enhancement: RFE-1338

Enhance how pods with RWO PVs react when the client node crashes

      1. Proposed title of this feature request

      Enhance how pods with RWO PVs react when the client node crashes

      2. What is the nature and description of the request?

      In the scenario of running a pod that uses an RWO volume, if the host where the pod is running crashes (power outage, NIC down, ...) and the kubelet service on the broken node is not reachable, the pod is left in the Terminating state and new pods cannot start because the VolumeAttachment persists. We know this is the expected behaviour in Kubernetes to avoid data corruption, but it means a manual procedure has to be executed to release the VolumeAttachment.
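
      For illustration, this is roughly what the stuck state looks like from the CLI. It is only a sketch: the node name, VolumeAttachment name and CSI driver name below are hypothetical.
      $ oc get nodes
      NAME       STATUS     ROLES    AGE    VERSION
      worker-1   NotReady   worker   120d   v1.19.0
      $ oc get volumeattachment
      NAME               ATTACHER                             PV                                         NODE       ATTACHED   AGE
      csi-0a1b2c3d4e5f   openshift-storage.rbd.csi.ceph.com   pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d   worker-1   true       25m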

      We would like to raise a Request For Enhancement for an automatic procedure, based on the self-fencing capabilities of Ceph, that removes the current need to release the VolumeAttachment manually (hedged sketches of what this could look like follow the procedure below).

      The current manual procedure is:

      -After 5 minutes, OpenShift detects the node is NotReady and moves the workloads to another node.
      -After 20 minutes, the old pods are in the Terminating state and the new pods are in the ContainerCreating state.
      -As the new pods are unable to start because they cannot attach or mount their volumes, we need to perform the following procedure manually:
      -Initial scenario:
      $ oc get pods
      NAME                                            STATUS              AGE
      rbd-write-workload-generator-6c4d87b4c4-kbrlw   ContainerCreating   20s
      rbd-write-workload-generator-6c4d87b4c4-mjxwb   Terminating         15m
      -Examine the error being reported (the Multi-Attach message shows up in the pod's events):
      $ oc describe pod rbd-write-workload-generator-6c4d87b4c4-kbrlw
      ...
      Multi-Attach error for volume "pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d"
      Volume is already used by pod(s) rbd-write-workload-generator-6c4d87b4c4-mjxwb
      ...
      -Set environment variables (note that pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d is the name of the PersistentVolume, not of the PVC):
      $ OLD_POD_NAME="rbd-write-workload-generator-6c4d87b4c4-mjxwb"
      $ NEW_POD_NAME="rbd-write-workload-generator-6c4d87b4c4-kbrlw"
      $ PV_NAME="pvc-a3f569a7-1fe7-4d2d-b561-090b2426b13d"
      -Delete the pod in Terminating state:
      $ oc delete pod ${OLD_POD_NAME} --force --grace-period=0
      -Get the VolumeAttachment linked to the PersistentVolume mounted by the pod:
      $ VOL_ATTACHMENT_NAME=$(oc get volumeattachment -o jsonpath="{.items[?(@.spec.source.persistentVolumeName=='${PV_NAME}')].metadata.name}")
      -Delete the VolumeAttachment object:
      $ oc delete volumeattachment ${VOL_ATTACHMENT_NAME}
      -Delete the pod in ContainerCreating status to force recreation:
      $ oc delete pod ${NEW_POD_NAME}
      -Wait until the new pod is created:
      $ oc get pods
      NAME                                            STATUS              AGE
      rbd-write-workload-generator-6c4d87b4c4-vlrx8   Running             34s
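
      As a stop-gap, the manual steps above could be wrapped in a single script. This is only a sketch that reuses the same commands and the same hypothetical pod and PV names; it does not remove the need for an automated, fencing-based solution:
      #!/bin/bash
      # Sketch: force-release an RWO volume that is stuck on a crashed node.
      # Usage: ./release-rwo-volume.sh <old-pod> <new-pod> <pv-name> [namespace]
      OLD_POD_NAME="$1"; NEW_POD_NAME="$2"; PV_NAME="$3"; NS="${4:-default}"
      # Force-delete the pod stuck in Terminating on the dead node
      oc -n "${NS}" delete pod "${OLD_POD_NAME}" --force --grace-period=0
      # Find and delete the VolumeAttachment that still pins the PV to the dead node
      VOL_ATTACHMENT_NAME=$(oc get volumeattachment -o jsonpath="{.items[?(@.spec.source.persistentVolumeName=='${PV_NAME}')].metadata.name}")
      oc delete volumeattachment "${VOL_ATTACHMENT_NAME}"
      # Delete the replacement pod stuck in ContainerCreating so it is recreated and attaches cleanly
      oc -n "${NS}" delete pod "${NEW_POD_NAME}"

      Regarding the Ceph self-fencing mentioned above, the idea is that the Ceph cluster blocklists the crashed client so that its RBD watch/lock is released and the image can safely be attached elsewhere. A hedged sketch of what that could look like (pool name, image name and client address are hypothetical; on Ceph releases older than Octopus the subcommand is "blacklist" instead of "blocklist"):
      $ rbd status replicapool/csi-vol-0a1b2c3d4e5f       # shows the watcher still held by the dead node
      $ ceph osd blocklist add 10.0.0.23:0/1234567890     # fence the dead client so its watch/lock is dropped
      $ ceph osd blocklist ls                              # confirm the blocklist entry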

      3. Why does the customer need this? (List the business requirements here)

      We are fine with a downtime of around 5 minutes until OpenShift detects that the pod is down and creates a new one. However, having to run a manual procedure to recover from this situation is not desirable; it should be self-managed directly by OCP/K8S.
