Bug
Resolution: Done
Normal
4.20.0
Quality / Stability / Reliability
Description of problem:
Intro for force storage detach: https://kubernetes.io/docs/concepts/cluster-administration/node-shutdown/#storage-force-detach-on-timeout

In drivers that directly expose LUNs, the force detach bypasses the unstage flow, where multipath -f is invoked, and goes straight to unpublish, which unmaps the LUN from the per-node igroup. While it is nice to have autopilot, on this kind of driver the consequence could be data corruption.

We are not the first ones to get confused over this; see the upstream issue https://github.com/kubernetes/kubernetes/issues/120328, which resulted in a PR that allows disabling this behavior: https://github.com/kubernetes/kubernetes/pull/120344

This bug is about considering disabling this behavior by default on OCP.
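For reference, the upstream PR adds an opt-out to kube-controller-manager. A minimal sketch, assuming the flag name introduced by that PR:

  # disable the 6-minute force detach on timeout (flag added by kubernetes/kubernetes#120344)
  kube-controller-manager --disable-force-detach-on-timeout=true ...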
Version-Release number of selected component (if applicable):
OCP 4.20.0
How reproducible:
100%
Steps to Reproduce:
1. Look up the force-detach flag in the kube-controller-manager configuration (see the check below).
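A minimal way to check on a live cluster (a sketch; the namespace is the OCP default and the flag name comes from the upstream PR):

  kubectl -n openshift-kube-controller-manager get pods -o yaml | grep -i disable-force-detach

No output means the flag is unset, so the upstream default applies and force detach on timeout is enabled.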
Actual results:
Enabled
Expected results:
Disabled
Additional info:
What needs to happen instead: notice the node is not ready, then use an out-of-band means to kill the node (powering off the VM or the physical machine) and taint the node so that all its volumes get force-detached. This way, there is no garbage left on the node as a result of the force detach. The crucial thing is that someone actually makes sure the node is dead and then taints it, rather than relying on a 6-minute timer that expires while the node is perfectly fine and merely unreachable from the API server for a brief moment. (A sketch of the taint command appears at the end of this section.)

To invoke the forced-detach mechanism:
- Create a pod with a volume on said driver that does nothing.
- Exec into the node and run exec 3</var/lib/kubelet/pods/pod-uid/volumeDevices/kubernetes.io~csi/pvc-pvc-uid (keep the session open so the block device stays busy).
- Delete the pod.
- Observe errors from the driver while unstaging.
- After 6 minutes, issue a systemctl restart kubelet on the node.
- Observe the force-detach log entry with:

  kubectl get pods -n openshift-kube-controller-manager --no-headers | awk '{print $1}' | xargs -I {} sh -c 'kubectl logs -n openshift-kube-controller-manager {} --all-containers --prefix | grep "force detaching"'
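The taint mentioned above is the documented non-graceful node shutdown taint from the linked node-shutdown page. A sketch, with the node name as a placeholder, to be applied only after confirming the node is actually powered off:

  # mark a verified-dead node so its volumes are force-detached safely
  kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute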