User Story
As a user, I want MCD to restart the kubelet so that the node rejoins the cluster when a user deletes the node, intentionally or unintentionally, in SNO. This is in addition to preventing the user from doing so: https://github.com/kubernetes/enhancements/issues/2775.
Background
Deleting the node in a Single Node OpenShift cluster with the client ($ oc delete node <node>) causes some cluster operators and applications to fail, because no nodes are left for their pods to be scheduled on. Whether the user runs this operation intentionally or unintentionally, it impacts the uptime of the application as well as a couple of system components. One way to recover is to restart the kubelet on the node so that it registers back with the cluster, which in turn allows the pods to be scheduled again. While https://github.com/kubernetes/enhancements/issues/2775 will help prevent this operation in the future, we need to alert the user, and MCD needs to understand the signal and handle the kubelet restart to avoid downtime where possible, as discussed in https://coreos.slack.com/archives/C018KQE33MF/p1626200735229900. Logs, including the cluster operator status: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/sno/node-deletion/.
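The recovery path described above could be sketched roughly as follows. This is an illustration only, not MCD's actual implementation: the function names (recover_node, restart_kubelet_via_systemd), the node name "sno-0", and the idea of passing in a node-listing callable are all assumptions made for the example. The only real command used is systemctl restart kubelet.

```python
import subprocess
from typing import Callable, Iterable


def restart_kubelet_via_systemd() -> None:
    """Restart the kubelet systemd unit on the local host (needs root)."""
    subprocess.run(["systemctl", "restart", "kubelet"], check=True)


def recover_node(
    node_name: str,
    list_nodes: Callable[[], Iterable[str]],
    restart_kubelet: Callable[[], None],
) -> bool:
    """Restart the kubelet if the node object is missing from the cluster.

    list_nodes is a hypothetical callable returning the names of the nodes
    currently registered with the API server. Returns True if a restart
    was issued, False if the node was still registered.
    """
    if node_name in set(list_nodes()):
        return False  # node object still exists, nothing to recover
    restart_kubelet()  # kubelet re-registers the node on startup
    return True


if __name__ == "__main__":
    # Dry run with stubs: the node list is empty, as after `oc delete node`.
    issued = recover_node(
        "sno-0",  # hypothetical node name
        list_nodes=lambda: [],
        restart_kubelet=lambda: print("would run: systemctl restart kubelet"),
    )
    print("restart issued:", issued)
```

In a real daemon the node list would come from a watch on the API server rather than a poll, and restart_kubelet_via_systemd would be passed in place of the stub. Both are left out here so the sketch runs without a cluster.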
Stakeholders
- Single Node OpenShift and Chaos Engineering Teams: https://coreos.slack.com/archives/C018KQE33MF/p1626200735229900