Feature Request
Resolution: Unresolved
Major
openshift-4.11, openshift-4.12, openshift-4.13, openshift-4.14
1. Proposed title of this feature request
"Control node failure handling"
2. What is the nature and description of the request?
It should be possible to recover a crashed control plane node without user intervention. In one customer trial, we observed a kernel crash on a node caused by a storage driver. A StatefulSet (STS) pod on that node became stuck, and because Kubernetes by design does not relocate StatefulSet pods from an unreachable node, a user had to reboot the node manually. It took one day to recognize the failure and recover, and the application was unavailable for that time. The STS pods moved to another healthy node only after the node was rebooted. (In this situation, the Kubernetes control plane had lost connectivity to the kubelet on the crashed node.)
3. Why does the customer need this? (List the business requirements here)
We run master nodes as schedulable, and their workloads include STS pods. We already use the Self Node Remediation operator to handle worker node failures. We need a similar solution for STS pods running on master nodes.
4. List any affected packages or components.
Not sure which packages are affected.