Type: Feature Request
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: openshift-4.11, openshift-4.12, openshift-4.13, openshift-4.14
Component/s: MCO
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected

1. Proposed title of this feature request
"Control node failure handling"

2. What is the nature and description of the request?
It should be possible to recover a crashed control plane node without user intervention. In one customer trial, a storage driver caused a kernel crash on a node, which left a StatefulSet (STS) pod stuck. Because Kubernetes by design does not relocate StatefulSet pods away from an unresponsive node, a user had to reboot the node manually. Recognizing the failure and recovering from it took one day, during which the application was unavailable. The STS pods moved to another healthy node only after the node was rebooted. In this state the Kubernetes control plane has lost connectivity to the kubelet on the affected node.
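
The stuck state described above can be illustrated with a minimal sketch. It assumes node and pod objects are available as plain dictionaries shaped like `oc get nodes -o json` and `oc get pods -A -o json` items; the function names `node_is_not_ready` and `stuck_statefulset_pods` are hypothetical helpers for illustration, not part of any OpenShift component. It only detects the condition and does not remediate anything.

```python
from typing import Dict, List


def node_is_not_ready(node: Dict) -> bool:
    """Return True when the node's Ready condition is anything other than "True".

    When the kubelet stops reporting, the control plane sets Ready to "Unknown".
    """
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") != "True"
    return True  # no Ready condition reported at all


def stuck_statefulset_pods(nodes: List[Dict], pods: List[Dict]) -> List[str]:
    """List "namespace/name" of StatefulSet-owned pods bound to not-ready nodes."""
    bad_nodes = {n["metadata"]["name"] for n in nodes if node_is_not_ready(n)}
    stuck = []
    for pod in pods:
        meta = pod["metadata"]
        owners = meta.get("ownerReferences", [])
        owned_by_sts = any(o.get("kind") == "StatefulSet" for o in owners)
        if owned_by_sts and pod.get("spec", {}).get("nodeName") in bad_nodes:
            stuck.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return sorted(stuck)
```

A check like this could feed an alert so that the failure is recognized sooner; the actual recovery (fencing or rebooting the node) is what this request asks the platform to automate.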

3. Why does the customer need this? (List the business requirements here)
We run master nodes as schedulable, and the workloads on them include STS pods. We already use the Self Node Remediation operator to handle worker node failures. We need a similar solution for STS pods running on master nodes.

4. List any affected packages or components.
Not sure which packages are affected.

Assignee: Mark Russell

Reporter: Ankita Kanekar

Need Info From: Mark Russell

Votes: 0

Watchers: 2

Created: 2024/01/08 2:50 PM

Updated: 2024/10/21 4:59 AM
