Feature Request
Resolution: Unresolved
Product / Portfolio Work
1. Proposed title of this feature request
Improve bare metal nodes failure detection
2. What is the nature and description of the request?
Currently, the Node Health Check Operator can trigger remediation when a node loses connectivity to the Kubernetes API or when the node crashes. However, several other failure scenarios should be considered, mainly for the OpenShift Virtualization use case. Some examples, assuming API connectivity is still available:
- Secondary networks not communicating because of link failure.
- Access to storage is lost (SAN, NAS or SDS).
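To illustrate the two scenarios above, here is a minimal sketch of node-local checks, assuming a Linux node where link state is exposed under /sys/class/net/<iface>/operstate and storage is reachable through a mount point. The function names, interface names, and paths are hypothetical and are not part of the Node Health Check Operator or any existing component. How the result would be surfaced (for example as a node condition that remediation could act on) is left open.

```python
import os
from pathlib import Path


def link_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports the interface operstate as 'up'.

    Assumption: physical and bonded interfaces report 'up' or 'down' here;
    some virtual interfaces report 'unknown' and would need special handling.
    """
    try:
        state = Path(sysfs_root, iface, "operstate").read_text().strip()
    except OSError:
        # Missing interface or unreadable sysfs entry is treated as a failure.
        return False
    return state == "up"


def storage_is_accessible(mount_point: str) -> bool:
    """Return True if the mount point can be listed.

    Note: a hung NFS or SAN path can block this call indefinitely, so a real
    check would have to run it with a timeout (e.g. in a separate process).
    """
    try:
        os.listdir(mount_point)
    except OSError:
        return False
    return True


def node_problems(secondary_ifaces, storage_mounts, sysfs_root="/sys/class/net"):
    """Collect human-readable problems for the given interfaces and mounts."""
    problems = []
    for iface in secondary_ifaces:
        if not link_is_up(iface, sysfs_root):
            problems.append(f"link down: {iface}")
    for mount in storage_mounts:
        if not storage_is_accessible(mount):
            problems.append(f"storage unreachable: {mount}")
    return problems
```

A non-empty result from `node_problems` would indicate a node that is still "Ready" from the API standpoint but not fully functional for virtual machine workloads.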
3. Why does the customer need this? (List the business requirements here)
Improve the availability of virtual machine based workloads by preventing them from continuing to run on a node that isn't fully functional despite being "Ready" from the Kubernetes API standpoint.
4. List any affected packages or components.
Node Health Check Operator (to be confirmed)
5. Previous work
Node Problem Detector seems to cover the use cases above, but it isn't a supported component in OpenShift.