Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: Node Health Check Operator
Labels:
None

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request

Improve failure detection for bare metal nodes

2. What is the nature and description of the request?

Currently, the Node Health Check Operator can trigger remediation when a node loses connectivity to the Kubernetes API or when the node crashes. However, several other failure scenarios should be considered, mainly for the OpenShift Virtualization use case. Some examples, assuming API connectivity is still available:

Secondary networks stop communicating because of a link failure.
Access to storage is lost (SAN, NAS or SDS).
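For illustration only: if a node-local agent (for example Node Problem Detector) published node conditions for the failures above, a condition-based check could flag the node even while it still reports "Ready". The minimal Python sketch below is loosely modeled on the type/status/duration matching used by NHC's unhealthyConditions. The condition types SecondaryNetworkDown and StorageUnreachable, the durations, and the helper names are hypothetical assumptions, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class NodeCondition:
    """Simplified stand-in for a Kubernetes NodeCondition."""
    type: str
    status: str  # "True", "False" or "Unknown"
    last_transition_time: datetime


@dataclass
class UnhealthyRule:
    """Hypothetical rule: condition `type` has had `status` for at least `duration`."""
    type: str
    status: str
    duration: timedelta


def is_unhealthy(conditions, rules, now):
    """Return True if any condition has matched any rule for at least its duration."""
    for cond in conditions:
        for rule in rules:
            if (cond.type == rule.type
                    and cond.status == rule.status
                    and now - cond.last_transition_time >= rule.duration):
                return True
    return False


RULES = [
    # Ready-based rules cover API connectivity loss and node crashes.
    UnhealthyRule("Ready", "False", timedelta(minutes=5)),
    UnhealthyRule("Ready", "Unknown", timedelta(minutes=5)),
    # Hypothetical condition types a node-local agent could publish.
    UnhealthyRule("SecondaryNetworkDown", "True", timedelta(minutes=2)),
    UnhealthyRule("StorageUnreachable", "True", timedelta(minutes=2)),
]

if __name__ == "__main__":
    now = datetime(2025, 6, 23, 14, 0, tzinfo=timezone.utc)
    conditions = [
        # The node is "Ready" from the Kubernetes API standpoint...
        NodeCondition("Ready", "True", now - timedelta(hours=1)),
        # ...but has lost access to storage for 3 minutes.
        NodeCondition("StorageUnreachable", "True", now - timedelta(minutes=3)),
    ]
    print(is_unhealthy(conditions, RULES, now))  # True
```

With only the Ready rules, this node would never be flagged; the extra, hypothetical condition types are what let the check catch a node that is "Ready" but not fully functional.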

3. Why does the customer need this? (List the business requirements here)

Improve the availability of virtual-machine-based workloads by preventing them from continuing to run on a node that isn't fully functional despite being "Ready" from the Kubernetes API standpoint.

4. List any affected packages or components.

Node Health Check Operator (to be confirmed)

5. Previous work

Node Problem Detector seems to cover the use cases above, but it isn't a supported component in OpenShift.

Assignee: Ramon Acedo

Reporter: Vagner Farias

Need Info From: None

Votes: 9

Watchers: 15

Created: 2025/06/23 2:00 PM

Updated: 2025/09/15 1:27 PM

Target start: None

Target end: None
