-
Epic
-
Resolution: Unresolved
-
Major
-
remediation-with-baremetal-events
-
Product / Portfolio Work
-
In Progress
-
VIRTSTRAT-77 - Fencing: Additional out-of-band health checks for faster remediation
-
11% To Do, 0% In Progress, 89% Done
Kubernetes / OCP takes some time to detect unhealthy nodes [1], which significantly delays the start of remediation. Shortening the relevant timeouts and intervals could introduce risks and is not wanted [2].
An alternative could be to use the Baremetal Event Relay [3] to watch for relevant Redfish events. Knative [4] might be useful for converting those events into node conditions that NHC can act on.
[1] https://docs.google.com/document/d/1NKZBTu4UFaCR-tJgkrPR7-DHLabe4MtSL1L8Qfws0sU/edit#heading=h.x3fr6xu7zkbj
[2] https://issues.redhat.com/browse/RFE-3727
[3] https://docs.openshift.com/container-platform/4.13/monitoring/using-rfhe.html
[4] https://docs.openshift.com/serverless/1.29/about/about-knative-eventing.html
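The conversion step proposed above (Redfish event in, node condition out) could look roughly like the following. This is only a minimal sketch under stated assumptions, not a design decision: the condition type `BaremetalHardwareFault`, the reason strings, and the rule that only `Critical` severity counts as a fault are all hypothetical. The input follows the general shape of a Redfish `Event` payload (an `Events` array whose records carry `Severity`, `MessageId`, and `Message`), and the output mirrors the fields of a Kubernetes node condition.

```python
from datetime import datetime, timezone

# Hypothetical condition type; the epic does not name one.
CONDITION_TYPE = "BaremetalHardwareFault"

# Assumption: only "Critical" Redfish events should mark the node as faulty.
FAULT_SEVERITIES = {"Critical"}


def redfish_event_to_node_condition(event, now=None):
    """Map a Redfish Event payload (dict) to a node-condition-shaped dict."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    # A Redfish Event carries one or more records in its "Events" array.
    records = event.get("Events", [])
    faults = [r for r in records if r.get("Severity") in FAULT_SEVERITIES]

    if faults:
        first = faults[0]
        status = "True"
        reason = "RedfishCriticalEvent"  # hypothetical reason string
        message = "{}: {}".format(
            first.get("MessageId", "unknown"), first.get("Message", "")
        )
    else:
        status = "False"
        reason = "NoCriticalRedfishEvent"  # hypothetical reason string
        message = "No critical Redfish event in payload"

    return {
        "type": CONDITION_TYPE,
        "status": status,
        "reason": reason,
        "message": message,
        "lastHeartbeatTime": timestamp,
        "lastTransitionTime": timestamp,
    }
```

In such a setup, whatever component receives the events (for example a Knative service) would patch the returned condition into the node's status, and NHC would need to be configured to treat that condition type as unhealthy. Both of these steps are outside this sketch.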
- is related to
VIRTSTRAT-77 Fencing: Additional out-of-band health checks for faster remediation (In Progress)