-
Story
-
Resolution: Unresolved
-
Normal
Background
NHC supports a list of node UnhealthyConditions as part of the NHC CR. These conditions are evaluated with a logical "OR" to decide whether to create a remediation CR for a node.
As a result, NHC creates a remediation CR as soon as any one of the UnhealthyConditions is met. An admin cannot make CR creation depend on several conditions being met together before a node is considered unhealthy.
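The current behavior described above can be sketched as follows. This is a minimal illustration, not the NHC implementation: the `UnhealthyCondition` class, the `observed` mapping, and the function names are assumptions made for the example, and durations are simplified to plain seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnhealthyCondition:
    """Illustrative stand-in for one entry of the UnhealthyConditions list."""
    type: str        # node condition type, e.g. "Ready"
    status: str      # status that counts as unhealthy, e.g. "False"
    duration_s: int  # how long the status must persist, in seconds


def is_met(rule, observed):
    """observed maps condition type -> (status, seconds spent in that status)."""
    status, elapsed_s = observed.get(rule.type, (None, 0))
    return status == rule.status and elapsed_s >= rule.duration_s


def needs_remediation_any(rules, observed):
    # Logical OR: a single met condition is enough to trigger remediation.
    return any(is_met(rule, observed) for rule in rules)


RULES = [
    UnhealthyCondition("Ready", "False", 300),
    UnhealthyCondition("Ready", "Unknown", 300),
]
```

With these rules, a node whose Ready condition has been "Unknown" for 400 seconds is remediated regardless of the state of any other condition.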
Suggestion
Provide a stronger decision-making process for creating a remediation CR, one that is based on multiple node conditions.
Each condition would have a weight, and NHC would create a remediation CR if the sum of the weights of the met conditions is greater than a configured value. A condition's weight counts toward the sum only if the condition is met and its duration has expired.
For example, a user might not want to remediate a node if the Ready condition value is "False" or "Unknown" (network is down) while there are running workloads with storage that are still healthy. Using weights, an admin could assign a weight to each condition and base the decision to create a remediation CR on the sum of the weights of the met conditions, instead of on the first condition that is met.
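The weighted decision could look roughly like the sketch below. Everything in it is hypothetical and only illustrates the proposal: the `weight` field, the `THRESHOLD` value, the `StorageUnavailable` condition type, and the function names do not exist in NHC today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedCondition:
    type: str        # node condition type, e.g. "Ready"
    status: str      # status that counts as unhealthy
    duration_s: int  # how long the status must persist, in seconds
    weight: int      # hypothetical field proposed in this story


def unhealthy_score(rules, observed):
    """observed maps condition type -> (status, seconds spent in that status)."""
    score = 0
    for rule in rules:
        status, elapsed_s = observed.get(rule.type, (None, 0))
        # A weight counts only if the condition is met and its duration expired.
        if status == rule.status and elapsed_s >= rule.duration_s:
            score += rule.weight
    return score


def needs_remediation(rules, observed, threshold):
    # Remediate only when the summed weights are greater than the threshold.
    return unhealthy_score(rules, observed) > threshold


RULES = [
    WeightedCondition("Ready", "False", 300, weight=1),
    WeightedCondition("Ready", "Unknown", 300, weight=1),
    # Hypothetical condition type standing in for "storage is no longer healthy".
    WeightedCondition("StorageUnavailable", "True", 300, weight=1),
]
THRESHOLD = 1  # hypothetical value chosen for this example

network_only = {"Ready": ("Unknown", 600), "StorageUnavailable": ("False", 600)}
network_and_storage = {"Ready": ("Unknown", 600), "StorageUnavailable": ("True", 600)}

print(needs_remediation(RULES, network_only, THRESHOLD))         # False
print(needs_remediation(RULES, network_and_storage, THRESHOLD))  # True
```

With a threshold of 1, a network outage alone scores 1 and does not trigger remediation, while a network outage combined with unhealthy storage scores 2 and does. Setting the threshold to 0 would reproduce the existing "OR" behavior.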