Bug
Resolution: Unresolved
Normal
In a use case where escalation remediation is configured with FAR as the first remediator and MDR as the second, and FAR does not manage to remediate the host in time:
- On node failure, NHC creates a FAR remediation CR
- FAR fails to remediate within the allocated time. To reproduce, either use a minimal timeout (60s) or a wrong CR so that the fence agent command fails, or let FAR complete the remediation (CR condition succeeded=true) while the node does not return to Ready=true.
- NHC triggers the MDR remediation CR (with a graceful timeout of 600s so that MDR succeeds, or a minimal timeout of 60s so that MDR times out)
- MDR remediates by deleting the machine, which will remove the node and provision a new one
- The FAR remediation CR, which belongs to the old node, has not been removed.
Will the FAR remediation CR eventually be removed if the node is provisioned with a new name (or the same name)?
Note: MDR may time out, but by then the machine deletion has already been triggered.
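
The reproduction steps above can be modeled roughly as follows. This is only a toy Python sketch to make the reported state easier to reason about; it is not NHC, FAR, or MDR code. The `Cluster` class, the `run_escalation` and `orphaned_crs` helpers, the node names, and the assumption that NHC leaves earlier CRs in place while escalating are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster state: node names plus (remediator, node) CR records."""
    nodes: set = field(default_factory=set)
    remediation_crs: list = field(default_factory=list)


def run_escalation(cluster: Cluster, node: str, new_node: str) -> None:
    """Replay the reported sequence for one unhealthy node (hypothetical model)."""
    # NHC detects the failure and creates the FAR remediation CR.
    cluster.remediation_crs.append(("FAR", node))
    # FAR does not bring the node back in time, so NHC escalates to MDR.
    # Assumption: the FAR CR is left in place during escalation.
    cluster.remediation_crs.append(("MDR", node))
    # MDR deletes the machine: the old node disappears, a new one is provisioned.
    cluster.nodes.discard(node)
    cluster.nodes.add(new_node)


def orphaned_crs(cluster: Cluster) -> list:
    """Return CRs that reference a node no longer present in the cluster."""
    return [cr for cr in cluster.remediation_crs if cr[1] not in cluster.nodes]


cluster = Cluster(nodes={"worker-0", "worker-1"})
run_escalation(cluster, node="worker-0", new_node="worker-2")
leftover = orphaned_crs(cluster)
print(leftover)
```

In this model the FAR CR for `worker-0` remains after the node is replaced, which is the state the question asks about. If the new node reuses the old name, `orphaned_crs` no longer flags the CR, so the same-name case would need a different check (for example, comparing node UIDs).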