RHWA-697

FAR CR isn't removed when node is deleted by MDR

    • Moderate

      In a setup with escalating remediation, where FAR is the first remediator and MDR is the second, and FAR does not manage to remediate the host in time:

      • On node failure, NHC creates a FAR remediation CR
      • FAR fails to remediate within the allocated time. To reproduce, either use a minimal timeout (60s) or a wrong CR so that the fence agent command fails, or let FAR complete the remediation (CR condition succeeded=true) while the node does not return to Ready=true.
      • NHC triggers the MDR remediation CR (with a generous timeout of 600s so that MDR succeeds, or a minimal timeout of 60s so that MDR times out)
      • MDR remediates by deleting the machine, which removes the node and provisions a new one
      • The FAR remediation CR, which belongs to the old node, is not removed.

      Will the FAR remediation CR eventually be removed if the node is provisioned with a new name (or with the same name)?
      Note that MDR may time out even though the machine deletion has already been triggered.
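
The sequence above can be sketched as a toy in-memory model, which may help when reasoning about the new-name versus same-name question. This is not operator code: `ToyCluster`, `nhc_create_far_cr`, `mdr_delete_machine`, and `orphaned_far_crs` are hypothetical names, the CR is assumed to be named after the node it targets, and the MDR CR's own lifecycle is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for cluster state; not a real API."""
    nodes: set = field(default_factory=set)
    # FAR CR name -> name of the node it targets.
    far_crs: dict = field(default_factory=dict)


def nhc_create_far_cr(cluster, node):
    # Assumption: the remediation CR is named after the unhealthy node.
    cluster.far_crs[node] = node


def mdr_delete_machine(cluster, old_node, new_node):
    # Machine deletion removes the old node; a replacement is provisioned.
    # Nothing in this model cleans up the FAR CR, mirroring the report.
    cluster.nodes.discard(old_node)
    cluster.nodes.add(new_node)


def orphaned_far_crs(cluster):
    # A CR counts as orphaned here when no node with the target name exists.
    return sorted(n for n, target in cluster.far_crs.items()
                  if target not in cluster.nodes)


# Replacement node gets a new name: the FAR CR points at nothing.
cluster = ToyCluster(nodes={"worker-0", "worker-1"})
nhc_create_far_cr(cluster, "worker-1")
mdr_delete_machine(cluster, "worker-1", "worker-2")
print(orphaned_far_crs(cluster))  # ['worker-1']

# Replacement node reuses the name: the stale CR now matches the new node.
cluster = ToyCluster(nodes={"worker-0", "worker-1"})
nhc_create_far_cr(cluster, "worker-1")
mdr_delete_machine(cluster, "worker-1", "worker-1")
print(orphaned_far_crs(cluster))  # []
```

In this model the two cases differ: with a new name the stale CR is detectable by a name lookup, while with a reused name it silently attaches to the replacement node. Whether the real operators behave this way is exactly what the question above asks.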

              Assignee: Unassigned
              Reporter: Michael Shitrit (mshitrit@redhat.com)