Red Hat Workload Availability / RHWA-699

FAR: Document Manual Troubleshooting/Investigating Unhealthy Node


      After concerns were raised about supporting the off action in FAR (Fence Agents Remediation), we wrote a doc on how to address them and agreed to document how to add the action manually.

       

      Add documentation on how to troubleshoot and investigate unhealthy nodes using manual steps.

      This kind of investigation can be done by raising a "flag" or holding a "lock" before fencing and releasing it once the investigation is over. Examples:

      1. Add a unique taint to the node to prevent scheduling (e.g., medik8s.io/fence-agents-remediation-investigation=begin:NoSchedule).
      2. Cordon the node (similar to NMO functionality, but without draining the node).
      3. Hold a unique lease named after the node (different from the one that NHC and NMO try to use).
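
The taint, cordon, and lease examples above could be sketched roughly as follows. This is a minimal illustration, not an agreed procedure. The taint string comes from the example above, and the kubectl subcommands (taint, cordon, uncordon) are standard. The lease name pattern "investigation-<node>", the namespace, the holder identity, and the duration are hypothetical placeholders. The ticket does not say whether FAR or other remediators honor any of these markers, so the sketch only shows how an administrator might raise and clear them.

```python
import json

# Taint taken from the example in the ticket.
TAINT_KEY = "medik8s.io/fence-agents-remediation-investigation"
TAINT_VALUE = "begin"
TAINT_EFFECT = "NoSchedule"


def hold_commands(node: str) -> list[list[str]]:
    """kubectl invocations that raise the 'flag' before the investigation."""
    return [
        # Keep new pods off the node while it is being investigated.
        ["kubectl", "taint", "nodes", node,
         f"{TAINT_KEY}={TAINT_VALUE}:{TAINT_EFFECT}"],
        # Mark the node unschedulable without draining it.
        ["kubectl", "cordon", node],
    ]


def release_commands(node: str) -> list[list[str]]:
    """kubectl invocations that clear the 'flag' after the investigation."""
    return [
        # A trailing '-' removes the taint with this key and effect.
        ["kubectl", "taint", "nodes", node, f"{TAINT_KEY}:{TAINT_EFFECT}-"],
        ["kubectl", "uncordon", node],
    ]


def investigation_lease(node: str, holder: str,
                        namespace: str = "default",
                        duration_seconds: int = 3600) -> dict:
    """Lease manifest named after the node.

    The name pattern, namespace, and duration are assumptions made for
    this sketch; they only need to differ from the lease NHC and NMO use.
    """
    return {
        "apiVersion": "coordination.k8s.io/v1",
        "kind": "Lease",
        "metadata": {"name": f"investigation-{node}", "namespace": namespace},
        "spec": {
            "holderIdentity": holder,
            "leaseDurationSeconds": duration_seconds,
        },
    }


if __name__ == "__main__":
    node = "worker-1"  # hypothetical node name
    for cmd in hold_commands(node):
        print(" ".join(cmd))
    # The manifest could be piped to `kubectl create -f -`.
    print(json.dumps(investigation_lease(node, "admin"), indent=2))
```

Building the commands as argument lists rather than shell strings keeps the node name from being interpreted by a shell if the lists are later passed to a process runner.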

      This task is for FAR, but it also applies to other remediators such as SNR.

              Assignee: Unassigned
              Reporter: Or Raz (oraz@redhat.com)