Red Hat Workload Availability / RHWA-426

NHC is stuck in a loop and does not remediate when MDR is first in remediationTemplate


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • rhwa-25.9
    • Node Healthcheck


      I created an NHC config in which MDR is placed before SNR, using remediationTemplate.

      Once MDR times out and fails to remediate, SNR is triggered, but everything stays stuck for more than 11 to 12 hours and the nodes are never remediated. I was able to delete the MDR CR and the SNR CR in the middle of the remediation, but after that I could not delete the NHC CR.

      I first created a scenario with SNR as the first remediator and MDR as the second, and everything worked as expected.

      I then set MDR as the first remediator and SNR as the second. In this case MDR times out, and the node never returns to the Ready state through either remediator.
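For readers trying to reproduce this, the MDR-first ordering described above could be sketched as follows. This is only an illustration: the report does not include the actual manifest, so the resource name, namespace, template names, selector, and timeouts are all assumptions. It also assumes the NodeHealthCheck `escalatingRemediations` list, in which each entry wraps a `remediationTemplate` together with an `order` and a `timeout`.

```python
import json

# Assumed values: the report does not include the actual manifest, so the
# namespace, template names, selector, and timeouts below are hypothetical.
NAMESPACE = "openshift-workload-availability"


def escalation_step(api_group, kind, name, order, timeout):
    """Build one escalatingRemediations entry; a lower order runs first."""
    return {
        "remediationTemplate": {
            "apiVersion": f"{api_group}/v1alpha1",
            "kind": kind,
            "name": name,
            "namespace": NAMESPACE,
        },
        "order": order,
        "timeout": timeout,
    }


def build_nhc(name="nhc-mdr-then-snr"):
    """Return a NodeHealthCheck manifest with MDR first and SNR second."""
    return {
        "apiVersion": "remediation.medik8s.io/v1alpha1",
        "kind": "NodeHealthCheck",
        "metadata": {"name": name},
        "spec": {
            "minHealthy": "51%",
            "selector": {
                "matchExpressions": [
                    {"key": "node-role.kubernetes.io/worker", "operator": "Exists"}
                ]
            },
            "unhealthyConditions": [
                {"type": "Ready", "status": "False", "duration": "300s"},
                {"type": "Ready", "status": "Unknown", "duration": "300s"},
            ],
            "escalatingRemediations": [
                # MDR is tried first (the ordering that gets stuck in this report).
                escalation_step(
                    "machine-deletion-remediation.medik8s.io",
                    "MachineDeletionRemediationTemplate",
                    "machine-deletion-remediation-template",  # hypothetical name
                    order=1,
                    timeout="300s",
                ),
                # SNR is the fallback once the MDR timeout expires.
                escalation_step(
                    "self-node-remediation.medik8s.io",
                    "SelfNodeRemediationTemplate",
                    "self-node-remediation-automatic-strategy-template",  # assumed name
                    order=2,
                    timeout="300s",
                ),
            ],
        },
    }


if __name__ == "__main__":
    # JSON is accepted wherever a YAML manifest is, e.g. piped to `oc apply -f -`.
    print(json.dumps(build_nhc(), indent=2))
```

Swapping the two `order` values would give the SNR-first configuration that the report says works.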

      I waited for more than 12 hours and the state did not change, so I tried deleting the SNR, MDR, and NHC CRs.

      However, the NHC CR cannot be deleted; the request is rejected with:
      Error from server (Forbidden): admission webhook "vnodehealthcheck.kb.io" denied the request: deletion prohibited due to running remediation

      Logs are attached for details:
      mdr-snr-nhc-logs.text

              Assignee: Unassigned
              Reporter: vipin kumar (vipikuma)