
    • Address SBR Design Gaps
    • To Do
    • RHWA-214 - SBR Operator
    • 50% To Do, 50% In Progress, 0% Done

      Summary: This epic tracks the architectural refactoring of Storage Based Remediation (SBR) to decouple detection logic from remediation execution and to resolve circular dependencies during node recovery. The current design creates race conditions with Node Health Check (NHC). It also prevents fenced nodes from verifying their own health, because persistent taints block the storage workloads that verification depends on.

      The new architecture splits the remediation flow in two. First, healthy peers report storage failures via Node Conditions instead of triggering fencing directly, which lets NHC arbitrate the decision. Second, the post-remediation workflow removes the remediation resource (and its associated taint) immediately after fencing, and uses a grace period so that storage verification mechanisms can confirm node recovery without triggering recursive fencing loops.
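      The flow described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the operator's implementation: the condition name `StorageUnhealthy`, the `Node` class, and all function names are hypothetical, and the sketch assumes the grace period is enforced by whichever component arbitrates remediation, which the epic does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical condition type; the epic does not name the real one.
STORAGE_UNHEALTHY = "StorageUnhealthy"


@dataclass
class Node:
    name: str
    conditions: Dict[str, bool] = field(default_factory=dict)
    tainted: bool = False
    last_fenced_at: Optional[datetime] = None


def report_storage_failure(node: Node, failed: bool) -> None:
    """Detection side: a healthy peer records what it saw and does not fence."""
    node.conditions[STORAGE_UNHEALTHY] = failed


def should_remediate(node: Node, now: datetime, grace_period: timedelta) -> bool:
    """Arbitration side (the role the epic assigns to NHC), heavily simplified."""
    if not node.conditions.get(STORAGE_UNHEALTHY, False):
        return False
    # Assumption: a recently fenced node is left alone for the grace period
    # so its storage workloads can start and verify health.
    if node.last_fenced_at is not None and now - node.last_fenced_at < grace_period:
        return False
    return True


def start_remediation(node: Node) -> None:
    """Creating the remediation resource is modelled as applying the taint."""
    node.tainted = True


def complete_fencing(node: Node, now: datetime) -> None:
    """Post-remediation: drop the remediation resource and its taint right away."""
    node.tainted = False
    node.last_fenced_at = now
```

      In this model, a node whose failure condition is still set immediately after fencing is not remediated again until the grace period has elapsed, which is the recursive fencing loop the epic aims to avoid.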

      Here is a link to the detailed design that will be implemented in this epic.

              mshitrit@redhat.com Michael Shitrit
              Votes: 0
              Watchers: 2