  1. OpenShift Bugs
  2. OCPBUGS-62067

Reset fencing devices monitoring checks after they've failed on loop

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.20.z
    • Two Node Fencing
    • Quality / Stability / Reliability
    • False
    • Important

      As a developer of OCPBUGS, I need:

      • To ensure that we try to restore the fencing devices after their monitoring checks have failed repeatedly.
      • This involves either running the `pcs stonith cleanup` command on the nodes when we discover that the fence device will no longer be called, or preventing Pacemaker from giving up entirely.
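
      A minimal sketch of the first option, written in Go because cluster-etcd-operator is a Go project. It assumes the operator has some way to execute `pcs` on a node; the `Runner` type and the `cleanupArgs` and `cleanupFencingDevice` functions are hypothetical names used only for illustration, and passing a device ID as a positional argument should be checked against the pcs version shipped on the nodes.

```go
package main

import (
	"errors"
	"fmt"
)

// Runner executes a command on a node and returns its output.
// Hypothetical abstraction: how the operator actually reaches the node
// (debug pod, job, SSH, ...) is not specified in this issue.
type Runner func(name string, args ...string) (string, error)

// cleanupArgs builds the argument list for `pcs stonith cleanup`.
// With an empty deviceID no ID is passed, so the cleanup is not limited
// to a single fencing device.
func cleanupArgs(deviceID string) []string {
	args := []string{"stonith", "cleanup"}
	if deviceID != "" {
		args = append(args, deviceID)
	}
	return args
}

// cleanupFencingDevice runs `pcs stonith cleanup` through the given runner
// and wraps any failure together with the command output.
func cleanupFencingDevice(run Runner, deviceID string) error {
	if run == nil {
		return errors.New("no command runner configured")
	}
	out, err := run("pcs", cleanupArgs(deviceID)...)
	if err != nil {
		return fmt.Errorf("pcs stonith cleanup failed: %w (output: %q)", err, out)
	}
	return nil
}
```

      Injecting the runner keeps the command construction testable without a live Pacemaker cluster; a real implementation would supply a runner that executes on the affected node.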

      Acceptance Criteria

      • We have a mechanism merged into cluster-etcd-operator that either tries to clean up the fencing resources after pulling their status (if they are marked as infinite failures), or updates the fencing resource to prevent Pacemaker from giving up entirely.
      • The operator is updated to mark itself degraded if we are in this state.
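
      The two criteria above could be tied together roughly as follows. This is a sketch under stated assumptions, not the operator's actual design: `DeviceStatus`, `Decision`, `hasInfiniteFailures`, and `evaluate` are hypothetical names, and the code assumes the fail count arrives as a raw string that is either a number or the literal `INFINITY` (which Pacemaker represents numerically as 1,000,000).

```go
package main

import (
	"strconv"
	"strings"
)

// pacemakerInfinity is the numeric value Pacemaker uses for INFINITY.
const pacemakerInfinity = 1000000

// DeviceStatus is a hypothetical summary of one fencing device as pulled
// from the cluster; the real status source and shape are not defined here.
type DeviceStatus struct {
	ID        string
	FailCount string // raw value, e.g. "3" or "INFINITY" (assumed format)
}

// Decision is what the operator would act on after pulling statuses.
type Decision struct {
	Degraded bool     // operator should report itself degraded
	Cleanup  []string // device IDs that need `pcs stonith cleanup`
}

// hasInfiniteFailures reports whether a raw fail count means Pacemaker
// has given up on the device.
func hasInfiniteFailures(raw string) bool {
	v := strings.TrimPrefix(strings.TrimSpace(raw), "+")
	if strings.EqualFold(v, "INFINITY") {
		return true
	}
	n, err := strconv.Atoi(v)
	return err == nil && n >= pacemakerInfinity
}

// evaluate marks the operator degraded and queues a cleanup for every
// device whose fail count has reached INFINITY.
func evaluate(devices []DeviceStatus) Decision {
	var d Decision
	for _, dev := range devices {
		if hasInfiniteFailures(dev.FailCount) {
			d.Degraded = true
			d.Cleanup = append(d.Cleanup, dev.ID)
		}
	}
	return d
}
```

      Separating the decision from the side effects means the degraded condition and the cleanup attempt are derived from the same status pull, so they cannot disagree with each other.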

      Supporting Documents

      Issue synthesized with help from the Gemini "Engineering Jira Buddy" gem

              Unassigned
              Jeremy Poulin (jpoulin)
              Douglas Hensel
              Votes: 0
              Watchers: 1