-
Bug
-
Resolution: Unresolved
-
Normal
-
4.20.z
-
Quality / Stability / Reliability
-
False
-
Important
As a developer of OCPBUGS, I need:
- To ensure that we try to restore the fencing devices after they have failed repeatedly.
- This involves running the `pcs stonith cleanup` command on the nodes if we discover that the fence device will no longer be called OR preventing pacemaker from giving up entirely.
Acceptance Criteria
- We have a mechanism merged into cluster-etcd-operator that either tries to clean up the fencing resources after pulling their status (if they are marked as infinite failures) or updates the fencing resource to prevent pacemaker from giving up entirely.
- The operator is updated to mark itself degraded if we are in this state.
Supporting Documents
Issue synthesized with help from the Gemini "Engineering Jira Buddy" gem.