-
Bug
-
Resolution: Done-Errata
-
Major
-
None
-
Quality / Stability / Reliability
-
False
-
False
-
CLOSED
-
CNV Infra 221
-
Important
-
None
Description of problem:
In an HCI environment (CNV + ODF), could the automated poison pill approach cause ODF mon quorum loss or VM outages for OCP Virt? Or has it already been tested with these products?
NHC/PP only kicks in once there has been a failure, so ODF would already have lost quorum (because the mon had already failed or was unreachable), and we're trying to get it back by recovering the node. The same goes for the VMs: either the VM is already dead or unreachable, and we're giving you the chance to bring it up somewhere else.
The most common reasons that fencing might "create downtime" are admins setting overly aggressive timeouts (causing NHC/PP to react to every little blip) and under-speccing the machines (causing the process that keeps the watchdog alive to be starved of CPU).
We can't control the hardware, but we have some rules in place for configuring timeouts, and have more planned for future releases.
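To illustrate why overly aggressive timeouts make remediation react to short blips, here is a minimal Python sketch. It is not NHC/PP code: the function name `should_remediate`, the parameter `unhealthy_timeout_s`, and the sample data are all hypothetical, and the only assumption is that remediation triggers when a node stays unhealthy continuously for at least the configured timeout.

```python
from typing import Iterable, Optional, Tuple


def should_remediate(
    samples: Iterable[Tuple[float, bool]],
    unhealthy_timeout_s: float,
) -> bool:
    """Return True if the node was continuously unhealthy for at least
    unhealthy_timeout_s seconds.

    samples: (timestamp_seconds, healthy) pairs, sorted by timestamp.
    Hypothetical model only; real NHC/PP behavior may differ.
    """
    unhealthy_since: Optional[float] = None
    for timestamp, healthy in samples:
        if healthy:
            # Any healthy observation resets the unhealthy window.
            unhealthy_since = None
            continue
        if unhealthy_since is None:
            unhealthy_since = timestamp
        if timestamp - unhealthy_since >= unhealthy_timeout_s:
            return True
    return False


# A short blip: unhealthy at t=10 and t=20, healthy again from t=30.
blip = [(0, True), (10, False), (20, False), (30, True), (40, True)]

# A real outage: unhealthy from t=10 onward, sampled every 10 seconds.
outage = [(0, True)] + [(t, False) for t in range(10, 330, 10)]

aggressive = should_remediate(blip, unhealthy_timeout_s=5)
conservative = should_remediate(blip, unhealthy_timeout_s=300)
outage_detected = should_remediate(outage, unhealthy_timeout_s=300)
```

In this model the 5-second timeout fences the node during the blip, while the 300-second timeout ignores the blip and still catches the sustained outage, at the cost of a longer recovery delay.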
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: