- Epic
- Resolution: Done
- Critical
- None
- compact-cluster-operations
- False
- False
- Yellow
- To Do
- CNV-25892 - Fencing - Compact & FAR
- 0% To Do, 0% In Progress, 100% Done
- dev-ready, doc-ready, po-ready, px-ready, qe-ready, ux-ready
---
Goal
Test NHC/SNR on Compact Cluster before our customers do.
We should start testing as soon as the operator is available to us, even before it is released.
Identify pitfalls that arise in a compact cluster due to differences between control plane and worker nodes.
For example:
- NHC/SNR cannot fence control plane nodes today
- Any special considerations for networking between workers and control plane nodes?
- Anything to consider for affinity due to the different node pools?
- Implications of the different node pools on the update flow?
- …
User Stories
- As an RHV cluster owner, I want to run OCP with CNV on a similar bare-metal (BM) footprint so that I do not need to buy more, or more expensive, hardware.
- As an RHV cluster owner, I want to have HA for VMs on my compact cluster so that I get functionality comparable to RHV.
- As an RHV cluster owner, I would like to minimize the downtime of any of my VMs if a node fails.
- As an RHV cluster owner, I would like to understand the different timeouts I can set, which values are "safe", and what the risks are of selecting timeouts lower than the "safe" ones.
- As an RHV cluster owner, I would like to understand how to calculate the minimum values, which are hardware dependent.
On a compact cluster with Node Remediation (poison pill) operator installed and shared storage:
- As a VM owner, I would like my VM to restart on another node within the same amount of time it takes on a non-compact cluster if the node it is running on fails.
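To make the timeout stories above easier to reason about, here is a minimal sketch of a downtime budget. It assumes VM recovery happens in three sequential phases (failure detection, waiting until the node is considered safely fenced, and restarting the VM elsewhere). The class, the field names, and the numbers are hypothetical placeholders for illustration; they are not the NHC/SNR configuration fields or recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemediationTimeouts:
    """Hypothetical timeout budget; field names are illustrative only."""

    detection_s: int     # time a node must stay unhealthy before remediation starts
    fencing_wait_s: int  # time to wait before the failed node is assumed to be safely down
    vm_restart_s: int    # time to reschedule and boot the VM on another node

    def worst_case_downtime_s(self) -> int:
        # Assumes the three phases run strictly one after another.
        return self.detection_s + self.fencing_wait_s + self.vm_restart_s


def below_floor(configured_s: int, hardware_floor_s: int) -> bool:
    """Return True when a configured timeout is lower than the hardware-dependent minimum."""
    return configured_s < hardware_floor_s


# Placeholder numbers, not recommendations.
budget = RemediationTimeouts(detection_s=300, fencing_wait_s=180, vm_restart_s=60)
print(budget.worst_case_downtime_s())  # 540
print(below_floor(configured_s=30, hardware_floor_s=45))  # True
```

Running the same budget with values measured on a compact and on a non-compact cluster would give a simple way to compare the two, as the VM owner story asks. The hardware-dependent floor itself would have to come from measurement, which this sketch does not attempt.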
Non-Requirements
- List of things not included in this epic, to alleviate any doubt raised during the grooming process.
Notes
- Any additional details or decisions made/needed
Done Checklist
Who | What | Reference |
---|---|---|
QE | Test plans in Polarion | https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-7092 |
QE | Automated tests merged | https://code.engineering.redhat.com/gerrit/c/cnv-tests/+/422251 |
- is blocked by
  - CNV-26864 Watchdog unable to reboot node completely in combination to SNR/NHC (Backlog)
- is depended on by
  - CNV-25645 Target additional testing for NHC and SNR with compact clusters (Closed)
- is related to
  - OCPBUGS-11277 Must gather is unable to collect all the data for Compact cluster if node is down (Closed)
- links to