-
Bug
-
Resolution: Done
-
Critical
-
None
-
None
During a PoC we trying to proof a cluster-store and it failed because of a reboot storm caused by SNR.
Cluster design/size:
- 3 schedule able control plane nodes
- 4 worker nodes
Versions: OpenShift v4.17
Details follow with must-gather
Our procedure:
- Power off / hard shutdown 2 of 3 control plane nodes
=> API is not available any more, as expected - Start the restore based on our documentation: 5.3.2. Restoring to a previous cluster state
- Etcd backup
- Start cluster-restore.sh
SNR kicked in and rebooted the master node and all other nodes.
=> Test failed!
- links to
-
RHEA-2024:136700 Self Node Remediation 0.9.1