Observed during testing of a candidate-4.15 image as of 2024-02-08.
This is an incomplete report as I haven't verified the reproducer yet or attempted to get a must-gather. I have observed this multiple times now, so I am confident it's a thing. I can't be confident that the procedure described here reliably reproduces it, or that all the described steps are required.
I have been using the MCO (Machine Config Operator) to apply machine config to the masters. This involves a rolling reboot of all masters.
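For context, the machine config change itself is unremarkable. A minimal sketch of the kind of MachineConfig involved (the name and file contents here are hypothetical; any change targeting the master pool triggers the same MCO-driven rolling reboot):

```
# Hypothetical example: any MachineConfig labelled for the master pool will do;
# the MCO cordons, drains, and reboots each master in turn to apply it.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-example-marker
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/example-marker
          mode: 0644
          contents:
            source: data:,example
EOF
```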
During the rolling reboot I applied an update to the ControlPlaneMachineSet (CPMS); a sketch of the kind of change is shown after the lists below. I observed the following sequence of events:
- master-1 was NotReady as it was rebooting
- I modified CPMS
- CPMS immediately started provisioning a new master-0
- CPMS immediately started deleting master-1
- CPMS started provisioning a new master-1
At this point there were only two control-plane nodes in the cluster:
- old master-0
- old master-2
and two machines still provisioning:
- new master-0
- new master-1
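For completeness, a hedged sketch of the kind of CPMS update I made and the commands I use to watch the node/machine state. The instanceType value is hypothetical and assumes an AWS cluster; the exact field shouldn't matter, only that CPMS decides a rolling replacement is needed.

```
# Hypothetical CPMS change: bump the instance type in the provider spec while
# one master is still NotReady from the MCO reboot.
oc patch controlplanemachineset.machine.openshift.io cluster \
  -n openshift-machine-api --type merge \
  -p '{"spec":{"template":{"machines_v1beta1_machine_openshift_io":{"spec":{"providerSpec":{"value":{"instanceType":"m6i.xlarge"}}}}}}}'

# Watch control-plane nodes and machines; the problem shows up as only two
# control-plane nodes present while two replacement machines are still provisioning.
oc get nodes -l node-role.kubernetes.io/master -w
oc get machines -n openshift-machine-api -w
```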