Observed during testing of a candidate-4.15 image as of 2024-02-08.
This is an incomplete report as I haven't verified the reproducer yet or attempted to get a must-gather. I have observed this multiple times now, so I am confident it's a thing. I can't be confident that the procedure described here reliably reproduces it, or that all the described steps are required.
I have been using the MCO (Machine Config Operator) to apply machine config to the masters. This involves a rolling reboot of all masters.
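For context, the machine config change itself is unremarkable. A minimal sketch of the kind of MachineConfig involved (the name and file contents here are hypothetical; any change targeting the master pool triggers the same MCO-driven rolling reboot):

```
# Hypothetical example: any MachineConfig labelled for the master pool will do;
# the MCO cordons, drains, and reboots each master in turn to apply it.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-example-marker
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/example-marker
          mode: 0644
          contents:
            source: data:,example
EOF
```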
During the rolling reboot I applied an update to the ControlPlaneMachineSet (CPMS); a sketch of the kind of change is shown after the lists below. I observed the following sequence of events:
- master-1 was NotReady as it was rebooting
- I modified CPMS
- CPMS immediately started provisioning a new master-0
- CPMS immediately started deleting master-1
- CPMS started provisioning a new master-1
At this point there were only two control-plane nodes in the cluster:
- old master-0
- old master-2
and two machines still provisioning:
- new master-0
- new master-1
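For completeness, a hedged sketch of the kind of CPMS update I made and the commands I use to watch the node/machine state. The instanceType value is hypothetical and assumes an AWS cluster; the exact field shouldn't matter, only that CPMS decides a rolling replacement is needed.

```
# Hypothetical CPMS change: bump the instance type in the provider spec while
# one master is still NotReady from the MCO reboot.
oc patch controlplanemachineset.machine.openshift.io cluster \
  -n openshift-machine-api --type merge \
  -p '{"spec":{"template":{"machines_v1beta1_machine_openshift_io":{"spec":{"providerSpec":{"value":{"instanceType":"m6i.xlarge"}}}}}}}'

# Watch control-plane nodes and machines; the problem shows up as only two
# control-plane nodes present while two replacement machines are still provisioning.
oc get nodes -l node-role.kubernetes.io/master -w
oc get machines -n openshift-machine-api -w
```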