-
Bug
-
Resolution: Done-Errata
-
Major
-
4.12.z
-
Important
-
No
-
0
-
WINC - Sprint 244, WINC - Sprint 245, WINC - Sprint 246, WINC - Sprint 248, WINC - Sprint 249, WINC - Sprint 250
-
6
-
Rejected
-
False
-
-
-
Bug Fix
-
In Progress
This is a clone of issue OCPBUGS-23016. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-22984. The following is the description of the original issue:
—
Description of problem:
When WMCO is upgraded, a reconciliation workflow is triggered to ensure the existing Windows Nodes are up to date with the new version. Because the upgrade process makes Nodes unschedulable for a period, only one node should be upgraded at a time, to keep availability as high as possible for Windows workloads.
There are two related issues occurring here:
The first issue occurs when there are multiple Machine nodes to upgrade. The WMCO Machine controller upgrades them sequentially; however, if an error occurs during the upgrade process, the upgrade of that Machine stops and the Machine is moved to the end of the queue. This can continue until all Machines are partially upgraded and unusable.
The second issue is that if a cluster has BYOH nodes and Machine nodes, it is possible for a BYOH node and a Machine node to go through the upgrade process at the same time, as the two controllers run concurrently.
Both of these issues are caused by WMCO not keeping track of when a Node is currently mid-upgrade.
Version-Release number of selected component (if applicable):
OCP 4.12
How reproducible:
Always
Steps to Reproduce:
1. Install a previous version of WMCO
2. Create a Windows MachineSet
3. Add a BYOH Windows Node to the cluster
4. Allow WMCO to configure both Windows machines as nodes
5. Upgrade WMCO to the latest version
Actual results:
Nodes are upgraded concurrently: multiple Nodes have their desired version annotation changed at the same time.
Expected results:
WMCO upgrades one node at a time.
QE notes:
- To test this scenario, the minimum recommended number of Windows nodes is 3; only one Windows node should be upgrading at any time.
- Expect a longer overall elapsed time for the upgrade, since the process is now serial.
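A check along the following lines could help when verifying the serial behaviour. It is only a sketch, written under an assumption: a node counts as mid-upgrade when its desired version has been set to the target but it does not yet report that version. The `nodeState` struct and the version strings are hypothetical stand-ins for data a tester would collect from the cluster while the upgrade runs.

```go
package main

import "fmt"

// nodeState is a hypothetical snapshot of one Windows node's version data.
type nodeState struct {
	Name           string
	Version        string // version the node currently reports
	DesiredVersion string // version the operator has asked the node to reach
}

// midUpgrade returns the names of nodes whose desired version has been set
// to target but which do not yet report that version.
func midUpgrade(nodes []nodeState, target string) []string {
	var names []string
	for _, n := range nodes {
		if n.DesiredVersion == target && n.Version != target {
			names = append(names, n.Name)
		}
	}
	return names
}

// serialOK reports whether a snapshot is consistent with a one-at-a-time
// upgrade, i.e. at most one node is mid-upgrade.
func serialOK(nodes []nodeState, target string) bool {
	return len(midUpgrade(nodes, target)) <= 1
}

func main() {
	// Three nodes, matching the minimum recommended in the QE notes.
	snapshot := []nodeState{
		{Name: "win-1", Version: "v2", DesiredVersion: "v2"}, // already upgraded
		{Name: "win-2", Version: "v1", DesiredVersion: "v2"}, // mid-upgrade
		{Name: "win-3", Version: "v1", DesiredVersion: "v1"}, // not started
	}
	fmt.Println("mid-upgrade:", midUpgrade(snapshot, "v2"))
	fmt.Println("serial:", serialOK(snapshot, "v2"))
}
```

Running the check against snapshots taken repeatedly during the upgrade would flag any moment at which two or more nodes were mid-upgrade together.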
- clones
-
OCPBUGS-23016 WMCO upgrade strategy fails to upgrade one node at a time
- Closed
- is blocked by
-
OCPBUGS-23016 WMCO upgrade strategy fails to upgrade one node at a time
- Closed
-
WINC-1191 Add CI job to test bundle upgrades in Azure for release-4.12 branch
- Closed
- links to
-
RHBA-2023:125497 Red Hat OpenShift for Windows Containers 7.2.1 product release
- mentioned on