Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 4.13.0
Affects Version/s: 4.13
Component/s: Windows Containers
Labels: None
Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: 3
Severity: None
Regression: None
Target Backport Versions: None
Target Version: 4.13.0
Release Blocker: Rejected
Sprint: WINC - Sprint 230, WINC - Sprint 231, WINC - Sprint 232
sprint_count: 3
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Release Note Status: Done
Release Note Type: Bug Fix
Release Note Text:

* Previously, Microsoft Windows container workloads were not completely emptied during the Windows node upgrade process. This resulted in service disruptions because the workloads remained on the node being upgraded. With this update, the Windows Machine Config Operator (WMCO) drains workloads and then cordons nodes until node upgrades finish. This action ensures a seamless upgrade for Microsoft Windows instances. (link:https://issues.redhat.com/browse/OCPBUGS-5732[*OCPBUGS-5732*])

Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:


During the validation of [OCPBUGS-4247|https://issues.redhat.com/browse/OCPBUGS-4247], I observed that all the workloads remained on the same Windows worker node throughout the upgrade process. Because an upgrade affects the kubelet, the workloads need to be moved off that node before the node is reconfigured.
This was confirmed by Mohammad:
_I think I found the cause btw. For machine nodes, we don't try to find an instance associated with the machine being reconciled, instead just initializing the instanceInfo with a nil node. So when we check if an upgrade is required (i.e. should we deconfigure), we get false_

This behavior was introduced as part of [OCPBUGS-3506|https://issues.redhat.com/browse/OCPBUGS-3506], which was also cherry-picked into 4.12 and 4.11, so this bug affects all of those versions.

Adding the WMCO logs as well as the traces, which confirm that none of the workloads move off the nodes that are about to be reconfigured.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Deploy an IPI cluster with Windows workers. Create some workloads for those Windows workers.
2. Perform an upgrade, or simply modify the version annotation of each of the worker nodes.
3. Wait for WMCO to reconfigure (or upgrade) all the Windows workers. Keep track of where the workloads land; you can use the following snippet for it:

lb=$(oc get svc -l app=win-webserver -n winc-test -o=jsonpath="{.items[0].status.loadBalancer.ingress[0].hostname}")
file=/tmp/35707_AWS_412.log
for i in {1..60}; do
  time=$(date)
  # Once per minute, record the Windows nodes, the pod placement, and the service response
  echo -e "\n#######ATTEMPT #${i} ${time}  ######" &>> "$file"
  oc get nodes -l=node.openshift.io/os_id="Windows" &>> "$file"
  oc get pods -n winc-test -o wide &>> "$file"
  curl --connect-timeout 60 "$lb" &>> "$file"
  sleep 60
done
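
To make the pod placement easier to compare between iterations, the pod listing can be reduced to a per-node count. This is only a minimal sketch, assuming the `winc-test` namespace from the snippet above; `pods_per_node` and `windows_pod_placement` are hypothetical helper names, not part of WMCO or `oc`.

```shell
# Hypothetical helper: reads "<pod> <node>" lines on stdin and prints
# "<node> <count>" for each node, sorted by node name.
# Lines without a node name (pods not yet scheduled) are skipped.
pods_per_node() {
  awk 'NF >= 2 { count[$2]++ } END { for (n in count) print n, count[n] }' | sort
}

# Hypothetical wrapper: lists the pods in the test namespace together with
# the node they run on, then summarizes the placement per node.
windows_pod_placement() {
  oc get pods -n winc-test \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' \
    | pods_per_node
}
```

Calling `windows_pod_placement` inside the loop would log one line per node. If draining works, a node should presumably drop out of that output while it is being reconfigured.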

Actual results:

None of the Windows nodes are drained during the upgrade. Workloads remain on the same node that was reconfigured.

Expected results:

The Windows nodes get drained during the upgrade, right before WMCO reconfigures them.
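
One way to observe the expected behavior from the outside is to check whether each Windows node is marked unschedulable (cordoned) while it is being upgraded. The following is a rough sketch, not the check used during validation; it assumes the `node.openshift.io/os_id=Windows` label from the reproduction snippet, and `cordon_status` and `windows_cordon_status` are hypothetical helper names.

```shell
# Hypothetical helper: reads "<node> <unschedulable>" lines on stdin and
# reports whether each node is cordoned. An empty second field means that
# spec.unschedulable is unset, so the node is treated as schedulable.
cordon_status() {
  while read -r node unschedulable; do
    [ -n "$node" ] || continue
    if [ "$unschedulable" = "true" ]; then
      echo "$node cordoned"
    else
      echo "$node schedulable"
    fi
  done
}

# Hypothetical wrapper: queries the Windows nodes and reports their state.
windows_cordon_status() {
  oc get nodes -l node.openshift.io/os_id=Windows \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.unschedulable}{"\n"}{end}' \
    | cordon_status
}
```

With the bug present, every Windows node would be expected to report `schedulable` for the whole upgrade window.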

Additional info:

35707_AWS_411.log (2023/01/11 6:40 AM, 84 kB, Jose Luis Franco Arza)
wmco_upgrade.logs (2023/01/11 6:40 AM, 114 kB, Jose Luis Franco Arza)
wmco.log (2023/01/27 10:58 AM, 54 kB, Jose Luis Franco Arza)

blocks: OCPBUGS-5803 Windows nodes do not get drained (deconfigure) during the upgrade process (Closed)

is blocked by: OCPBUGS-7022 WMCO does not respect sequence when performing upgrades (Closed)

is cloned by: OCPBUGS-5803 Windows nodes do not get drained (deconfigure) during the upgrade process (Closed)

links to: openshift/windows-machine-config-operator#1378: OCPBUGS-5732: [wm_controller] Fix Machine node upgrades

mentioned on

Merge request - Updated US source to: 877e2bb Merge pull request #1388 from openshift-cherrypick-robot/cherry-pick-1382-to-release-4.11

Merge request - Updated US source to: f1714d7 Merge pull request #1378 from saifshaikh48/machine-node-deconfig


Assignee: Mohammad Shaikh (Inactive)
Reporter: Jose Luis Franco Arza (Inactive)
Need Info From: None
Contributors: None
QA Contact: Jose Luis Franco Arza (Inactive)
Doc Contact: Darragh Fitzmaurice
Votes: 0
Watchers: 8
Created: 2023/01/11 6:40 AM
Updated: 2025/07/28 11:36 AM
Resolved: 2023/05/10 12:28 AM
