Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: 4.11.z
Affects Version/s: 4.11
Component/s: Windows Containers
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
3
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:

4.11.z
Release Blocker:
Rejected
Sprint:
WINC - Sprint 230
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue ~~OCPBUGS-4092~~. The following is the description of the original issue:
—
This is a clone of issue ~~OCPBUGS-3506~~. The following is the description of the original issue:
—
Description of problem:

A workload's load balancer with an external IP shows a connectivity outage during a Windows node upgrade when the workload uses the windows/servercore image. During reconciliation a new node is created, so when another node is drained ahead of its own reconciliation, the node no longer contains the container image. If downloading the image takes longer than reconciling the node, no workload is available to handle the load balancer's requests, which results in a service disruption.
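
One way to observe the condition described above is to list the Windows nodes that do not currently report the workload image. The following is a minimal sketch, not part of the original report: the function name `nodes_missing_image` is hypothetical, the `kubernetes.io/os=windows` selector is an assumption about how the nodes are labeled, and the kubelet may report only a subset of the images present on a node, so an entry in the output is a hint rather than proof.

```bash
#!/bin/bash
# Sketch: print Windows nodes whose reported image list lacks a given image.
# nodes_missing_image is a hypothetical helper, not an oc or WMCO command.

nodes_missing_image() {
    local image="$1" node
    # Assumption: Windows nodes carry the kubernetes.io/os=windows label.
    for node in $(oc get nodes -l kubernetes.io/os=windows -o name); do
        # .status.images lists image names the kubelet reports for the node.
        if ! oc get "$node" -o jsonpath='{.status.images[*].names[*]}' \
            | grep -q -F -- "$image"; then
            echo "$node"
        fi
    done
}

# Run only when an image name (or a substring of it) is passed as an argument.
if [[ $# -ge 1 ]]; then
    nodes_missing_image "$1"
fi
```

For example, `./missing-image.sh windows/servercore` (the script file name is illustrative) would print the nodes that still need to pull the image before they can serve the workload.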

Version-Release number of selected component (if applicable):

4.11

How reproducible:

Sometimes

Steps to Reproduce:

Create a script to continuously query a load balancer endpoint External IP or DNS name:
```
cat probeLB.sh
#!/bin/bash
set -e
while true
do
    date
    echo "curl 52.189.34.88"
    curl 52.189.34.88
    echo ""
    sleep 2
done
```
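
Because the script above uses `set -e`, it stops at the first failed `curl`. A variant that keeps probing and records each result makes it possible to estimate how long the outage lasted. This is a minimal sketch under stated assumptions: the function names `probe_once` and `longest_outage`, the `probe.log` file, and the log format are all illustrative and not part of the original report.

```bash
#!/bin/bash
# Sketch: probe an endpoint without exiting on failure and summarize outages.
# probe_once, longest_outage, and probe.log are illustrative names.

# Print "<epoch-seconds> ok" or "<epoch-seconds> fail" for one request.
# A refused or timed-out connection counts as a failure; HTTP error
# statuses are not checked here.
probe_once() {
    local endpoint="$1"
    if curl --silent --output /dev/null --max-time 5 "$endpoint"; then
        echo "$(date +%s) ok"
    else
        echo "$(date +%s) fail"
    fi
}

# Read "<epoch-seconds> <ok|fail>" lines on stdin and print the longest
# span, in seconds, from the first failure of a run to the next success.
# A run of failures that never recovers is not counted.
longest_outage() {
    awk '
        BEGIN { longest = 0; down = 0 }
        $2 == "fail" && !down { down = 1; start = $1 }
        $2 == "ok" && down {
            span = $1 - start
            if (span > longest) longest = span
            down = 0
        }
        END { print longest }
    '
}

# Probe only when an endpoint is passed as the first argument.
if [[ $# -ge 1 ]]; then
    while true; do
        probe_once "$1" | tee -a probe.log
        sleep 2
    done
fi
```

After a run, `longest_outage < probe.log` prints the longest observed gap in seconds.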

1. In an OCP cluster, deploy WMCO 6.0
2. Create a Windows machineSet with 3 replicas
3. Wait for WMCO to configure the Windows nodes
4. Deploy win-server workloads with at least 3 replicas
5. Deploy load balancer
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
service/win-webserver     LoadBalancer   172.30.105.53   52.189.34.88    80:30648/TCP     115m

6. Scale down WMCO deployment to 0
oc scale deployment.apps/windows-machine-config-operator --replicas=0 -n openshift-windows-machine-config-operator

7. Trigger the Windows node upgrade by changing the version annotation on all Windows nodes.
oc annotate node <windows-node-1> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
oc annotate node <windows-node-2> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
oc annotate node <windows-node-3> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion

8. In a separate terminal, trigger the script to query a load balancer endpoint (probeLB.sh)
100    63  100    63    0     0    777      0 --:--:-- --:--:-- --:--:--   777
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

9. Scale up WMCO deployment to 1
oc scale deployment.apps/windows-machine-config-operator --replicas=1 -n openshift-windows-machine-config-operator


10. Watch the script for the load balancer endpoint
100    63  100    63    0     0    741      0 --:--:-- --:--:-- --:--:--   741
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0curl: (7) Failed to connect to 52.189.34.88 port 80: Connection refused
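
Steps 6, 7, and 9 can be scripted so that the reproduction is repeatable. The sketch below reuses the `oc scale` and `oc annotate` commands from the steps above; the function names are hypothetical, and selecting nodes with the `kubernetes.io/os=windows` label instead of naming them one by one is an assumption about how the nodes are labeled. The probe script from step 8 should already be running in a separate terminal before the operator is scaled back up.

```bash
#!/bin/bash
# Sketch of steps 6, 7, and 9. Function names are illustrative.

WMCO_NS=openshift-windows-machine-config-operator
WMCO_DEPLOY=deployment.apps/windows-machine-config-operator

# Scale the WMCO deployment to the given replica count.
scale_wmco() {
    oc scale "$WMCO_DEPLOY" --replicas="$1" -n "$WMCO_NS"
}

# Overwrite the version annotation on every Windows node.
# Assumption: Windows nodes carry the kubernetes.io/os=windows label.
invalidate_windows_node_versions() {
    local node
    for node in $(oc get nodes -l kubernetes.io/os=windows -o name); do
        oc annotate "$node" --overwrite \
            windowsmachineconfig.openshift.io/version=invalidVersion
    done
}

# Scale down, invalidate the annotations, then scale back up.
trigger_upgrade() {
    scale_wmco 0 && invalidate_windows_node_versions && scale_wmco 1
}

# Run only when invoked with the literal argument "run".
if [[ "${1:-}" == "run" ]]; then
    trigger_upgrade
fi
```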

Actual results:

Load balancer connectivity is lost while the Windows nodes are in the Ready state. The load balancer starts responding again after some time.

Expected results:

Windows workloads run on the available Windows nodes with no service disruption.

Additional info:

Follow-up to https://bugzilla.redhat.com/show_bug.cgi?id=2103631


35707_AWS_411.log
84 kB
2023/01/10 9:46 AM

clones

OCPBUGS-4092 Load balancer shows connectivity outage during Windows nodes upgrade

Closed

is blocked by

OCPBUGS-4092 Load balancer shows connectivity outage during Windows nodes upgrade

Closed

links to

openshift/windows-machine-config-operator#1341: [release-4.11] OCPBUGS-4247: Upgrade Machine Nodes in place

mentioned on

Merge request - Updated US source to: 8b6be6b Merge pull request #1359 from alinaryan/upstream/release-4.11-submodule-update-12-12

Assignee: Jose Valdes

Reporter: OpenShift Prow Bot

QA Contact: Aharon Rasouli

Need Info From: None

Votes: 0

Watchers: 4

Created: 2022/11/29 3:10 PM

Updated: 2025/07/28 5:39 PM

Resolved: 2023/08/07 12:27 AM
