Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 4.13.0
Affects Version/s: 4.11.z
Component/s: Windows Containers
Labels: None
Regression: None
Story Points: 3
Sprint: WINC - Sprint 228
sprint_count: 1
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Target Version: 4.13.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

A workload's load balancer with an external IP shows a connectivity outage during a Windows node upgrade when using the windows/servercore image. During reconciliation a new node is created; therefore, when another node is drained ahead of its own reconciliation, the new node does not yet contain the container image. If downloading the image takes longer than reconciling the node, no workload is available to handle the load balancer's requests, resulting in a service disruption.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

Sometimes

Steps to Reproduce:

Create a script to continuously query a load balancer endpoint External IP or DNS name:
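```
cat probeLB.sh
#!/bin/bash
# Query the load balancer every 2 seconds and print a timestamp per attempt.
set -e
while true
do
    date
    echo "curl 52.189.34.88"
    # "|| true" keeps the loop running under "set -e" when curl fails,
    # so the outage and the later recovery are both captured.
    curl 52.189.34.88 || true
    echo ""
    sleep 2
done
```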

1. In an OCP cluster, deploy WMCO 6.0
2. Create a Windows machineSet with 3 replicas
3. Wait for WMCO to configure the Windows nodes
4. Deploy win-webserver workloads with at least 3 replicas
5. Deploy a load balancer
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
service/win-webserver     LoadBalancer   172.30.105.53   52.189.34.88    80:30648/TCP     115m

6. Scale down WMCO deployment to 0
oc scale deployment.apps/windows-machine-config-operator --replicas=0 -n openshift-windows-machine-config-operator

7. Trigger a Windows node upgrade by changing the version annotation on all Windows nodes.
oc annotate node <windows-node-1> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
oc annotate node <windows-node-2> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
oc annotate node <windows-node-3> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion

8. In a separate terminal, run the script that queries the load balancer endpoint (probeLB.sh)
100    63  100    63    0     0    777      0 --:--:-- --:--:-- --:--:--   777
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

9. Scale up WMCO deployment to 1
oc scale deployment.apps/windows-machine-config-operator --replicas=1 -n openshift-windows-machine-config-operator


10. Watch the script for the load balancer endpoint
100    63  100    63    0     0    741      0 --:--:-- --:--:-- --:--:--   741
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0curl: (7) Failed to connect to 52.189.34.88 port 80: Connection refused
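
Step 7 repeats the same annotate command once per node. As a convenience, it could be wrapped in a loop. The following is a minimal sketch, not part of the original report: the function name `annotate_windows_nodes` is hypothetical, and it assumes the Windows nodes carry the standard `kubernetes.io/os=windows` label.

```shell
#!/bin/bash
# Sketch: set the WMCO version annotation on every Windows node.
# Assumption: Windows nodes are labeled kubernetes.io/os=windows.
# annotate_windows_nodes is a hypothetical helper name.

annotate_windows_nodes() {
    local version="${1:-invalidVersion}"
    local node
    # "oc get nodes -o name" prints one "node/<name>" entry per line.
    oc get nodes -l kubernetes.io/os=windows -o name | while read -r node; do
        oc annotate "$node" --overwrite \
            "windowsmachineconfig.openshift.io/version=${version}"
    done
}
```

Calling `annotate_windows_nodes` with no argument applies `invalidVersion`, matching the value used in step 7.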

Actual results:

Load balancer connectivity is lost while the Windows nodes are in Ready state. The load balancer starts responding again after some time.
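
To put a number on how long the load balancer stays unreachable, the probe output can be reduced to outage windows. The sketch below is an illustration under assumptions rather than the tooling used in this report: the `probe` and `summarize_outages` function names and the `<epoch-seconds> <ok|fail>` log format are hypothetical.

```shell
#!/bin/bash
# Sketch: log one "<epoch-seconds> <ok|fail>" line per probe, then
# summarize outage windows. Function names and log format are
# hypothetical, chosen for this illustration only.

probe() {
    local endpoint="$1" interval="${2:-2}"
    while true; do
        # --fail makes curl return non-zero on HTTP errors as well as
        # on connection failures such as "Connection refused".
        if curl --silent --fail --max-time 5 --output /dev/null "$endpoint"; then
            echo "$(date +%s) ok"
        else
            echo "$(date +%s) fail"
        fi
        sleep "$interval"
    done
}

summarize_outages() {
    # An outage runs from the first failed probe to the next successful one.
    awk '
        $2 == "fail" && !down { down = 1; start = $1 }
        $2 == "ok"   &&  down { down = 0
                                printf "outage: %d s (from %d to %d)\n", $1 - start, start, $1 }
        END { if (down) printf "outage still ongoing since %d\n", start }
    '
}
```

For example, `probe 52.189.34.88 | tee probe.log` could run during the upgrade, followed by `summarize_outages < probe.log`. The reported durations are only accurate to within the probe interval.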

Expected results:

Windows workloads keep running on the available Windows nodes without service disruption

Additional info:

Follow-up to https://bugzilla.redhat.com/show_bug.cgi?id=2103631

blocks

OCPBUGS-4092 Load balancer shows connectivity outage during Windows nodes upgrade

Closed

is cloned by

OCPBUGS-4092 Load balancer shows connectivity outage during Windows nodes upgrade

Closed

is duplicated by

WINC-929 Machine Nodes follow same upgrade path as BYOH nodes

Closed

links to

openshift/windows-machine-config-operator#1321: OCPBUGS-3506: Upgrade Machine Nodes in place

Assignee: Sebastian Soto
Reporter: Jose Valdes
QA Contact: Aharon Rasouli
Votes: 0
Watchers: 4
Created: 2022/11/10 5:12 PM
Updated: 2024/03/25 6:01 PM
Resolved: 2023/05/10 12:29 AM
