-
Bug
-
Resolution: Done
-
Major
-
4.11.z
-
None
-
None
-
0
-
WINC - Sprint 228
-
1
-
Rejected
-
False
-
This is a clone of issue OCPBUGS-3506. The following is the description of the original issue:
—
Description of problem:
A workload's load balancer with an external IP shows a connectivity outage during a Windows node upgrade when the workload uses the windows/servercore image. During reconciliation a new node is created, so when another node is drained ahead of its own reconciliation, the node no longer has the container image. If downloading the image takes longer than reconciling the node, no workload is available to handle the load balancer's requests, which results in a service disruption.
Version-Release number of selected component (if applicable):
4.11
How reproducible:
Sometimes
Steps to Reproduce:
Create a script to continuously query a load balancer endpoint External IP or DNS name:
```
cat probeLB.sh
#!/bin/bash
set -e
while true
do
  date
  echo "curl 52.189.34.88"
  curl 52.189.34.88
  echo ""
  sleep 2
done
```
1. In an OCP cluster, deploy WMCO 6.0
2. Create a Windows machineSet with 3 replicas
3. Wait for WMCO to configure the Windows nodes
4. Deploy win-server workloads with at least 3 replicas
5. Deploy load balancer
   NAME                     TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
   service/win-webserver    LoadBalancer   172.30.105.53   52.189.34.88   80:30648/TCP   115m
6. Scale down WMCO deployment to 0
   oc scale deployment.apps/windows-machine-config-operator --replicas=0 -n openshift-windows-machine-config-operator
7. Trigger Windows node upgrade by changing the version annotation in all Windows nodes.
   oc annotate node <windows-node-1> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
   oc annotate node <windows-node-2> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
   oc annotate node <windows-node-3> --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
8. In a separate terminal, trigger the script to query a load balancer endpoint (probeLB.sh)
   100    63  100    63    0     0    777      0 --:--:-- --:--:-- --:--:--   777
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
9. Scale up WMCO deployment to 1
   oc scale deployment.apps/windows-machine-config-operator --replicas=1 -n openshift-windows-machine-config-operator
10. Watch the script for the load balancer endpoint
   100    63  100    63    0     0    741      0 --:--:-- --:--:-- --:--:--   741
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
     0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0curl: (7) Failed to connect to 52.189.34.88 port 80: Connection refused
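Because probeLB.sh uses `set -e`, it stops at the first failed curl, so it shows that an outage started but not how long it lasted. The following is a minimal sketch of a variant that keeps probing and reports outage durations. It is not part of the original report: the function names (`classify_probe`, `track_outages`, `probe_loop`), the 5-second curl timeout, and the log line format are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical variant of probeLB.sh (not from the original report).
# It keeps probing after failures and reports how long each outage lasted.

# classify_probe maps a curl exit code to "up" (0) or "down" (non-zero).
classify_probe() {
  if [ "$1" -eq 0 ]; then
    echo up
  else
    echo down
  fi
}

# track_outages reads "epoch_seconds state" lines on stdin and prints one
# line per outage with its start time and duration in seconds.
track_outages() {
  local start="" ts state
  while read -r ts state; do
    if [ "$state" = down ] && [ -z "$start" ]; then
      start=$ts                      # first failed probe of an outage
    elif [ "$state" = up ] && [ -n "$start" ]; then
      echo "outage start=$start duration=$((ts - start))s"
      start=""                       # outage is over
    fi
  done
  if [ -n "$start" ]; then
    echo "outage start=$start still ongoing"
  fi
}

# probe_loop polls the endpoint forever and prints one state line per probe.
# $1 is the endpoint; $2 is the interval in seconds (assumed default: 2).
probe_loop() {
  local endpoint=$1 interval=${2:-2} rc
  while true; do
    rc=0
    curl --silent --output /dev/null --max-time 5 "$endpoint" || rc=$?
    echo "$(date +%s) $(classify_probe "$rc")"
    sleep "$interval"
  done
}
```

One way to use it would be `probe_loop 52.189.34.88 | track_outages`, started before WMCO is scaled back up. Durations are measured between probes, so they are only accurate to roughly the polling interval plus the curl timeout.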
Actual results:
Load balancer connectivity is lost while the Windows nodes are in the Ready state. The load balancer starts responding again after some time.
Expected results:
Windows workloads keep running on the available Windows nodes without service disruption.
Additional info:
Follow-up to https://bugzilla.redhat.com/show_bug.cgi?id=2103631
- blocks
-
OCPBUGS-4247 Load balancer shows connectivity outage during Windows nodes upgrade
- Closed
- clones
-
OCPBUGS-3506 Load balancer shows connectivity outage during Windows nodes upgrade
- Closed
- is blocked by
-
OCPBUGS-3506 Load balancer shows connectivity outage during Windows nodes upgrade
- Closed
- is cloned by
-
OCPBUGS-4247 Load balancer shows connectivity outage during Windows nodes upgrade
- Closed