Details
Bug
Resolution: Unresolved
Normal
None
4.13, 4.12, 4.10
Important
3
WINC - Sprint 241, WINC - Sprint 242
2
Rejected
Unspecified
Known Issue
Description
Must gather logs:
1. Issue: Windows machines cannot be scaled up when the machineset publicIP field is set to false on Azure. When the machineset is first created, its machines are provisioned successfully. However, when the very same machineset is scaled up, the newly provisioned machine hangs in the Provisioned phase.
[jfrancoa@localhost 107382]$ oc get machine -n openshift-machine-api
NAME                                           PHASE        TYPE             REGION  ZONE  AGE
jfrancoa-3005-azure-qf9hb-master-0             Running      Standard_D8s_v3  westus        9h
jfrancoa-3005-azure-qf9hb-master-1             Running      Standard_D8s_v3  westus        9h
jfrancoa-3005-azure-qf9hb-master-2             Running      Standard_D8s_v3  westus        9h
jfrancoa-3005-azure-qf9hb-worker-westus-m6sxs  Running      Standard_D4s_v3  westus        9h
jfrancoa-3005-azure-qf9hb-worker-westus-tpnw4  Running      Standard_D4s_v3  westus        9h
jfrancoa-3005-azure-qf9hb-worker-westus-xhvzc  Running      Standard_D4s_v3  westus        9h
win-d6k7v                                      Running      Standard_D2s_v3  westus        54m
win-hm2l2                                      Running      Standard_D2s_v3  westus        14m
win-xxl8p                                      Running      Standard_D2s_v3  westus        72m
windows-r67jd                                  Running      Standard_D2s_v3  westus        8h
windows-tw5pg                                  Provisioned  Standard_D2s_v3  westus        8h
windows-txxcf                                  Running      Standard_D2s_v3  westus        8h
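To pick the stuck machine out of a listing like this without scanning it by eye, a small filter can help. This is only a sketch: `stuck_machines` is a hypothetical helper name (not part of `oc`), and it assumes the default table output where the second column is PHASE. A machine with an empty phase would shift the columns and be misreported.

```shell
# Hypothetical helper: print "<name> <phase>" for every machine whose
# PHASE column (assumed to be column 2) is anything other than "Running".
# Reads the table printed by `oc get machine` on stdin and skips the header.
stuck_machines() {
  awk 'NR > 1 && $2 != "Running" { print $1, $2 }'
}
```

Usage: `oc get machine -n openshift-machine-api | stuck_machines`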
2. WMCO & OpenShift Version:
[jfrancoa@localhost 107382]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-05-26-102501   True        False         8h      Cluster version is 4.10.0-0.nightly-2022-05-26-102501
[jfrancoa@localhost 107382]$ oc get csv -n openshift-windows-machine-config-operator
NAME                                     DISPLAY                            VERSION   REPLACES                                 PHASE
elasticsearch-operator.5.4.2             OpenShift Elasticsearch Operator   5.4.2                                              Succeeded
windows-machine-config-operator.v5.1.0   Windows Machine Config Operator    5.1.0     windows-machine-config-operator.v5.0.0   Succeeded
3. Platform - Azure
5. Is it a new test case or an old test case?
if it is the old test case, is it regression or first-time tested?
Is it platform-specific or consistent across all platforms?
It impacts an old test case; however, I believe this was not observed before.
6. Steps to Reproduce
- Create an OCP 4.10 cluster, install WMCO 5.1.0 and create a machineset following the docs.
- Make sure the machines were created properly.
- Scale up the number of machines by one: oc scale --replicas=n+1 machineset <name> -n openshift-machine-api
- Wait for the new machine to reach the Running state (which does not happen even after waiting for hours).
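The scale-up step can be scripted so that n+1 does not have to be worked out by hand. The following is a minimal sketch, not the procedure used in this report: `scale_up_by_one` is a hypothetical helper name, and it assumes the machineset lives in openshift-machine-api and that `.spec.replicas` is set on it.

```shell
# Hypothetical helper: scale the given machineset up by exactly one replica.
# Assumes the openshift-machine-api namespace and a populated .spec.replicas.
scale_up_by_one() {
  ms="$1"
  ns="openshift-machine-api"
  # Read the current desired replica count from the machineset spec.
  current=$(oc get machineset "$ms" -n "$ns" -o jsonpath='{.spec.replicas}') || return 1
  # Refuse to continue if the value is empty or not a plain number.
  case "$current" in
    ''|*[!0-9]*) echo "unexpected replica count: '$current'" >&2; return 1 ;;
  esac
  oc scale --replicas=$((current + 1)) machineset "$ms" -n "$ns"
}
```

Usage: `scale_up_by_one windows`, then watch the new machine with `oc get machine -n openshift-machine-api -w`.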
7. Actual Result and Expected Result
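Expected: the Windows machines from the machineset can be scaled up properly.
Actual: the machine added by the scale-up stays in the Provisioned phase and never reaches Running.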
8. A possible workaround has been tried? Is there a way to recover from the issue being tried out?
Setting the publicIP field to true does solve the issue. The docs suggest creating machinesets with publicIP set to false: https://docs.openshift.com/container-platform/4.10/windows_containers/creating_windows_machinesets/creating-windows-machineset-azure.html#windows-machineset-azure_creating-windows-machineset-azure Even so, I noticed that all the machines created together with the machineset did have a public IP on Azure, whereas the scaled-up nodes that never reached the Running state were missing one. Therefore, I created a second machineset with the publicIP field set to true and scaled it up with no issues at all:
[jfrancoa@localhost 107382]$ oc get machineset -n openshift-machine-api
NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE
jfrancoa-3005-azure-qf9hb-worker-westus   3         3         3       3           9h
win                                       3         3         3       3           89m
windows                                   3         3         2       2           8h
win -> publicIP = true
windows -> publicIP = false
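The publicIP setting of each machineset can be read back from the cluster rather than from memory. This is a sketch under assumptions: `show_public_ip` is a hypothetical helper name, and the field path `.spec.template.spec.providerSpec.value.publicIP` is assumed to match the Azure provider spec used in the linked docs.

```shell
# Hypothetical helper: print the publicIP setting of each named machineset.
# Assumes the Azure providerSpec layout and the openshift-machine-api namespace.
show_public_ip() {
  ns="openshift-machine-api"
  for ms in "$@"; do
    value=$(oc get machineset "$ms" -n "$ns" \
      -o jsonpath='{.spec.template.spec.providerSpec.value.publicIP}') || return 1
    # Show a placeholder when the field is absent from the spec.
    printf '%s -> publicIP = %s\n' "$ms" "${value:-<unset>}"
  done
}
```

Usage: `show_public_ip win windows`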
9. Logs
Must-gather-windows-node-logs (https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_windows_node_logs#L24)
oc get network.operator cluster -o yaml
oc logs -f deployment/windows-machine-config-operator -n openshift-windows-machine-config-operator
Windows MachineSet yaml or windows-instances ConfigMap
oc get machineset <windows_machineSet_name> -n openshift-machine-api -o yaml
oc get configmaps <windows_configmap_name> -n <namespace_name> -o yaml
Optional logs:
Anything that can be useful to debug the issue.