OpenShift Bugs / OCPBUGS-9292

[WINC] scaled up Windows Machines cannot be SSHed into when publicIP is set to false on Azure


Details

    • Important
    • 3
    • WINC - Sprint 241, WINC - Sprint 242
    • 2
    • Rejected
    • Unspecified
      Cause: Not known

      Consequence: Windows machineset can't scale up

      Workaround (if any): set publicIP: true in the machineset

      Result:
    • Known Issue

    Description

      Must gather logs:

      1. Issue: Windows machines can't scale up when the publicIP machineset field is set to false on Azure. When the machineset is created, its machines are created successfully. However, when scaling up that same machineset, the newly provisioned machine gets stuck in the Provisioned phase.
      [jfrancoa@localhost 107382]$ oc get machine -n openshift-machine-api
      NAME                                            PHASE         TYPE              REGION   ZONE   AGE
      jfrancoa-3005-azure-qf9hb-master-0              Running       Standard_D8s_v3   westus          9h
      jfrancoa-3005-azure-qf9hb-master-1              Running       Standard_D8s_v3   westus          9h
      jfrancoa-3005-azure-qf9hb-master-2              Running       Standard_D8s_v3   westus          9h
      jfrancoa-3005-azure-qf9hb-worker-westus-m6sxs   Running       Standard_D4s_v3   westus          9h
      jfrancoa-3005-azure-qf9hb-worker-westus-tpnw4   Running       Standard_D4s_v3   westus          9h
      jfrancoa-3005-azure-qf9hb-worker-westus-xhvzc   Running       Standard_D4s_v3   westus          9h
      win-d6k7v                                       Running       Standard_D2s_v3   westus          54m
      win-hm2l2                                       Running       Standard_D2s_v3   westus          14m
      win-xxl8p                                       Running       Standard_D2s_v3   westus          72m
      windows-r67jd                                   Running       Standard_D2s_v3   westus          8h
      windows-tw5pg                                   Provisioned   Standard_D2s_v3   westus          8h
      windows-txxcf                                   Running       Standard_D2s_v3   westus          8h
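Machines stuck like windows-tw5pg can be picked out of a listing such as the one above by its PHASE column. This is a minimal sketch, assuming the default oc get machine column layout (NAME first, PHASE second); the function name stuck_machines is hypothetical and not part of oc or WMCO.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or WMCO): read `oc get machine`
# output on stdin and print the names of Machines whose PHASE is
# "Provisioned", i.e. the cloud instance exists but the Machine has not
# reached Running.
# Assumes the default column layout: NAME in column 1, PHASE in column 2.
# The header row never matches because its second column is "PHASE",
# and awk's default field splitting ignores the empty ZONE column.
stuck_machines() {
  awk '$2 == "Provisioned" { print $1 }'
}

# Usage (requires a logged-in oc):
#   oc get machine -n openshift-machine-api | stuck_machines
```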

      2. WMCO & OpenShift Version:
      [jfrancoa@localhost 107382]$ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.10.0-0.nightly-2022-05-26-102501   True        False         8h      Cluster version is 4.10.0-0.nightly-2022-05-26-102501

      [jfrancoa@localhost 107382]$ oc get csv -n openshift-windows-machine-config-operator
      NAME                                     DISPLAY                            VERSION   REPLACES                                 PHASE
      elasticsearch-operator.5.4.2             OpenShift Elasticsearch Operator   5.4.2                                              Succeeded
      windows-machine-config-operator.v5.1.0   Windows Machine Config Operator    5.1.0     windows-machine-config-operator.v5.0.0   Succeeded

      3. Platform - Azure

      5. Is it a new test case or an old test case?
      If it is an old test case, is it a regression or is it being tested for the first time?
      Is it platform-specific or consistent across all platforms?
      It affects an old test case; however, I believe this issue was not observed before.

      6. Steps to Reproduce

      • Create an OCP 4.10 cluster, install WMCO 5.1.0, and create a machineset following the docs.
      • Make sure the machines were properly created.
      • Scale up the machineset by one machine: oc scale --replicas=n+1 machineset <name> -n openshift-machine-api
      • Wait for the new machine to reach the Running phase (this does not happen, even after waiting for hours).
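The scale-up and wait steps above can be sketched as two shell functions. This is an illustrative sketch, not the test case's actual automation: the function names scale_up_by_one and wait_for_ready are hypothetical, it assumes a logged-in oc, and it treats the machineset as scaled once its readyReplicas reaches the new desired count.

```shell
#!/bin/sh
# Hypothetical reproduction helpers; assume a logged-in `oc` and the
# openshift-machine-api namespace used throughout this report.
NS=openshift-machine-api

# Scale machineset $1 to one more replica than it currently requests
# and print the new desired count.
scale_up_by_one() {
  current=$(oc get machineset "$1" -n "$NS" -o jsonpath='{.spec.replicas}') || return 1
  desired=$((current + 1))
  oc scale --replicas="$desired" machineset "$1" -n "$NS" >/dev/null || return 1
  echo "$desired"
}

# Poll until machineset $1 reports at least $2 ready replicas.
# Gives up after $3 attempts spaced $4 seconds apart (returns 1).
wait_for_ready() {
  attempts=0
  while [ "$attempts" -lt "$3" ]; do
    ready=$(oc get machineset "$1" -n "$NS" -o jsonpath='{.status.readyReplicas}')
    # readyReplicas may be absent from status while it is zero; treat empty as 0.
    if [ "${ready:-0}" -ge "$2" ]; then
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "$4"
  done
  return 1
}

# Usage:
#   want=$(scale_up_by_one windows)
#   wait_for_ready windows "$want" 60 30 || echo "machineset did not become ready"
```

With publicIP set to false, the report indicates the second call would time out, since the new machine never leaves Provisioned.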
      7. Actual Result and Expected Result

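      Actual: the newly added Windows machine stays in the Provisioned phase and never reaches Running.
      Expected: the Windows machines from the machineset can be scaled up properly.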

      8. Has a possible workaround been tried? Is there a way to recover from the issue?

      Setting the publicIP field to true does solve the issue. I noticed that, even though the docs suggest creating machinesets with publicIP set to false (https://docs.openshift.com/container-platform/4.10/windows_containers/creating_windows_machinesets/creating-windows-machineset-azure.html#windows-machineset-azure_creating-windows-machineset-azure), all the machines created together with the machineset did get a public IP on Azure. However, the scaled-up nodes, which never reached the Running state, were missing the public IP. Therefore, I created a second machineset with the publicIP field set to true and scaled it up, with no issues at all:
      [jfrancoa@localhost 107382]$ oc get machineset -n openshift-machine-api
      NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE
      jfrancoa-3005-azure-qf9hb-worker-westus   3         3         3       3           9h
      win                                       3         3         3       3           89m
      windows                                   3         3         2       2           8h

      win -> publicIP = true
      windows -> publicIP = false
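To check which machinesets carry publicIP: false and to apply the workaround, a sketch along these lines could be used. It assumes publicIP lives under spec.template.spec.providerSpec.value, as in the Azure machineset example in the docs linked above; the function names are hypothetical. Patching a machineset only changes the template for machines created afterwards, so existing machines keep their current settings.

```shell
#!/bin/sh
# Hypothetical helpers for the publicIP workaround; assume a logged-in `oc`.
NS=openshift-machine-api

# Print "<machineset>\t<publicIP>" for every machineset in the namespace.
# An empty second field means publicIP is not set in the providerSpec.
list_public_ip() {
  oc get machineset -n "$NS" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.providerSpec.value.publicIP}{"\n"}{end}'
}

# Read list_public_ip output on stdin and print the machinesets whose
# publicIP is anything other than "true" (false or unset).
without_public_ip() {
  awk -F '\t' '$2 != "true" { print $1 }'
}

# Workaround from this report: set publicIP to true in machineset $1.
# Only machines created after the patch pick up the new template.
enable_public_ip() {
  oc patch machineset "$1" -n "$NS" --type=merge \
    -p '{"spec":{"template":{"spec":{"providerSpec":{"value":{"publicIP":true}}}}}}'
}

# Usage:
#   list_public_ip | without_public_ip
#   enable_public_ip windows
```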

      9. Logs
      Must-gather-windows-node-logs (https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_windows_node_logs#L24)
      oc get network.operator cluster -o yaml
      oc logs -f deployment/windows-machine-config-operator -n openshift-windows-machine-config-operator
      Windows MachineSet yaml or windows-instances ConfigMap
      oc get machineset <windows_machineSet_name> -n openshift-machine-api -o yaml
      oc get configmaps <windows_configmap_name> -n <namespace_name> -o yaml
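The oc-based items in the list above can be saved into one directory for attaching to the issue. This is a sketch only: the function name collect_winc_logs and the output file names are hypothetical, it assumes a logged-in oc, and it does not cover the must-gather Windows node logs or the windows-instances ConfigMap. It drops -f from oc logs, because following the log would block instead of writing a finished file.

```shell
#!/bin/sh
# Hypothetical helper: save the oc-based logs requested in section 9
# into directory $1 for machineset $2. Assumes a logged-in `oc`.
collect_winc_logs() {
  dir=$1
  machineset=$2
  mkdir -p "$dir" || return 1
  oc get network.operator cluster -o yaml > "$dir/network-operator.yaml"
  # Without -f, `oc logs` prints the current log and exits instead of following it.
  oc logs deployment/windows-machine-config-operator \
    -n openshift-windows-machine-config-operator > "$dir/wmco.log"
  oc get machineset "$machineset" -n openshift-machine-api -o yaml \
    > "$dir/machineset-$machineset.yaml"
}

# Usage:
#   collect_winc_logs ./winc-logs windows
```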

      Optional logs:
      Anything that can be useful to debug the issue.


          People

            team-winc Team WinC
            rhn-engineering-jfrancoa Jose Luis Franco Arza (Inactive)
            Red Hat Employee
            Alina Ryan
            Votes: 0
            Watchers: 11
