OpenShift Bugs / OCPBUGS-60135

BYOH node upgrade is failing on Server 2019: error instantiating Windows instance from VM: unable to setup VM <IP>


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.19
    • Windows Containers
    • Quality / Stability / Reliability
    • WINC - Sprint 279

      This is a clone of issue OCPBUGS-58282. The following is the description of the original issue:

      Description of problem:

          Upgrading BYOH nodes running Windows Server 2019 fails with: error instantiating Windows instance from VM: unable to setup VM <IP>

      Version-Release number of selected component (if applicable):

          10.19.0-c6693cf

      How reproducible:

          100%

      Steps to Reproduce:

          1. Create a 4.18 cluster with 2 MachineSet nodes and 2 BYOH (non-MachineSet) nodes
          2. Upgrade the cluster to 4.19
          3. Upgrade the WMCO to the latest version
      
          

      Actual results:

          The MachineSet nodes are upgraded, but the BYOH node upgrade fails.
      oc get node -l kubernetes.io/os=windows -owide
      NAME                                       STATUS                        ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION    CONTAINER-RUNTIME
      ip-10-0-11-53.us-east-2.compute.internal   Ready                         worker   164m   v1.32.5   10.0.11.53    <none>        Windows Server 2019 Datacenter   10.0.17763.7434   containerd://1.7.26
      ip-10-0-23-72.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker   6h9m   v1.31.8   10.0.23.72    <none>        Windows Server 2019 Datacenter   10.0.17763.7434   containerd://1.7.26
      ip-10-0-31-78.us-east-2.compute.internal   Ready                         worker   6h3m   v1.31.8   10.0.31.78    <none>        Windows Server 2019 Datacenter   10.0.17763.7434   containerd://1.7.26
      ip-10-0-31-79.us-east-2.compute.internal   Ready                         worker   178m   v1.32.5   10.0.31.79    <none>        Windows Server 2019 Datacenter   10.0.17763.7434   containerd://1.7.26
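
To make the state of the table above easier to read, the following is a minimal sketch that flags nodes which are not plain `Ready` or whose kubelet version is older than the newest one listed. The function names `parse_version` and `lagging_nodes` are hypothetical helpers, not part of `oc` or WMCO, and the sample keeps only the first five columns of the output above. It assumes the first five columns contain no spaces, which holds for this output.

```python
import re


def parse_version(version):
    """Turn a kubelet version such as 'v1.32.5' into a comparable tuple (1, 32, 5)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def lagging_nodes(table_text):
    """Return (name, status, version) rows that are not plain 'Ready'
    or whose kubelet version is older than the newest version in the table."""
    rows = []
    for line in table_text.strip().splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, status, _roles, _age, version = fields[:5]
        rows.append((name, status, version))
    if not rows:
        return []
    newest = max(parse_version(version) for _, _, version in rows)
    return [
        row for row in rows
        if row[1] != "Ready" or parse_version(row[2]) < newest
    ]


# First five columns of the `oc get node` output from this report.
SAMPLE = """
NAME                                       STATUS                        ROLES    AGE    VERSION
ip-10-0-11-53.us-east-2.compute.internal   Ready                         worker   164m   v1.32.5
ip-10-0-23-72.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker   6h9m   v1.31.8
ip-10-0-31-78.us-east-2.compute.internal   Ready                         worker   6h3m   v1.31.8
ip-10-0-31-79.us-east-2.compute.internal   Ready                         worker   178m   v1.32.5
"""

if __name__ == "__main__":
    for name, status, version in lagging_nodes(SAMPLE):
        print(name, status, version)
```

On the sample, this reports the two nodes still at v1.31.8, one of which is also `NotReady,SchedulingDisabled`.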
      
      Failed node:
      oc describe node ip-10-0-20-167.us-east-2.compute.internal
      Name:               ip-10-0-20-167.us-east-2.compute.internal
      Roles:              worker
      Labels:             beta.kubernetes.io/arch=amd64
                          beta.kubernetes.io/instance-type=m5a.large
                          beta.kubernetes.io/os=windows
                          failure-domain.beta.kubernetes.io/region=us-east-2
                          failure-domain.beta.kubernetes.io/zone=us-east-2a
                          kubernetes.io/arch=amd64
                          kubernetes.io/hostname=ip-10-0-20-167.us-east-2.compute.internal
                          kubernetes.io/os=windows
                          node-role.kubernetes.io/worker=
                          node.kubernetes.io/instance-type=m5a.large
                          node.kubernetes.io/windows-build=10.0.17763
                          node.openshift.io/os_id=Windows
                          topology.k8s.aws/zone-id=use2-az1
                          topology.kubernetes.io/region=us-east-2
                          topology.kubernetes.io/zone=us-east-2a
      Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.20.167
                          cloud.network.openshift.io/egress-ipconfig:
                            [{"interface":"eni-0ae3a3d353d0238cc","ifaddr":{"ipv4":"10.0.0.0/19"},"capacity":{"ipv4":9,"ipv6":10}}]
                          k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-2C-0E-9A
                          k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.4.0/24
                          machine.openshift.io/machine: openshift-machine-api/rrasouli-4912-kd2fn-windows-worker-us-east-2a-bq8fp
                          volumes.kubernetes.io/controller-managed-attach-detach: true
                          windowsmachineconfig.openshift.io/desired-version: 10.19.0-c6693cf
                          windowsmachineconfig.openshift.io/pub-key-hash: 6b5dbed399c508c1d0edcf1432bce445d9cb3f3c8832f14bd8b5e37c329ccfc5
                          windowsmachineconfig.openshift.io/reboot-required:
                          windowsmachineconfig.openshift.io/username:
                            wx4EBwMIHDj1dmDrIOlgJbVoTTEoUBM5J9Bly3JXog3S5gESl16KYW8rLUm0Tv03<wmcoMarker>ZYUE8WAtXZKcX9QctpuweDuRfAD5/fIEDGfVnC4zGhu+3hPNjry/7RSnx5Coxb...
                          windowsmachineconfig.openshift.io/version: 10.19.0-c6693cf
      CreationTimestamp:  Mon, 30 Jun 2025 18:03:02 +0300
      Taints:             os=Windows:NoSchedule
      Unschedulable:      false
      Lease:
        HolderIdentity:  ip-10-0-20-167.us-east-2.compute.internal
        AcquireTime:     <unset>
        RenewTime:       Mon, 30 Jun 2025 18:13:24 +0300
      Conditions:
        Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
        ----             ------  -----------------                 ------------------                ------                       -------
        MemoryPressure   False   Mon, 30 Jun 2025 18:08:58 +0300   Mon, 30 Jun 2025 18:03:02 +0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
        DiskPressure     False   Mon, 30 Jun 2025 18:08:58 +0300   Mon, 30 Jun 2025 18:03:02 +0300   KubeletHasNoDiskPressure     kubelet has no disk pressure
        PIDPressure      False   Mon, 30 Jun 2025 18:08:58 +0300   Mon, 30 Jun 2025 18:03:02 +0300   KubeletHasSufficientPID      kubelet has sufficient PID available
        Ready            True    Mon, 30 Jun 2025 18:08:58 +0300   Mon, 30 Jun 2025 18:03:56 +0300   KubeletReady                 kubelet is posting ready status
      Addresses:
        InternalIP:   10.0.20.167
        InternalDNS:  ip-10-0-20-167.us-east-2.compute.internal
        Hostname:     ip-10-0-20-167.us-east-2.compute.internal
      Capacity:
        cpu:                2
        ephemeral-storage:  125827068Ki
        memory:             8125976Ki
        pods:               250
      Allocatable:
        cpu:                1500m
        ephemeral-storage:  114888483853
        memory:             6028824Ki
        pods:               250
      System Info:
        Machine ID:                 EC2AMAZ-HFBNRK4
        System UUID:                EC28C544-9E07-9D9B-1530-0654DC2E5EF1
        Boot ID:                    856
        Kernel Version:             10.0.17763.7434
        OS Image:                   Windows Server 2019 Datacenter
        Operating System:           windows
        Architecture:               amd64
        Container Runtime Version:  containerd://1.7.26
        Kubelet Version:            v1.32.5
        Kube-Proxy Version:         v1.32.5
      ProviderID:                   aws:///us-east-2a/i-06de677d9aa2e47a1
      Non-terminated Pods:          (0 in total)
        Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
        ---------                   ----    ------------  ----------  ---------------  -------------  ---
      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        Resource           Requests  Limits
        --------           --------  ------
        cpu                0 (0%)    0 (0%)
        memory             0 (0%)    0 (0%)
        ephemeral-storage  0 (0%)    0 (0%)
      Events:
        Type    Reason                   Age                From                   Message
        ----    ------                   ----               ----                   -------
        Normal  Starting                 9m15s              kube-proxy
        Normal  Synced                   10m                cloud-node-controller  Node synced successfully
        Normal  Starting                 10m                kubelet                Starting kubelet.
        Normal  NodeHasSufficientMemory  10m (x2 over 10m)  kubelet                Node ip-10-0-20-167.us-east-2.compute.internal status is now: NodeHasSufficientPID
        Normal  RegisteredNode           10m                node-controller        Node ip-10-0-20-167.us-east-2.compute.internal event: Registered Node ip-10-0-20-167.us-east-2.compute.internal in Controller
        Normal  NodeNotSchedulable       10m                kubelet                Node ip-10-0-20-167.us-east-2.compute.internal status is now: NodeNotSchedulable
        Normal  NodeReady                9m31s              kubelet                Node ip-10-0-20-167.us-east-2.compute.internal status is now: NodeReady
        Normal  NodeSchedulable          9m4s               kubelet                Node ip-10-0-20-167.us-east-2.compute.internal status is now: NodeSchedulable
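
The `describe` output above carries both a `windowsmachineconfig.openshift.io/version` and a `windowsmachineconfig.openshift.io/desired-version` annotation. As a rough sketch, the two can be compared to see whether a node has reached the desired WMCO version. The function name `wmco_upgrade_state` and its return strings are hypothetical, and the interpretation (a mismatch means the node has not yet been configured by the desired version) is an assumption, not confirmed WMCO behavior. The input is a plain dictionary of annotations, for example the `.metadata.annotations` object from `oc get node <name> -o json`.

```python
VERSION_KEY = "windowsmachineconfig.openshift.io/version"
DESIRED_KEY = "windowsmachineconfig.openshift.io/desired-version"


def wmco_upgrade_state(annotations):
    """Compare the current and desired WMCO version annotations of a node.

    Returns 'unknown' if either annotation is missing, 'up-to-date' if they
    match, and 'pending' otherwise. The meaning attached to each state is an
    assumption made for illustration.
    """
    current = annotations.get(VERSION_KEY)
    desired = annotations.get(DESIRED_KEY)
    if current is None or desired is None:
        return "unknown"
    return "up-to-date" if current == desired else "pending"


# Annotation values copied from the node described in this report.
SAMPLE_ANNOTATIONS = {
    DESIRED_KEY: "10.19.0-c6693cf",
    VERSION_KEY: "10.19.0-c6693cf",
}

if __name__ == "__main__":
    print(wmco_upgrade_state(SAMPLE_ANNOTATIONS))
```

Running the same check against each BYOH node would show whether any of them is still missing or behind on the version annotation.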
      
      
      

       

      Expected results:

         BYOH nodes upgrade successfully.

      Additional info:

      wmco log: 
      https://redhat-internal.slack.com/files/U01J4GKLCLQ/F093EFTGTHD/wmco.log?origin_channel=Vall_threads 

              rh-ee-ssoto Sebastian Soto
              rrasouli Aharon Rasouli
              Weinan Liu