OpenShift Virtualization / CNV-14352

[2010485] Windows VMs offline after update


    • CNV Virtualization Sprint 209, CNV Virtualization Sprint 210, CNV Doc Sprint 212
    • High
    • None

      CNV cluster with 24+ nodes, 850 virtual machines

      Windows 10 VMs appear to go offline. When connecting through the UI console, the screen is blank.

      In the Windows event logs of some of the affected VMs, we see:
      "Reset to device, \Device\RaidPort2, was issued."
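To gauge how widespread these storage resets are across guests, the event text can be matched mechanically. The Python sketch below assumes the message wording quoted above and a plain-text export of each guest's event log; `reset_devices` and `RESET_RE` are hypothetical names, and other Windows builds or storage drivers may word the event differently.

```python
import re

# Captures the device path from the storport reset message quoted in this report.
# The wording is taken from that single example (assumption).
RESET_RE = re.compile(r"Reset to device, (\\Device\\[^,]+), was issued")


def reset_devices(lines):
    """Count storage-reset events per device path in exported event-log lines."""
    counts = {}
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            device = match.group(1)
            counts[device] = counts.get(device, 0) + 1
    return counts


sample = [
    r"Reset to device, \Device\RaidPort2, was issued.",
    "An unrelated event",
    r"Reset to device, \Device\RaidPort2, was issued.",
]
print(reset_devices(sample))  # {'\\Device\\RaidPort2': 2}
```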

      The virt-launcher pods also report:
      error killing pod: [failed to "KillContainer" for "compute" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "<sandbox_id>" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to stop container for pod sandbox <sandbox_id>: failed to stop container k8s_compute_virt-launcher-<pod>.virtualmachines_<container_id>: context deadline exceeded"]
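When the same kubelet error shows up for many pods, it can help to reduce each message to the operations that failed and their gRPC status codes. A minimal Python sketch, assuming the message layout of the single example above; `kill_failures` and `FAILURE_RE` are hypothetical names, not part of any Kubernetes tooling.

```python
import re

# Pulls (operation, gRPC status code) pairs out of a kubelet "error killing pod"
# message. The layout is inferred from the one example in this report (assumption).
FAILURE_RE = re.compile(
    r'failed to "(?P<op>\w+)" for "[^"]*" with \w+: '
    r'"rpc error: code = (?P<code>\w+)'
)


def kill_failures(message):
    """Return a list of (operation, rpc_code) tuples found in the message."""
    return [(m.group("op"), m.group("code")) for m in FAILURE_RE.finditer(message)]


SAMPLE = (
    'error killing pod: [failed to "KillContainer" for "compute" with '
    'KillContainerError: "rpc error: code = DeadlineExceeded desc = context '
    'deadline exceeded", failed to "KillPodSandbox" for "<sandbox_id>" with '
    'KillPodSandboxError: "rpc error: code = Unknown desc = failed to stop '
    'container for pod sandbox <sandbox_id>: failed to stop container '
    'k8s_compute_virt-launcher-<pod>.virtualmachines_<container_id>: context '
    'deadline exceeded"]'
)
print(kill_failures(SAMPLE))
# [('KillContainer', 'DeadlineExceeded'), ('KillPodSandbox', 'Unknown')]
```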

      This appears to have started after a mass Windows update:

      The guests were Windows 10 with all updates installed.
      Then the following patches were applied to the Windows VMs:
      KB5005700
      KB5005566
      After this, 150 of the 700 VMs became unresponsive and showed the symptoms described above.
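For scale, the affected share can be computed directly, assuming the 700 figure is the number of patched VMs (the environment line above mentions 850 VMs in the whole cluster):

```python
# Counts as reported above; 700 is assumed to be the patched population.
affected = 150
patched = 700
share = affected / patched
print(f"{share:.1%}")  # 21.4%
```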

      Sample Windows VM YAML:

```yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: <$VM>
  name: <$VM>
  namespace: virtualmachines
spec:
  dataVolumeTemplates:
  - metadata:
      name: <$VM>
    spec:
      pvc:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 100Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        blank: {}
    status: {}
  running: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: <$VM>
    spec:
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
          utc: {}
        cpu:
          cores: 1
          model: host-model
          sockets: 2
        devices:
          disks:
          - bootOrder: 2
            disk:
              bus: virtio
              pciAddress: "0000:00:02.0"
            name: os-disk
          interfaces:
          - bootOrder: 1
            bridge: {}
            macAddress: <$MAC>
            name: vnic0
            pciAddress: "0000:00:03.0"
          networkInterfaceMultiqueue: true
        features:
          acpi: {}
          apic: {}
          hyperv:
            evmcs: {}
            frequencies: {}
            ipi: {}
            reenlightenment: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
        firmware:
          uuid: <$UUID>
        resources:
          requests:
            cpu: 1500m
            memory: 11Gi
      networks:
      - multus:
          networkName: <$VLAN_ID>
        name: vnic0
      terminationGracePeriodSeconds: 30
      evictionStrategy: LiveMigrate
      volumes:
      - dataVolume:
          name: <$VOL_NAME>
        name: os-disk
status: {}
```
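The manifest ties several sections together by name: each entry in `disks` refers to a volume, each entry in `interfaces` refers to a network, and `evictionStrategy: LiveMigrate` relies on shared storage, consistent with the PVC requesting `ReadWriteMany`. The Python sketch below checks those links on a manifest loaded as a dict (for example with PyYAML's `yaml.safe_load`); `check_vm` is a hypothetical helper that models only these few fields, and it is not a KubeVirt API.

```python
def check_vm(vm):
    """Return human-readable problems found in a VirtualMachine manifest dict.

    Hypothetical helper: only the cross-references discussed above are checked.
    """
    problems = []
    tmpl = vm["spec"]["template"]["spec"]
    devices = tmpl["domain"]["devices"]

    # Each disk must be backed by a volume with the same name.
    volume_names = {v["name"] for v in tmpl.get("volumes", [])}
    for disk in devices.get("disks", []):
        if disk["name"] not in volume_names:
            problems.append(f"disk {disk['name']!r} has no matching volume")

    # Each interface must be paired with a network of the same name.
    network_names = {n["name"] for n in tmpl.get("networks", [])}
    for iface in devices.get("interfaces", []):
        if iface["name"] not in network_names:
            problems.append(f"interface {iface['name']!r} has no matching network")

    # Live migration needs shared (ReadWriteMany) storage; only the
    # dataVolumeTemplates are inspected here.
    if tmpl.get("evictionStrategy") == "LiveMigrate":
        for dv in vm["spec"].get("dataVolumeTemplates", []):
            modes = dv["spec"]["pvc"].get("accessModes", [])
            if "ReadWriteMany" not in modes:
                problems.append(f"DataVolume {dv['metadata']['name']!r} is not RWX")
    return problems


# Trimmed-down dict mirroring the relevant fields of the sample manifest.
sample_vm = {
    "spec": {
        "dataVolumeTemplates": [
            {"metadata": {"name": "vm1"},
             "spec": {"pvc": {"accessModes": ["ReadWriteMany"]}}},
        ],
        "template": {"spec": {
            "domain": {"devices": {
                "disks": [{"name": "os-disk"}],
                "interfaces": [{"name": "vnic0"}],
            }},
            "networks": [{"name": "vnic0"}],
            "evictionStrategy": "LiveMigrate",
            "volumes": [{"name": "os-disk"}],
        }},
    }
}
print(check_vm(sample_vm))  # []
```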

      Igor Bezukh
      Jonathan Edwards
      Kedar Bidarkar