Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: CNV v4.11.0
Affects Version/s: None
Component/s: CNV Virtualization
Labels:
- TestCannotAutomate
- TestOnly
- Watchlist+
- cnv-4?
- cnvbugsm
- devel_ack+
- needinfo+
- needinfo?
- pm_ack+
- qa_ack?
- qe_test_coverage?

Blocked: False
Ready: False
BZ Status: CLOSED
BZ URL: https://bugzilla.redhat.com/show_bug.cgi?id=2010485
Bugzilla Bug: RHBZ 2010485
Sprint: CNV Virtualization Sprint 209, CNV Virtualization Sprint 210, CNV Doc Sprint 212
Severity: High
Regression: None

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

CNV cluster with 24+ nodes, 850 virtual machines

Windows 10 VMs appear to go offline. When a VM is opened in the UI console, the screen is blank.

In some of the Windows event logs we see:
"Reset to device, \Device\RaidPort2, was issued."

The pods also show:
error killing pod: [failed to "KillContainer" for "compute" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "<sandbox_id>" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to stop container for pod sandbox <sandbox_id>: failed to stop container k8s_compute_virt-launcher-<pod>.virtualmachines_<container_id>: context deadline exceeded"]
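
To gauge how many pods are affected, the error above can be matched mechanically. The following is a minimal sketch, not part of the original report: the function name `find_kill_timeouts` and the `(pod_name, message)` input shape are assumptions made for illustration, and collecting the event messages (for example from the cluster's event listing) is left to the caller.

```python
import re

# Matches the "KillContainer" part of the kubelet error quoted above.
# The pattern is derived only from the message text in this report.
KILL_ERROR = re.compile(
    r'failed to "KillContainer" for "(?P<container>[^"]+)" '
    r'with KillContainerError: '
    r'"rpc error: code = (?P<code>\w+) desc = (?P<desc>[^"]+)"'
)


def find_kill_timeouts(events):
    """Return (pod_name, container) pairs whose kill attempt timed out.

    `events` is an iterable of (pod_name, message) tuples. This input
    shape is a hypothetical convention chosen for the sketch.
    """
    hits = []
    for pod_name, message in events:
        match = KILL_ERROR.search(message)
        # Only report kills that failed with a gRPC deadline error.
        if match and match.group("code") == "DeadlineExceeded":
            hits.append((pod_name, match.group("container")))
    return hits
```

Feeding it the message above together with a pod name yields that pod paired with the container name `compute`; messages that do not contain the error are ignored.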

This appears to have started after a mass Windows update.

The guests were Windows 10 with all updates applied.
Then these patches were applied to the Windows VMs:
KB5005700
KB5005566
After this, 150 out of 700 VMs showed the symptoms described above.

Sample Windows VM YAML:
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: <$VM>
  name: <$VM>
  namespace: virtualmachines
spec:
  dataVolumeTemplates:
  - metadata:
      name: <$VM>
    spec:
      pvc:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 100Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        blank: {}
    status: {}
  running: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: <$VM>
    spec:
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
          utc: {}
        cpu:
          cores: 1
          model: host-model
          sockets: 2
        devices:
          disks:
          - bootOrder: 2
            disk:
              bus: virtio
              pciAddress: "0000:00:02.0"
            name: os-disk
          interfaces:
          - bootOrder: 1
            bridge: {}
            macAddress: <$MAC>
            name: vnic0
            pciAddress: "0000:00:03.0"
          networkInterfaceMultiqueue: true
        features:
          acpi: {}
          apic: {}
          hyperv:
            evmcs: {}
            frequencies: {}
            ipi: {}
            reenlightenment: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
        firmware:
          uuid: <$UUID>
        resources:
          requests:
            cpu: 1500m
            memory: 11Gi
      networks:
      - multus:
          networkName: <$VLAN_ID>
        name: vnic0
      terminationGracePeriodSeconds: 30
      evictionStrategy: LiveMigrate
      volumes:
      - dataVolume:
          name: <$VOL_NAME>
        name: os-disk
status: {}

external trackers

Red Hat Customer Portal 03040086

Red Hat Issue Tracker CNV-14352

Assignee: Igor Bezukh

Reporter: Jonathan Edwards

QA Contact: Kedar Bidarkar

Votes: 0

Watchers: 2

Created: 2021/10/04 7:10 PM

Updated: 2023/01/20 10:34 AM

Resolved: 2022/07/08 7:03 PM
