Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: CNV v4.12.0
Affects Version/s: None
Component/s: CNV Virtualization
Labels:
- cnv-4?
- cnvbugsm
- devel_ack+
- needinfo-
- needinfo?
- pm_ack+
- qa_ack?

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Ready:
False
BZ Status:
CLOSED
BZ URL:
https://bugzilla.redhat.com/show_bug.cgi?id=2099216
Bugzilla Bug:
RHBZ: 2099216

Sprint:
CNV Virtualization Sprint 222, CNV Virtualization Sprint 223, CNV Virtualization Sprint 224, CNV Virtualization Sprint 225
Severity:
Important

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem: I tried to create a VM with the guest agent, using the following spec:
http://pastebin.test.redhat.com/1059571
datavolume: http://pastebin.test.redhat.com/1059572
but I get this error message in the events of the VMI:

Events:
  Type     Reason            Age                  From                       Message
  ----     ------            ----                 ----                       -------
  Normal   SuccessfulCreate  55m                  virtualmachine-controller  Created virtual machine pod virt-launcher-test-vm-l7fw2
  Normal   Created           55m                  virt-handler               VirtualMachineInstance defined.
  Normal   Started           55m                  virt-handler               VirtualMachineInstance started.
  Warning  SyncFailed        2m9s (x31 over 55m)  virt-handler               server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')"
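
For triage, the numeric fields of the SyncFailed message can be pulled out with a short script. This is a minimal sketch, not part of the original report: the function name `parse_libvirt_error` is hypothetical, and the regular expression assumes the message always has the `LibvirtError(Code=..., Domain=..., Message='...')` shape seen above. As far as I know, Code=1 and Domain=10 correspond to VIR_ERR_INTERNAL_ERROR and VIR_FROM_QEMU in libvirt, which would place the failure in the QEMU driver.

```python
import re

# Pattern for the error text quoted in the SyncFailed event above.
# The message itself contains single quotes ('cont'), so the Message field
# is matched greedily up to the final "')".
LIBVIRT_ERROR = re.compile(
    r"LibvirtError\(Code=(?P<code>\d+), Domain=(?P<domain>\d+), "
    r"Message='(?P<message>.*)'\)"
)


def parse_libvirt_error(event_message):
    """Return (code, domain, message) from a virt-handler event, or None."""
    match = LIBVIRT_ERROR.search(event_message)
    if match is None:
        return None
    return int(match["code"]), int(match["domain"]), match["message"]


# Event text copied from the VMI events in this report.
event = (
    'server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, '
    "Message='internal error: unable to execute QEMU command 'cont': "
    "Resetting the Virtual Machine is required')\""
)
print(parse_libvirt_error(event))
```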

I checked the logs (tail -n 200 /var/log/libvirt/qemu/*.log) in the virt-launcher pod and I noticed this error:

-msg timestamp=on
KVM: entry failed, hardware error 0x8
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00080661
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=04 66 41 eb f1 66 83 c9 ff 66 89 c8 66 5b 66 5e 66 5f 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
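
One thing the register dump can tell us is where the vCPU was when the entry failed. The sketch below is an illustration, not part of the original report: `fetch_address` is a hypothetical helper, and it assumes the segment lines in the QEMU dump are laid out as selector, base, limit, flags. It adds the CS base to EIP, which for the values above gives 0xfffffff0, the x86 reset vector. That suggests the guest failed on its very first instruction rather than somewhere inside the guest OS.

```python
import re


def fetch_address(dump):
    """Return CS.base + EIP from a QEMU register dump."""
    eip = int(re.search(r"EIP=([0-9a-fA-F]+)", dump).group(1), 16)
    # Assumed layout of the CS line: selector, base, limit, flags.
    cs = re.search(r"^CS\s*=\s*([0-9a-fA-F]+)\s+([0-9a-fA-F]+)", dump, re.MULTILINE)
    cs_base = int(cs.group(2), 16)
    return cs_base + eip


# The relevant lines, copied from the log excerpt in this report.
dump = """
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
"""
print(hex(fetch_address(dump)))  # 0xfffffff0
```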

I also checked the status of the VM from inside the virt-launcher pod:

[mperetz@mperetz ~]$ oc rsh virt-launcher-simple-vm-kmnbc
sh-4.4# virsh list
 Id   Name                State
----------------------------------
 1    default_simple-vm   paused

sh-4.4# exut\
> ^C
sh-4.4# exit
exit
command terminated with exit code 130
[mperetz@mperetz ~]$ oc get vmi
NAME        AGE     PHASE     IP            NODENAME                           READY
simple-vm   6m59s   Running   10.128.2.40   oadp-12290-wqlcn-worker-0-llq8b    True
[mperetz@mperetz ~]$
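
Note the mismatch in the transcript above: the VMI reports phase Running and READY True, while libvirt reports the domain as paused. A check like this could be scripted roughly as follows. This is a sketch under stated assumptions: `virsh_states` is a hypothetical helper, the `virsh list` text is pasted in by hand, and the VMI phase is taken from the PHASE column of `oc get vmi`.

```python
def virsh_states(virsh_list_output):
    """Map domain name -> state from `virsh list` output."""
    states = {}
    for line in virsh_list_output.splitlines():
        # Split into at most 3 fields: the state may contain spaces ("shut off").
        fields = line.split(None, 2)
        # Data rows start with a numeric Id ("-" for inactive domains);
        # this skips the header and the dashed separator.
        if len(fields) == 3 and (fields[0].isdigit() or fields[0] == "-"):
            states[fields[1]] = fields[2].strip()
    return states


# Output copied from the virt-launcher pod in this report.
virsh_out = """ Id Name State
----------------------------------
 1 default_simple-vm paused
"""
vmi_phase = "Running"  # PHASE column from `oc get vmi` in the transcript

for name, state in virsh_states(virsh_out).items():
    if vmi_phase == "Running" and state != "running":
        print(f"mismatch: VMI phase is {vmi_phase} but libvirt domain {name} is {state}")
```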

Additional details:
lscpu of the worker nodes: http://pastebin.test.redhat.com/1059422
OCP version: 4.10 (OpenStack on PSI). Also tried 4.9.
OpenStack flavor: ci.m1.xlarge
lscpu output:
sh-4.4# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Red Hat
CPU family: 6
Model: 134
Model name: Intel Xeon Processor (Icelake)
BIOS Model name: RHEL 7.6.0 PC (i440FX + PIIX, 1996)
Stepping: 0
CPU MHz: 2294.608
BogoMIPS: 4589.21
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 md_clear arch_capabilities

I'm not sure what exactly causes the issue based on the error message.
I also tried it on OCP 4.10 with the same CNV version, but with OpenStack flavor ci.standard.xl, which placed the worker nodes on a different server:

sh-4.4# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Red Hat
CPU family: 6
Model: 85
Model name: Intel Xeon Processor (Skylake, IBRS)
BIOS Model name: RHEL 7.6.0 PC (i440FX + PIIX, 1996)
Stepping: 4
CPU MHz: 2095.076
BogoMIPS: 4190.15
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke md_clear arch_capabilities

On that setup it works.
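
Since the failing and working nodes differ mainly in CPU model, comparing the two Flags lines narrows down what the failing host exposes that the working one does not. The sketch below is an illustration only: `cpu_flags` and `flag_diff` are hypothetical helpers, and the sample inputs are shortened, so paste the full lscpu outputs from this report to get the real result. By my reading of the two full lists, the Icelake node additionally exposes ibrs_enhanced, avx512ifma, sha_ni, wbnoinvd, avx512vbmi, avx512_vbmi2, gfni, vaes, vpclmulqdq, avx512_vnni, avx512_bitalg, avx512_vpopcntdq and la57, while the Skylake node additionally exposes pti, hle, rtm and mpx. This does not by itself identify which difference, if any, triggers the failure.

```python
def cpu_flags(lscpu_output):
    """Return the set of flags from the 'Flags:' line of lscpu output."""
    for line in lscpu_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Flags":
            return set(value.split())
    return set()


def flag_diff(lscpu_a, lscpu_b):
    """Return (flags only in a, flags only in b), each sorted."""
    a, b = cpu_flags(lscpu_a), cpu_flags(lscpu_b)
    return sorted(a - b), sorted(b - a)


# Shortened sample inputs for illustration; not the full flag lists.
icelake = (
    "Model name: Intel Xeon Processor (Icelake)\n"
    "Flags: fpu avx2 avx512f la57 md_clear"
)
skylake = (
    "Model name: Intel Xeon Processor (Skylake, IBRS)\n"
    "Flags: fpu pti hle avx2 rtm mpx avx512f md_clear"
)

only_icelake, only_skylake = flag_diff(icelake, skylake)
print("only on Icelake node:", only_icelake)
print("only on Skylake node:", only_skylake)
```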

Version-Release number of selected component (if applicable): CNV version: 4.9/4.10.2 (production)

How reproducible: 100% on the specific platform with the Icelake CPU model

Steps to Reproduce:
I am not sure of the exact root cause, as mentioned above, but this is how I reproduce it:
1. Using the flexy-install job, create an OpenStack cluster on PSI with OCP version 4.10 and flavor ci.m1.xlarge (which usually deploys the worker nodes on a server with the Icelake CPU model).
2. Deploy the following data volume and VM (this also happened with other templates, such as alpine, so these exact templates are not required):
http://pastebin.test.redhat.com/1059571
datavolume: http://pastebin.test.redhat.com/1059572
3. Check the events of the VMI. Eventually you get this error:
"LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required')"
4. Look at the other logs and statuses mentioned in the problem description.

Actual results:

Expected results:

Additional info:

external trackers

Red Hat Issue Tracker CNV-19214
