-
Bug
-
Resolution: Done-Errata
-
Critical
-
None
-
8
-
False
-
-
False
-
ASSIGNED
-
virt-launcher-rhel9-container-v4.16.2-7
-
---
-
---
-
-
CNV Virtualization Sprint 238, CNV Virtualization Sprint 239, CNV Virtualization Sprint 240
-
Important
-
No
--------------------------------------------------
Description of problem:
--------------------------------------------------
As part of an OOM investigation, I deliberately drove a VM to OOM by straining its memory with a heavy IO workload. Something unexpected happened: instead of just the OOM kill, which normally causes a QEMU reboot, the VM failed to run again:
NAME AGE STATUS READY
rhel82-vm0001 26h CrashLoopBackOff False
rhel82-vm0002 26h Running True
rhel82-vm0003 26h Stopped False
Pod logs:
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"rhel82-vm0001","namespace":"default","pos":"server.go:184","reason":"virError(Code=1, Domain=10, Message='internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-212-default_rhel82-vm000/org.qemu.guest_agent.0' too long')","timestamp":"2023-02-08T18:36:49.591084Z","uid":"0a6404c5-2ba7-4cc0-ad2a-307018174023"}
{"component":"virt-launcher","level":"info","msg":"Still missing PID for default_rhel82-vm0001, open /run/libvirt/qemu/run/default_rhel82-vm0001.pid: no such file or directory","pos":"monitor.go:125","timestamp":"2023-02-08T18:36:49.664614Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-213-default_rhel82-vm000/org.qemu.guest_agent.0' too long","pos":"qemuOpenChrChardevUNIXSocket:5223","subcomponent":"libvirt","thread":"30","timestamp":"2023-02-08T18:36:50.627000Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"rhel82-vm0001","namespace":"default","pos":"manager.go:880","reason":"virError(Code=1, Domain=10, Message='internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-213-default_rhel82-vm000/org.qemu.guest_agent.0' too long')","timestamp":"2023-02-08T18:36:50.628244Z","uid":"0a6404c5-2ba7-4cc0-ad2a-307018174023"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"rhel82-vm0001","namespace":"default","pos":"server.go:184","reason":"virError(Code=1, Domain=10, Message='internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-213-default_rhel82-vm000/org.qemu.guest_agent.0' too long')","timestamp":"2023-02-08T18:36:50.628304Z","uid":"0a6404c5-2ba7-4cc0-ad2a-307018174023"}
{"component":"virt-launcher","level":"info","msg":"Still missing PID for default_rhel82-vm0001, open /run/libvirt/qemu/run/default_rhel82-vm0001.pid: no such file or directory","pos":"monitor.go:125","timestamp":"2023-02-08T18:36:50.663725Z"}
{"component":"virt-launcher","level":"info","msg":"Still missing PID for default_rhel82-vm0001, open /run/libvirt/qemu/run/default_rhel82-vm0001.pid: no such file or directory","pos":"monitor.go:125","timestamp":"2023-02-08T18:36:51.663684Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-214-default_rhel82-vm000/org.qemu.guest_agent.0' too long","pos":"qemuOpenChrChardevUNIXSocket:5223","subcomponent":"libvirt","thread":"29","timestamp":"2023-02-08T18:36:51.663000Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"rhel82-vm0001","namespace":"default","pos":"manager.go:880","reason":"virError(Code=1, Domain=10, Message='internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-214-default_rhel82-vm000/org.qemu.guest_agent.0' too long')","timestamp":"2023-02-08T18:36:51.664818Z","uid":"0a6404c5-2ba7-4cc0-ad2a-307018174023"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"rhel82-vm0001","namespace":"default","pos":"server.go:184","reason":"virError(Code=1, Domain=10, Message='internal error: UNIX socket path '/var/run/kubevirt-private/libvirt/qemu/channel/target/domain-214-default_rhel82-vm000/org.qemu.guest_agent.0' too long')","timestamp":"2023-02-08T18:36:51.664884Z","uid":"0a6404c5-2ba7-4cc0-ad2a-307018174023"}
OOM record:
[Wed Feb 8 12:19:15 2023] worker invoked oom-killer: gfp_mask=0x620100(GFP_NOIO|__GFP_HARDWALL|__GFP_WRITE), order=0, oom_score_adj=979
[Wed Feb 8 12:19:15 2023] oom_kill_process.cold.32+0xb/0x10
[Wed Feb 8 12:19:15 2023] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Wed Feb 8 12:19:15 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-a2021e5dd93338ba5e39cef21c773838a294ab95a466c7887054e9e24f72e8e4.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6553054d_e923_4628_b36c_c6754eb6e0b1.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6553054d_e923_4628_b36c_c6754eb6e0b1.slice/crio-a2021e5dd93338ba5e39cef21c773838a294ab95a466c7887054e9e24f72e8e4.scope,task=qemu-kvm,pid=3196344,uid=107
[Wed Feb 8 12:19:15 2023] Memory cgroup out of memory: Killed process 3196344 (qemu-kvm) total-vm:64560756kB, anon-rss:58285188kB, file-rss:17672kB, shmem-rss:4kB, UID:107 pgtables:115428kB oom_score_adj:979
[Wed Feb 8 12:19:15 2023] oom_reaper: reaped process 3196344 (qemu-kvm), now anon-rss:0kB, file-rss:68kB, shmem-rss:4kB
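The root cause visible in the log lines above is a path-length limit rather than the OOM itself: on Linux, `struct sockaddr_un.sun_path` is 108 bytes and must also hold the terminating NUL, and the guest-agent socket path that libvirt rebuilds with an ever-growing domain ID (domain-212, -213, -214) no longer fits, even though the VM name has already been truncated to "vm000". A minimal sketch checking the failing path against that limit (the constant and the check reproduce the standard Linux limit; they are not KubeVirt code):

```python
# Guest-agent socket path copied from the error message above; note the
# truncated VM name ("vm000") and the "domain-NNN" prefix that libvirt
# increments on every start attempt (212, 213, 214, ...).
SOCKET_PATH = (
    "/var/run/kubevirt-private/libvirt/qemu/channel/target/"
    "domain-212-default_rhel82-vm000/org.qemu.guest_agent.0"
)

# sizeof(struct sockaddr_un.sun_path) on Linux: 108 bytes, including
# the terminating NUL, so at most 107 usable characters.
SUN_PATH_SIZE = 108

def fits_in_sun_path(path: str) -> bool:
    """Return True if the path (plus its NUL terminator) fits into sun_path."""
    return len(path.encode()) + 1 <= SUN_PATH_SIZE

print(len(SOCKET_PATH))               # 108 characters
print(fits_in_sun_path(SOCKET_PATH))  # False: libvirt rejects the path
```

Because the domain ID never resets while the pod keeps retrying, every restart attempt fails the same check, which matches the persistent CrashLoopBackOff seen above.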
--------------------------------------------------
Version-Release number of selected component (if applicable):
--------------------------------------------------
kubevirt-hyperconverged-operator.v4.11.3
local-storage-operator.v4.12.0-202301042354
mcg-operator.v4.11.4
ocs-operator.v4.11.4
--------------------------------------------------
How reproducible:
--------------------------------------------------
Unknown; however, once the VM enters this state it persists across all restart attempts.
--------------------------------------------------
Steps to Reproduce:
--------------------------------------------------
1. strain the VM using a heavy-duty workload
2. reach OOM
3. repeat
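Step 1 can be sketched as follows (the helper and its names are ours, the report does not specify which workload was used): allocating and touching real pages inside the guest until the pod's memory limit is exceeded drives the memcg toward an OOM kill.

```python
# Hypothetical memory-strain helper, NOT the workload from the report.
# bytearray allocation touches real, resident pages, so running this
# inside the guest with a size above the pod's memory limit pushes the
# qemu-kvm memcg toward an OOM kill.
def strain_memory(total_mib: int, chunk_mib: int = 64) -> list:
    hogs = []
    allocated = 0
    while allocated < total_mib:
        step = min(chunk_mib, total_mib - allocated)
        hogs.append(bytearray(step * 1024 * 1024))  # resident pages
        allocated += step
    return hogs

# Keep the size tiny anywhere other than a throwaway VM:
blocks = strain_memory(8, chunk_mib=4)
print(sum(len(b) for b in blocks) // (1024 * 1024))  # 8
```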
--------------------------------------------------
Actual results:
--------------------------------------------------
The VM no longer boots.
--------------------------------------------------
Expected results:
--------------------------------------------------
The VM reboots and starts normally.
--------------------------------------------------
logs:
--------------------------------------------------
I collected both the must-gather and the SOS report from the specific node that ran the VM:
http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/vm_doesnt_boot_after_oom.tar.gz
- blocks
-
RHEL-7542 The path to the guest agent socket file can become too long and cause problems
- Closed
- impacts account
-
RHEL-7542 The path to the guest agent socket file can become too long and cause problems
- Closed
- external trackers
- links to
-
RHEA-2024:138478 OpenShift Virtualization 4.16.3 Images