-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
8
-
False
-
-
False
-
CLOSED
-
---
-
---
-
-
-
CNV Virtualization Sprint 228, CNV Virtualization Sprint 229, CNV Virtualization Sprint 230, CNV Virtualization Sprint 238
-
High
-
None
Some background:
-------------------------
I'm running a mini-scale OpenShift setup with 30 OpenShift nodes. The CNV build is the latest nightly, 4.12.0-745. I'm currently running 1500 Fedora VMs and have been doing some migration testing. Unfortunately, after triggering 1000 migrations, 161 VMs are stuck in an error state, e.g.:
virt-launcher-fedora-vm1421-7qjfj 0/2 Error 0 98m
virt-launcher-fedora-vm1422-plnpg 0/2 Error 0 64m
virt-launcher-fedora-vm1423-vw5nv 0/2 Error 0 81m
virt-launcher-fedora-vm1424-85bh7 0/2 Error 0 15m
virt-launcher-fedora-vm1425-vvflp 0/2 Error 0 48m
virt-launcher-fedora-vm1427-xmzls 0/2 Error 0 31m
virt-launcher-fedora-vm1428-8c7cf 0/2 Error 0 15m
virt-launcher-fedora-vm1430-wm56l 0/2 Error 0 15m
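To reproduce a count like the 161 above from a saved pod listing, a minimal Python sketch follows. It assumes the listing has the usual NAME, READY, STATUS, RESTARTS, AGE columns as in the excerpt; the function name `count_by_status` is hypothetical and not part of any tool.

```python
from collections import Counter


def count_by_status(listing: str) -> Counter:
    """Count pods per STATUS column in pod-listing text."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: NAME READY STATUS RESTARTS AGE.
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        counts[fields[2]] += 1
    return counts


# Two rows copied from the listing above.
sample = """\
virt-launcher-fedora-vm1421-7qjfj 0/2 Error 0 98m
virt-launcher-fedora-vm1422-plnpg 0/2 Error 0 64m
"""

print(count_by_status(sample)["Error"])
```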
Logs from one of the failed virt-launcher pods show:
-----------
{
"component": "virt-launcher",
"level": "info",
"msg": "Collected all requested hook sidecar sockets",
"pos": "manager.go:86",
"timestamp": "2022-12-01T12:41:01.973479Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "Sorted all collected sidecar sockets per hook point based on their priority and name: map[]",
"pos": "manager.go:89",
"timestamp": "2022-12-01T12:41:01.973531Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "Connecting to libvirt daemon: qemu+unix:///session?socket=/var/run/libvirt/libvirt-sock",
"pos": "libvirt.go:497",
"timestamp": "2022-12-01T12:41:01.974885Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')",
"pos": "libvirt.go:505",
"timestamp": "2022-12-01T12:41:01.975225Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "libvirt version: 8.0.0, package: 5.module+el8.6.0+14495+7194fa43 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2022-03-16-19:03:54, )",
"subcomponent": "libvirt",
"thread": "46",
"timestamp": "2022-12-01T12:41:02.286000Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "hostname: virt-launcher-fedora-vm1430-wm56l",
"subcomponent": "libvirt",
"thread": "46",
"timestamp": "2022-12-01T12:41:02.286000Z"
}
{
"component": "virt-launcher",
"level": "error",
"msg": "internal error: Unable to get session bus connection: Cannot autolaunch D-Bus without X11 $DISPLAY",
"pos": "virGDBusGetSessionBus:128",
"subcomponent": "libvirt",
"thread": "46",
"timestamp": "2022-12-01T12:41:02.286000Z"
}
{
"component": "virt-launcher",
"level": "error",
"msg": "internal error: Unable to get system bus connection: Could not connect: No such file or directory",
"pos": "virGDBusGetSystemBus:101",
"subcomponent": "libvirt",
"thread": "46",
"timestamp": "2022-12-01T12:41:02.286000Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "Connected to libvirt daemon",
"pos": "libvirt.go:513",
"timestamp": "2022-12-01T12:41:02.476806Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "Registered libvirt event notify callback",
"pos": "client.go:510",
"timestamp": "2022-12-01T12:41:02.479265Z"
}
{
"component": "virt-launcher",
"level": "info",
"msg": "Marked as ready",
"pos": "virt-launcher.go:74",
"timestamp": "2022-12-01T12:41:02.479412Z"
}
parse error: Invalid numeric literal at line 12, column 6
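The parse error above presumably comes from a non-JSON line in the log stream reaching the JSON pretty-printer. A minimal Python sketch that tolerates such lines is shown below; it assumes the raw launcher log holds one JSON object per line, and the names `parse_log_lines` and `errors_only` are hypothetical.

```python
import json


def parse_log_lines(text):
    """Split raw log text into parsed JSON entries and skipped lines."""
    entries, skipped = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # Keep non-JSON lines instead of aborting the whole run.
            skipped.append(line)
            continue
        if isinstance(obj, dict):
            entries.append(obj)
        else:
            skipped.append(line)
    return entries, skipped


def errors_only(entries):
    """Return only the entries logged at level "error"."""
    return [e for e in entries if e.get("level") == "error"]


# Two entries abridged from the log above, plus one made-up non-JSON line.
sample = "\n".join([
    '{"component": "virt-launcher", "level": "info", "msg": "Marked as ready"}',
    '{"component": "virt-launcher", "level": "error", "msg": "internal error: '
    'Unable to get system bus connection: Could not connect: No such file or directory"}',
    'this line is not JSON',
])

entries, skipped = parse_log_lines(sample)
for entry in errors_only(entries):
    print(entry["msg"])
```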
This is my migration config:
----------------------------
liveMigrationConfig:
  completionTimeoutPerGiB: 800
  parallelMigrationsPerCluster: 20
  parallelOutboundMigrationsPerNode: 4
  progressTimeout: 150
workloads: {}
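For a rough sense of what these limits imply for the 1000 requested migrations, here is a small arithmetic sketch in Python. It assumes both timeouts are in seconds and that the completion timeout scales linearly with guest memory; the 4 GiB memory size is hypothetical, since the report does not state the VM size.

```python
import math

# Values taken from the config above.
parallel_migrations_per_cluster = 20
completion_timeout_per_gib = 800  # assumed to be seconds per GiB
requested_migrations = 1000

# With at most 20 migrations running cluster-wide, 1000 migrations need
# at least this many back-to-back batches.
min_batches = math.ceil(requested_migrations / parallel_migrations_per_cluster)


def completion_timeout_seconds(memory_gib: float) -> float:
    """Completion timeout for one migration, assuming linear scaling."""
    return completion_timeout_per_gib * memory_gib


print(min_batches)
print(completion_timeout_seconds(4))  # 4 GiB is a hypothetical VM size
```

So most of the 1000 migrations have to wait in a queue while earlier batches finish.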
--------------------
Versions of all relevant components:
CNV 4.12.0-745
OCP 4.11.4
CNV must-gather:
-----------------
http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/pods_crah_after_migration.tar.gz
- duplicates
-
CNV-30386 [2218435] queueing multiple VMs migration causes virt-controller to hit a deadlock.
- POST
- external trackers