OpenShift Virtualization / CNV-72341

Some failures when migrating more than 1000 VMs in a single NS


    • Quality / Stability / Reliability

      Description of problem:

      I tried to migrate a large number of VMs (more than 1000), all deployed in a single namespace (NS). Below 1000 VMs I saw no issues, but above that I start to see random errors on virt-handler pods. I call them random because there is no pattern or clear correlation with other resources in the cluster.
      
      For example, out of 1300 VMs migrated, 6 failed to migrate. The logs of one of them, and its YAML, are below:
      # oc logs virt-launcher-virt-migration-0-268-mjhd4
      {"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:88","timestamp":"2025-11-12T08:46:45.957482Z"}
      {"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:91","timestamp":"2025-11-12T08:46:45.957528Z"}
      {"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu+unix:///session?socket=/var/run/libvirt/virtqemud-sock","pos":"libvirt.go:633","timestamp":"2025-11-12T08:46:45.957753Z"}
      {"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/virtqemud-sock': No such file or directory')","pos":"libvirt.go:641","timestamp":"2025-11-12T08:46:45.957997Z"}
      {"component":"virt-launcher","level":"info","msg":"libvirt version: 10.10.0, package: 7.7.el9_6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2025-08-05-18:51:24, )","subcomponent":"libvirt","thread":"40","timestamp":"2025-11-12T08:46:45.972000Z"}
      {"component":"virt-launcher","level":"info","msg":"hostname: virt-migration-0-268","subcomponent":"libvirt","thread":"40","timestamp":"2025-11-12T08:46:45.972000Z"}
      {"component":"virt-launcher","level":"error","msg":"internal error: Unable to get session bus connection: Cannot spawn a message bus without a machine-id: Invalid machine ID in /var/lib/dbus/machine-id or /etc/machine-id","pos":"virGDBusGetSessionBus:126","subcomponent":"libvirt","thread":"40","timestamp":"2025-11-12T08:46:45.972000Z"}
      {"component":"virt-launcher","level":"error","msg":"internal error: Unable to get system bus connection: Could not connect: No such file or directory","pos":"virGDBusGetSystemBus:99","subcomponent":"libvirt","thread":"40","timestamp":"2025-11-12T08:46:45.973000Z"}
      {"component":"virt-launcher","level":"info","msg":"Connected to libvirt daemon","pos":"libvirt.go:649","timestamp":"2025-11-12T08:46:46.459491Z"}
      {"component":"virt-launcher","level":"info","msg":"Registered libvirt event notify callback","pos":"client.go:602","timestamp":"2025-11-12T08:46:46.461623Z"}
      {"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:77","timestamp":"2025-11-12T08:46:46.461746Z"}
      panic: timed out waiting for domain to be defined
      {"component":"virt-launcher-monitor","level":"info","msg":"Reaped Launcher main pid","pos":"virt-launcher-monitor.go:128","timestamp":"2025-11-12T08:51:40.467432Z"}
      {"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 8 with status 512","pos":"virt-launcher-monitor.go:131","timestamp":"2025-11-12T08:51:40.467537Z"}
      {"component":"virt-launcher-monitor","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher-monitor.go:145","timestamp":"2025-11-12T08:51:40.467549Z"}
      {"component":"virt-launcher-monitor","level":"error","msg":"failed to read qemu log directory","pos":"virt-launcher-monitor.go:151","reason":"open /run/kubevirt-private/libvirt/qemu/log: no such file or directory","timestamp":"2025-11-12T08:51:40.467590Z"}
      {"component":"virt-launcher-monitor","level":"info","msg":"virt-launcher-monitor: Exiting...","pos":"virt-launcher-monitor.go:91","timestamp":"2025-11-12T08:51:48.276570Z"} 
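A log like the one above is easier to compare across the failed VMs once it is reduced to its error entries, its non-JSON lines (such as the panic), and the time between "Marked as ready" and the launcher being reaped. The following is a minimal sketch of that triage, not part of the original report. The function names and the choice of marker messages are assumptions made for illustration; the sample lines are copied from the log above.

```python
import json
from datetime import datetime

# Four lines copied verbatim from the virt-launcher log in this report.
SAMPLE = """\
{"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:77","timestamp":"2025-11-12T08:46:46.461746Z"}
panic: timed out waiting for domain to be defined
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped Launcher main pid","pos":"virt-launcher-monitor.go:128","timestamp":"2025-11-12T08:51:40.467432Z"}
{"component":"virt-launcher-monitor","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher-monitor.go:145","timestamp":"2025-11-12T08:51:40.467549Z"}
"""


def parse_ts(value):
    """Parse a UTC timestamp such as 2025-11-12T08:46:46.461746Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def triage(lines):
    """Reduce launcher log lines to error messages, non-JSON lines, and timing."""
    errors, raw = [], []
    ready_at = reaped_at = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            entry = None
        if not isinstance(entry, dict):
            raw.append(line)  # e.g. a panic printed outside the JSON logger
            continue
        if entry.get("level") == "error":
            errors.append(entry.get("msg", ""))
        # Marker messages chosen for this sketch (an assumption, not a KubeVirt contract).
        if entry.get("msg") == "Marked as ready":
            ready_at = parse_ts(entry["timestamp"])
        elif entry.get("msg") == "Reaped Launcher main pid":
            reaped_at = parse_ts(entry["timestamp"])
    gap = None
    if ready_at is not None and reaped_at is not None:
        gap = (reaped_at - ready_at).total_seconds()
    return {"errors": errors, "raw": raw, "seconds_ready_to_reap": gap}


if __name__ == "__main__":
    print(json.dumps(triage(SAMPLE.splitlines()), indent=2))
```

On the sample lines this reports the panic as the only non-JSON line and a gap of roughly 294 seconds between "Marked as ready" and the launcher process being reaped. Running the same reduction over the logs of all 6 failed VMs would show whether they share this signature.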

      Version-Release number of selected component (if applicable):

      4.20

      How reproducible:

      easy

      Steps to Reproduce:

      1. Create a large number of VMs (1000+) in one NS
      2. Try to migrate them
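The report does not say how the migrations were triggered. One possible way to script step 2 is to generate a `VirtualMachineInstanceMigration` object per VMI and create them all at once. The sketch below only builds the manifests. The VMI naming pattern is inferred from the hostname in the logs, and the namespace name `migration-test` is hypothetical.

```python
import json


def migration_manifest(vmi_name, namespace):
    """Build one VirtualMachineInstanceMigration object for the given VMI."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachineInstanceMigration",
        "metadata": {
            # generateName lets the API server pick a unique suffix per object.
            "generateName": f"{vmi_name}-migration-",
            "namespace": namespace,
        },
        "spec": {"vmiName": vmi_name},
    }


def migration_list(vmi_names, namespace):
    """Wrap the per-VMI objects in a v1 List so they can be created in one call."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [migration_manifest(name, namespace) for name in vmi_names],
    }


if __name__ == "__main__":
    # Assumed naming pattern (virt-migration-0-<index>) and hypothetical namespace.
    names = [f"virt-migration-0-{i}" for i in range(1300)]
    print(json.dumps(migration_list(names, "migration-test")))
```

The printed JSON could be piped to `oc create -f -`. Because the manifests use `generateName`, `oc create` is needed rather than `oc apply`.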
      

      Actual results:

      Some VMIs are not migrated

      Expected results:

      All VMIs are migrated

      Additional info:

      This bug has been seen with a UDN attachment, but it also appears without one.

              nrozen@redhat.com Nir Rozen
              ccaporal@redhat.com Charles Caporali
              Nir Rozen Nir Rozen