OpenShift Virtualization / CNV-56590

VM migration failure during control-plane-only migration under load from 4.16 to 4.17


    • Bug
    • Resolution: Not a Bug
    • Critical
    • None
    • CNV v4.17.4
    • CNV Virt-Cluster
    • None
    • 0.42
    • False
    • False
    • None
    • ---
    • ---
    • Critical
    • None

      Description of problem:

      A VM live migration failed during a control-plane-only upgrade from 4.16 to 4.17, before the control-plane-only upgrade to 4.18 was completed and before the worker MCP was unpaused.

      Version-Release number of selected component (if applicable):

      OCP 4.16.32 -> 4.17.15
      CNV 4.16.6 -> 4.17.4

      How reproducible:

      unknown

      Steps to Reproduce:

      1. Create VMs
      2. Start VM load using stress-ng
      3. Pause worker MCP
      4. Perform control-plane only upgrade
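Steps 3 and 4 could be scripted roughly as follows. This is a sketch, not the reporter's actual procedure: it assumes the worker pool is named `worker`, takes the target version from the versions listed in this report, and only builds the `oc` argument lists so they can be reviewed before anything is run. The helper names (`pause_worker_mcp_cmd`, `upgrade_cmd`, `plan`) are hypothetical.

```python
import json
import shlex


def pause_worker_mcp_cmd(pool="worker", paused=True):
    """Build the `oc patch` argv that pauses (or unpauses) a MachineConfigPool.

    Assumption: the pool is named "worker"; the report only says "worker MCP".
    """
    patch = json.dumps({"spec": {"paused": paused}})
    return ["oc", "patch", f"mcp/{pool}", "--type", "merge", "--patch", patch]


def upgrade_cmd(version):
    """Build the `oc adm upgrade` argv for a specific target version."""
    return ["oc", "adm", "upgrade", f"--to={version}"]


def plan(version="4.17.15"):
    """Return the commands for steps 3 and 4 in order, without running them."""
    return [pause_worker_mcp_cmd(), upgrade_cmd(version)]


if __name__ == "__main__":
    for argv in plan():
        # Print a copy-pasteable shell line; nothing is executed here.
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine; the channel selection and any acknowledgements an upgrade may require are deliberately left out.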

      Actual results:

      VM was unable to migrate successfully

      Expected results:

      All VMs migrated successfully to new virt-launcher pods

      Additional info:

      Running Cirros VMs with internal stress-ng load
      $ stress-ng --iomix 1 --cpu 1 --cpu-load 80 --cpu-load-slice 0 --vm 1 --timeout 0
      
      2000 VMs across 8 nodes
      
      $ oc get vmim -n default -l kubevirt.io/vmi-name=vm-instancetype-cirros-test-1128
      NAME                             PHASE       VMI
      kubevirt-workload-update-f4n99   Failed      vm-instancetype-cirros-test-1128
      kubevirt-workload-update-zqq6d   Succeeded   vm-instancetype-cirros-test-1128
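With 2000 VMs, reading this kind of table by eye does not scale. A minimal sketch of grouping migrations by phase from the tabular output is shown here; it assumes the default whitespace-separated columns seen above, and the function name `migrations_by_phase` is hypothetical.

```python
from collections import defaultdict

# Sample copied from the `oc get vmim` output in this report.
SAMPLE = """\
NAME                             PHASE       VMI
kubevirt-workload-update-f4n99   Failed      vm-instancetype-cirros-test-1128
kubevirt-workload-update-zqq6d   Succeeded   vm-instancetype-cirros-test-1128
"""


def migrations_by_phase(table_text):
    """Group migration names by phase from whitespace-separated table text."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = lines[0].split()
    name_i, phase_i = header.index("NAME"), header.index("PHASE")
    grouped = defaultdict(list)
    for line in lines[1:]:
        cols = line.split()
        grouped[cols[phase_i]].append(cols[name_i])
    return dict(grouped)


if __name__ == "__main__":
    print(migrations_by_phase(SAMPLE))
```

For this VMI the result has one entry under `Failed` and one under `Succeeded`. The table alone does not show which migration ran first.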
      
      From kubevirt-workload-update-f4n99:
          failureReason: 'Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=0, Message=''An error occurred, but the cause is unknown'')' 
      
      
      From failed virt-launcher pod:
      {"component":"virt-launcher","kind":"","level":"info","msg":"Prepared migration target pod","name":"vm-instancetype-cirros-test-1128","namespace":"default","pos":"server.go:172","timestamp":"2025-02-13T00:40:26.401942Z","uid":"30f75425-0b1c-4ea0-9474-c89aaec297fd"}
      {"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 54 with status 0","pos":"virt-launcher-monitor.go:198","timestamp":"2025-02-13T00:40:27.076025Z"}
      {"component":"virt-launcher","level":"error","msg":"Unable to read from monitor: Connection reset by peer","pos":"qemuMonitorIORead:420","subcomponent":"libvirt","thread":"51","timestamp":"2025-02-13T00:41:01.546000Z"}
      {"component":"virt-launcher","level":"warning","msg":"Failed to probe capabilities for /usr/libexec/qemu-kvm: Unable to read from monitor: Connection reset by peer","pos":"virQEMUCapsLogProbeFailure:5596","subcomponent":"libvirt","thread":"29","timestamp":"2025-02-13T00:41:01.570000Z"}
      {"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 56 with status 134","pos":"virt-launcher-monitor.go:198","timestamp":"2025-02-13T00:41:01.601941Z"}
      {"component":"virt-launcher","kind":"","level":"info","msg":"Signaled target pod virt-launcher-vm-instancetype-cirros-test-1128-6crl8 to cleanup","name":"vm-instancetype-cirros-test-1128","namespace":"default","pos":"server.go:152","timestamp":"2025-02-13T00:41:02.555921Z","uid":"30f75425-0b1c-4ea0-9474-c89aaec297fd"}
      panic: received early exit signal
      {"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 10 with status 512","pos":"virt-launcher-monitor.go:198","timestamp":"2025-02-13T00:41:02.661605Z"}
      {"component":"virt-launcher-monitor","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher-monitor.go:216","timestamp":"2025-02-13T00:41:02.672887Z"}
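The "Reaped pid ... with status ..." values can be decoded to see how each process ended. The sketch below assumes the logged number is a raw Linux wait status rather than an exit code; that assumption is supported by the log itself, since status 512 decodes to exit code 2, matching the "dirty virt-launcher shutdown: exit-code 2" line. The function names are hypothetical, and the sample lines are trimmed copies of the log lines above.

```python
import json
import re

REAPED = re.compile(r"Reaped pid (\d+) with status (\d+)")

# Trimmed to the component/level/msg fields of the log lines in this report.
SAMPLE = [
    '{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 54 with status 0"}',
    '{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 56 with status 134"}',
    "panic: received early exit signal",
    '{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 10 with status 512"}',
]


def decode_wait_status(status):
    """Decode a raw Linux wait status into (kind, value, core_dumped).

    Assumption: the status is a raw wait status. Stopped/continued states
    are not handled, since a reaped process has already terminated.
    """
    sig = status & 0x7F
    if sig == 0:
        return ("exit", (status >> 8) & 0xFF, False)
    return ("signal", sig, bool(status & 0x80))


def reaped_processes(log_lines):
    """Yield (pid, decoded status) for each 'Reaped pid' JSON log line."""
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. the bare "panic: ..." line is not JSON
        if not isinstance(record, dict):
            continue
        match = REAPED.search(record.get("msg", ""))
        if match:
            yield int(match.group(1)), decode_wait_status(int(match.group(2)))


if __name__ == "__main__":
    for pid, decoded in reaped_processes(SAMPLE):
        print(pid, decoded)
```

Under that assumption, pid 54 exited with code 0, pid 10 exited with code 2, and pid 56 (status 134) was terminated by signal 6, which is SIGABRT on Linux, with the core-dump flag set. The log does not say which process pid 56 was.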

       

              kbidarka@redhat.com Kedar Bidarkar
              rhn-support-sbennert Sarah Bennert
              Kedar Bidarkar Kedar Bidarkar