RHEL-21761

"qemu-kvm: Failed to put registers after init" when doing VM migration

    • rhel-sst-virtualization
    • ssg_virtualization

      What were you trying to do that didn't work?

      Start a VM, then migrate it to another host or do a save and restore on the local host. It reports an error:

       

      [root@dell-per740-78 ~]# virsh save avocado-vt-vm1 ./avocado-vt-vm1.save1
      Domain 'avocado-vt-vm1' saved to ./avocado-vt-vm1.save1
      [root@dell-per740-78 ~]# virsh restore ./avocado-vt-vm1.save1 
      error: Failed to restore domain from ./avocado-vt-vm1.save1
      error: operation failed: domain is not running
      

       

      Please provide the package NVR for which bug is seen:

      Host and guest kernel: kernel-5.14.0-408.el9.x86_64

      qemu-kvm-8.2.0-2.el9.x86_64

      libvirt-9.10.0-1.el9.x86_64

      microcode_ctl-20230808-2.20231009.1.el9_3.noarch

      How reproducible:

      80%

      Steps to reproduce

      1. Start a guest
      2. Wait for the guest OS to boot up (I am not sure whether this is relevant, but in my experience the issue does not reproduce if the guest OS has not fully booted)
      3. Migrate it to another host, or do a save and restore on the local host
      [root@dell-per740-78 ~]# virsh save avocado-vt-vm1 ./avocado-vt-vm1.save1
      Domain 'avocado-vt-vm1' saved to ./avocado-vt-vm1.save1
      [root@dell-per740-78 ~]# virsh restore ./avocado-vt-vm1.save1
      error: Failed to restore domain from ./avocado-vt-vm1.save1
      error: operation failed: domain is not running
      4. Check the qemu log
      2024-01-16T11:13:37.586851Z qemu-kvm: Failed to put registers after init: Invalid argument
      2024-01-16 11:13:37.790+0000: shutting down, reason=crashed
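
      Since the failure is intermittent, the save and restore steps above could be repeated in a loop to count how often the restore fails. The following is only a rough sketch under assumptions that are not part of this report: the function names, the VIRSH and BOOT_WAIT variables, and the 60-second boot wait are all hypothetical, and the domain is assumed to be persistent, already running, and fully booted.

```shell
# Hypothetical knobs (not from the report); override them via the environment.
VIRSH="${VIRSH:-virsh}"        # command used to talk to libvirt
BOOT_WAIT="${BOOT_WAIT:-60}"   # seconds to let the guest OS boot after a restart

# Return 0 if the given qemu log contains the error message from this report.
log_has_put_registers_error() {
    grep -q 'Failed to put registers after init' "$1"
}

# Save and restore DOMAIN for ITERATIONS rounds and print how many restores failed.
# Assumes the domain is persistent, already running, and fully booted.
save_restore_loop() {
    local domain="$1" iterations="$2" save_file="$3"
    local i=1 failures=0
    while [ "$i" -le "$iterations" ]; do
        "$VIRSH" save "$domain" "$save_file" >/dev/null || return 1
        if ! "$VIRSH" restore "$save_file" >/dev/null 2>&1; then
            failures=$((failures + 1))
            # A failed restore leaves the domain shut off: start it again and
            # give the guest OS time to boot before the next round.
            "$VIRSH" start "$domain" >/dev/null || return 1
            sleep "$BOOT_WAIT"
        fi
        i=$((i + 1))
    done
    echo "$failures/$iterations restores failed"
}
```

      With these functions sourced, `save_restore_loop avocado-vt-vm1 10 ./avocado-vt-vm1.save1` would run ten rounds, and `log_has_put_registers_error /var/log/libvirt/qemu/avocado-vt-vm1.log` would check the domain log afterwards (that path is the usual default for the system libvirt instance and is an assumption here).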
      
      

      Expected results

      Migration succeeds, or save and restore succeeds.

      Actual results

      Migration fails, or save and restore fails.

      Additional info

      Not every machine reproduces this issue. Here is the host CPU info of the machine I used (I am not sure whether it is relevant):

      # lscpu
      Architecture:            x86_64
        CPU op-mode(s):        32-bit, 64-bit
        Address sizes:         46 bits physical, 48 bits virtual
        Byte Order:            Little Endian
      CPU(s):                  40
        On-line CPU(s) list:   0-39
      Vendor ID:               GenuineIntel
        BIOS Vendor ID:        Intel
        Model name:            Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
          BIOS Model name:     Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
          CPU family:          6
          Model:               85
          Thread(s) per core:  2
          Core(s) per socket:  10
          Socket(s):           2
          Stepping:            7
          CPU max MHz:         3200.0000
          CPU min MHz:         1000.0000
          BogoMIPS:            4400.00
          Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
                               syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
                               pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
                               popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single
                               intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2
                               smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl
                               xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window
                               hwp_epp hwp_pkg_req vnmi pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
      Virtualization features: 
        Virtualization:        VT-x
      Caches (sum of all):     
        L1d:                   640 KiB (20 instances)
        L1i:                   640 KiB (20 instances)
        L2:                    20 MiB (20 instances)
        L3:                    27.5 MiB (2 instances)
      NUMA:                    
        NUMA node(s):          2
        NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
        NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
      Vulnerabilities:         
        Gather data sampling:  Mitigation; Microcode
        Itlb multihit:         KVM: Mitigation: VMX disabled
        L1tf:                  Not affected
        Mds:                   Not affected
        Meltdown:              Not affected
        Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
        Retbleed:              Mitigation; Enhanced IBRS
        Spec rstack overflow:  Not affected
        Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
        Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
        Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
        Srbds:                 Not affected
        Tsx async abort:       Mitigation; TSX disabled
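
      Because only some machines reproduce the issue, it may help to compare CPU flags between a host that reproduces it and one that does not. The sketch below is an illustration only: the function names are hypothetical, and it assumes each host's /proc/cpuinfo has been copied to a file (the `flags` line there is not wrapped the way lscpu output is).

```shell
# Print the CPU flags from a /proc/cpuinfo-style file, one per line, sorted.
cpu_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | cut -d: -f2 | tr -s ' ' '\n' | sed '/^$/d' | sort -u
}

# Show flags present in only one of the two files.
# Unindented lines: only in the first file; tab-indented lines: only in the second.
diff_cpu_flags() {
    local first second
    first=$(mktemp) || return 1
    second=$(mktemp) || return 1
    cpu_flags "$1" > "$first"
    cpu_flags "$2" > "$second"
    comm -3 "$first" "$second"
    rm -f "$first" "$second"
}
```

      For example, `diff_cpu_flags failing-host.cpuinfo working-host.cpuinfo` (hypothetical file names) would list the flags that differ between the two hosts.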
      
      

        1. image-2024-01-26-11-17-59-098.png (90 kB)
        2. kvm-cpu.log (3 kB)
        3. virtqemud.log (6.60 MB)
        4. vm.log (45 kB)
        5. vm.xml (7 kB)

              zhexu@redhat.com Peter Xu
              rhn-support-fjin Fangge Jin
              Paolo Bonzini
              virt-maint virt-maint
              Xiaohui Li Xiaohui Li