OpenShift Virtualization / CNV-43195

VM migration fails with virError (guest CPU doesn't match specification: missing features: vmx-rdseed-exit)


    • 1
    • False
    • None
    • False
    • virt-launcher-rhel9-container-v4.16.1-8
    • Release Notes
      VM migrations might fail on clusters with mixed CPU types.
      - As a workaround, you can set the CPU model at the VM spec level or at the cluster level.

      On heterogeneous clusters, migrations might fail with errors such as "VM migration fails with virError (guest CPU doesn't match specification: missing features: vmx-*)".

      The current workaround is to set cpu.model at one of the following levels:
      a) VM spec level. For the VM level, point users to this doc: https://docs.openshift.com/container-platform/4.15/virt/virtual_machines/advanced_vm_management/virt-schedule-vms.html#virt-schedule-supported-cpu-model-vms_virt-schedule-vms

      b) Cluster level. For the cluster level, point users to this doc: https://docs.openshift.com/container-platform/4.15/virt/virtual_machines/advanced_vm_management/virt-configuring-default-cpu-model.html
    • Known Issue
    • Done
    • ---
    • ---
    • Yes
    • High
    • No
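      Both workarounds in the release note replace host-model with a named CPU model. The following is a minimal sketch of the corresponding API changes, assuming the HyperConverged field spec.defaultCPUModel (cluster level) and the VirtualMachine field spec.template.spec.domain.cpu.model (VM level) described in the linked 4.15 documentation. The helper functions, the CPU_MODEL, VM_NAME and NAMESPACE variables, and the default HyperConverged name and namespace are assumptions for illustration; the chosen model must be one that every node able to host the VM supports. For a VM created from an instance type, as in this report, check the linked documentation first, because the instance type may also define CPU settings.

```shell
#!/bin/sh
# Sketch: build JSON merge patches for the two CPU-model workarounds.
# Field paths follow the linked OpenShift Virtualization 4.15 docs;
# the function names are hypothetical helpers.

# Cluster level: spec.defaultCPUModel on the HyperConverged CR.
cluster_cpu_model_patch() {
  printf '{"spec":{"defaultCPUModel":"%s"}}\n' "$1"
}

# VM level: spec.template.spec.domain.cpu.model on the VirtualMachine.
vm_cpu_model_patch() {
  printf '{"spec":{"template":{"spec":{"domain":{"cpu":{"model":"%s"}}}}}}\n' "$1"
}

# Example usage against a live cluster (left commented out).
# CPU_MODEL is a placeholder. Assuming the KubeVirt node labeller's
# cpu-model.node.kubevirt.io/ labels, the models a node supports can be listed with:
#   oc get node <node> --show-labels | tr ',' '\n' | grep cpu-model.node.kubevirt.io
#
# Assumed default HyperConverged name and namespace:
# oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
#   --type=merge -p "$(cluster_cpu_model_patch "$CPU_MODEL")"
#
# oc patch vm "$VM_NAME" -n "$NAMESPACE" \
#   --type=merge -p "$(vm_cpu_model_patch "$CPU_MODEL")"
```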

      Description of problem:

      Following the guidance on the web page, create and run the VM, then start a migration. The migration fails almost immediately, and the VMI reports the following error:
      
      reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')
      
      All nodes are configured identically. The VMI's disk uses the ocs-storagecluster-ceph-rbd-virtualization storage class, and the VMI is set to use the host-model CPU model.
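      The error may name more than one missing feature (the release note abbreviates them as vmx-*). The sketch below extracts the feature list from a failure reason such as the one above, one feature per line, so it can be compared against node CPU information. It assumes libvirt prints the features after "missing features: " as a comma-separated list that ends at the closing quote or parenthesis; the function name is hypothetical.

```shell
# Sketch: print the features libvirt reports as missing in a
# "guest CPU doesn't match specification" error, one per line.
# Assumption: features follow "missing features: ", are comma-separated,
# and end at the closing quote or parenthesis of the virError message.
missing_cpu_features() {
  printf '%s\n' "$1" |
    sed -n "s/.*missing features: \([^')]*\).*/\1/p" |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//; /^$/d'
}

# Usage with the failure reason reported by the VMI in this issue
# (prints vmx-rdseed-exit):
missing_cpu_features "virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')"
```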

      Version-Release number of selected component (if applicable):

      OpenShift version: 4.16.0-rc.4
      CNV version: 4.16.0
      HCO image: brew.registry.redhat.io/rh-osbs/iib:737387
      OCS version: 4.16.0-126.stable

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a VM (rhel9, u1.small) from an InstanceType and wait until it is running.
      2. Click Actions > Migrate to migrate the VM to a different node.
      3. Check the VM, the VMI, and the virt-launcher pods.
      

      Actual results:

      The VM starts to migrate, but the migration fails immediately.
      
      [cloud-user@ocp-psi-executor-xl ~]$ oc describe vmi -n qwang
      Name:         rhel-9-red-leopard-84
      Namespace:    qwang
      Labels:       app.kubernetes.io/name=headless
                    kubevirt.io/migrationTargetNodeName=sys-qw-416-tbf4r-worker-0-knsb5
                    kubevirt.io/nodeName=sys-qw-416-tbf4r-worker-0-vv6sf
                    migration-test=qwang
      Annotations:  kubevirt.io/cluster-instancetype-name: u1.small
                    kubevirt.io/cluster-preference-name: rhel.9
                    kubevirt.io/latest-observed-api-version: v1
                    kubevirt.io/nonroot: true
                    kubevirt.io/storage-observed-api-version: v1
                    kubevirt.io/vm-generation: 2
      API Version:  kubevirt.io/v1
      Kind:         VirtualMachineInstance
      Metadata:
        Creation Timestamp:  2024-06-19T16:08:35Z
        Finalizers:
          kubevirt.io/virtualMachineControllerFinalize
          foregroundDeleteVirtualMachine
        Generation:  20
        Owner References:
          API Version:           kubevirt.io/v1
          Block Owner Deletion:  true
          Controller:            true
          Kind:                  VirtualMachine
          Name:                  rhel-9-red-leopard-84
          UID:                   a9870103-68d8-47ed-b60b-91c47fa2f75b
        Resource Version:        3759076
        UID:                     b0a4b55b-8a4e-4f78-bc8c-4390bad9c682
      Spec:
        Architecture:  amd64
        Domain:
          Cpu:
            Cores:    1
            Model:    host-model
            Sockets:  1
            Threads:  1
          Devices:
            Disks:
              Dedicated IO Thread:  true
              Disk:
                Bus:                virtio
              Name:                 rootdisk
              Dedicated IO Thread:  true
              Disk:
                Bus:  virtio
              Name:   cloudinitdisk
            Interfaces:
              Masquerade:
              Model:  virtio
              Name:   default
            Rng:
          Features:
            Acpi:
              Enabled:  true
            Smm:
              Enabled:  true
          Firmware:
            Bootloader:
              Efi:
                Secure Boot:  true
            Uuid:             d2ce6d9c-bca1-52af-a4bc-57b0aa2d964b
          Machine:
            Type:  pc-q35-rhel9.4.0
          Memory:
            Guest:  2Gi
          Resources:
            Requests:
              Memory:       2Gi
        Eviction Strategy:  LiveMigrate
        Networks:
          Name:  default
          Pod:
        Subdomain:  headless
        Volumes:
          Data Volume:
            Name:  rhel-9-red-leopard-84-volume
          Name:    rootdisk
          Cloud Init No Cloud:
            User Data:  #cloud-config
      chpasswd:
        expire: false
      password: k4vn-qo7y-ojk3
      user: cloud-user
          Name:  cloudinitdisk
      Status:
        Active Pods:
          3bdc63f4-92f6-442a-ac3f-1f2bbefd7bcf:  sys-qw-416-tbf4r-worker-0-knsb5
          8b55126a-791e-4d75-81fd-b6491502c29b:  sys-qw-416-tbf4r-worker-0-vv6sf
        Conditions:
          Last Probe Time:       <nil>
          Last Transition Time:  2024-06-19T16:08:42Z
          Status:                True
          Type:                  Ready
          Last Probe Time:       <nil>
          Last Transition Time:  <nil>
          Message:               All of the VMI's DVs are bound and not running
          Reason:                AllDVsReady
          Status:                True
          Type:                  DataVolumesReady
          Last Probe Time:       <nil>
          Last Transition Time:  <nil>
          Status:                True
          Type:                  LiveMigratable
          Last Probe Time:       2024-06-19T16:09:03Z
          Last Transition Time:  <nil>
          Status:                True
          Type:                  AgentConnected
        Current CPU Topology:
          Cores:    1
          Sockets:  1
          Threads:  1
        Guest OS Info:
          Id:              rhel
          Kernel Release:  5.14.0-427.20.1.el9_4.x86_64
          Kernel Version:  #1 SMP PREEMPT_DYNAMIC Thu May 23 16:37:13 EDT 2024
          Machine:         x86_64
          Name:            Red Hat Enterprise Linux
          Pretty Name:     Red Hat Enterprise Linux 9.4 (Plow)
          Version:         9.4 (Plow)
          Version Id:      9.4
        Interfaces:
          Info Source:     domain, guest-agent
          Interface Name:  eth0
          Ip Address:      10.129.3.162
          Ip Addresses:
            10.129.3.162
          Mac:                             0a:58:0a:81:03:a2
          Name:                            default
          Queue Count:                     1
        Launcher Container Image Version:  registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:79090e4c013bb5922cf68899220ade1aaef24d14c9f00f5fc2d09582953f479e
        Machine:
          Type:  pc-q35-rhel9.4.0
        Memory:
          Guest At Boot:    2Gi
          Guest Current:    2Gi
          Guest Requested:  2Gi
        Migration Method:   BlockMigration
        Migration State:
          Completed:       true
          End Timestamp:   2024-06-19T16:11:04Z
          Failed:          true
          Failure Reason:  Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')
          Migration Configuration:
            Allow Auto Converge:                    false
            Allow Post Copy:                        false
            Bandwidth Per Migration:                0
            Completion Timeout Per Gi B:            800
            Node Drain Taint Key:                   kubevirt.io/drain
            Parallel Migrations Per Cluster:        5
            Parallel Outbound Migrations Per Node:  2
            Progress Timeout:                       150
            Unsafe Migration Override:              false
          Migration Policy Name:                    policy-maroon-woodpecker-42
          Migration UID:                            9ef24a84-2dfa-4b4c-aa95-8e6d561062af
          Mode:                                     PreCopy
          Source Node:                              sys-qw-416-tbf4r-worker-0-vv6sf
          Source Pod:                               virt-launcher-rhel-9-red-leopard-84-6jjnh
          Start Timestamp:                          2024-06-19T16:11:02Z
          Target Direct Migration Node Ports:
            37407:                      49152
            38925:                      49153
            46493:                      0
          Target Node:                  sys-qw-416-tbf4r-worker-0-knsb5
          Target Node Address:          10.131.0.67
          Target Node Domain Detected:  true
          Target Pod:                   virt-launcher-rhel-9-red-leopard-84-994x5
        Migration Transport:            Unix
        Node Name:                      sys-qw-416-tbf4r-worker-0-vv6sf
        Phase:                          Running
        Phase Transition Timestamps:
          Phase:                        Pending
          Phase Transition Timestamp:   2024-06-19T16:08:35Z
          Phase:                        Scheduling
          Phase Transition Timestamp:   2024-06-19T16:08:35Z
          Phase:                        Scheduled
          Phase Transition Timestamp:   2024-06-19T16:08:42Z
          Phase:                        Running
          Phase Transition Timestamp:   2024-06-19T16:08:44Z
        Qos Class:                      Burstable
        Runtime User:                   107
        Selinux Context:                system_u:object_r:container_file_t:s0:c80,c311
        Virtual Machine Revision Name:  revision-start-vm-a9870103-68d8-47ed-b60b-91c47fa2f75b-2
        Volume Status:
          Name:    cloudinitdisk
          Size:    1048576
          Target:  vdb
          Name:    rootdisk
          Persistent Volume Claim Info:
            Access Modes:
              ReadWriteMany
            Capacity:
              Storage:            30Gi
            Filesystem Overhead:  0
            Requests:
              Storage:    32212254720
            Volume Mode:  Block
          Target:         vda
      Events:
        Type     Reason            Age                            From                         Message
        ----     ------            ----                           ----                         -------
        Normal   SuccessfulCreate  66s                            disruptionbudget-controller  Created PodDisruptionBudget kubevirt-disruption-budget-kj5w6
        Normal   SuccessfulCreate  66s                            virtualmachine-controller    Created virtual machine pod virt-launcher-rhel-9-red-leopard-84-6jjnh
        Normal   Created           57s                            virt-handler                 VirtualMachineInstance defined.
        Normal   Started           57s                            virt-handler                 VirtualMachineInstance started.
        Normal   SuccessfulUpdate  <invalid>                      virtualmachine-controller    Expanded PodDisruptionBudget kubevirt-disruption-budget-kj5w6
        Normal   PreparingTarget   <invalid> (x2 over <invalid>)  virt-handler                 VirtualMachineInstance Migration Target Prepared.
        Normal   PreparingTarget   <invalid>                      virt-handler                 Migration Target is listening at 10.131.0.67, on ports: 46493,37407,38925
        Normal   Migrating         <invalid>                      virt-handler                 VirtualMachineInstance is migrating.
        Warning  Migrated          <invalid>                      virt-handler                 VirtualMachineInstance migration uid 9ef24a84-2dfa-4b4c-aa95-8e6d561062af failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')
        Normal   SuccessfulUpdate  <invalid>                      disruptionbudget-controller  shrank PodDisruptionBudget kubevirt-disruption-budget-kj5w6

      Expected results:

      The VM migrates to a different node and keeps running.

      Additional info:

      Node:
      
      sh-5.1# lscpu
      Architecture:            x86_64
        CPU op-mode(s):        32-bit, 64-bit
        Address sizes:         46 bits physical, 57 bits virtual
        Byte Order:            Little Endian
      CPU(s):                  12
        On-line CPU(s) list:   0-11
      Vendor ID:               GenuineIntel
        BIOS Vendor ID:        Red Hat
        Model name:            Intel Xeon Processor (Icelake)
          BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
          CPU family:          6
          Model:               134
          Thread(s) per core:  1
          Core(s) per socket:  1
          Socket(s):           12
          Stepping:            0
          BogoMIPS:            4589.21
          Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx
                               fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology
                               cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic
                               movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
                               3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority
                               ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f
                               avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl
                               xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat vnmi avx512vbmi umip pku ospke
                               avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57
                               rdpid fsrm md_clear arch_capabilities
      Virtualization features: 
        Virtualization:        VT-x
        Hypervisor vendor:     KVM
        Virtualization type:   full
      Caches (sum of all):     
        L1d:                   384 KiB (12 instances)
        L1i:                   384 KiB (12 instances)
        L2:                    48 MiB (12 instances)
        L3:                    192 MiB (12 instances)
      NUMA:                    
        NUMA node(s):          1
        NUMA node0 CPU(s):     0-11
      Vulnerabilities:         
        Gather data sampling:  Not affected
        Itlb multihit:         Not affected
        L1tf:                  Not affected
        Mds:                   Not affected
        Meltdown:              Not affected
        Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
        Retbleed:              Not affected
        Spec rstack overflow:  Not affected
        Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
        Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
        Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
        Srbds:                 Not affected
        Tsx async abort:       Mitigation; TSX disabled
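      The report states that all nodes are configured identically, but the lscpu output above comes from a single node. A minimal sketch for checking that claim compares the flag lists collected from the migration source and target nodes and prints the flags the target lacks. The function name is hypothetical. Note that vmx-rdseed-exit is a VMX capability rather than a regular CPU flag and does not appear in the flag list above; on recent kernels, /proc/cpuinfo carries a separate "vmx flags" line that could be compared the same way, although its names are not guaranteed to match libvirt's feature names.

```shell
# Sketch: list CPU flags present on the source node but absent on the target.
# Inputs are whitespace-separated flag lists, e.g. the "Flags:" field of lscpu
# or the "flags" line of /proc/cpuinfo from each node. Hypothetical helper name.
flags_missing_on_target() {
  SRC_FLAGS="$1" DST_FLAGS="$2" awk 'BEGIN {
    # split() with the default FS breaks on runs of blanks and newlines.
    n = split(ENVIRON["DST_FLAGS"], dst)
    for (i = 1; i <= n; i++) have[dst[i]] = 1
    m = split(ENVIRON["SRC_FLAGS"], src)
    for (i = 1; i <= m; i++) {
      f = src[i]
      # Print each flag once, in source order, if the target lacks it.
      if (!(f in have) && !(f in seen)) { seen[f] = 1; print f }
    }
  }'
}

# Example usage against a live cluster (left commented out; node names are placeholders):
# src=$(oc debug node/<source-node> -- chroot /host grep -m1 '^flags' /proc/cpuinfo)
# dst=$(oc debug node/<target-node> -- chroot /host grep -m1 '^flags' /proc/cpuinfo)
# flags_missing_on_target "$src" "$dst"
```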

              Barak Mordehai (bmordeha@redhat.com)
              Qixuan Wang (qwang@redhat.com)
              Kedar Bidarkar
              Jiří Herrmann