OpenShift Virtualization / CNV-43195

VM migration fails with virError (guest CPU doesn't match specification: missing features: vmx-rdseed-exit)


    • Release Notes
      On heterogeneous clusters, live migration might fail with the error "virError (guest CPU doesn't match specification: missing features: vmx-*)".

      The current workaround is to set cpu.model explicitly, either:
      a) at the VM spec level, as described in https://docs.openshift.com/container-platform/4.15/virt/virtual_machines/advanced_vm_management/virt-schedule-vms.html#virt-schedule-supported-cpu-model-vms_virt-schedule-vms

      b) or at the cluster level, as described in https://docs.openshift.com/container-platform/4.15/virt/virtual_machines/advanced_vm_management/virt-configuring-default-cpu-model.html
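      A minimal sketch of both options, showing only the relevant fields; merge them into the existing objects (for example with oc edit) rather than applying them as complete manifests. Icelake-Server is an assumed example, chosen only because the nodes in this report identify as Intel Xeon Icelake; use a named CPU model that every node in the cluster supports. The VM name and namespace come from this report, and kubevirt-hyperconverged in openshift-cnv is assumed to be the default HyperConverged object. The VM-level option assumes the instance type does not set its own CPU model. In both cases a running VM is expected to pick up the new model only after a restart.

```yaml
# a) VM level: pin a named CPU model in the VM template instead of host-model.
#    Icelake-Server is an assumed example model, not a tested value.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel-9-red-leopard-84
  namespace: qwang
spec:
  template:
    spec:
      domain:
        cpu:
          model: Icelake-Server
---
# b) Cluster level: default CPU model for VMs that do not set cpu.model themselves.
#    Object name and namespace assume a default OpenShift Virtualization install.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  defaultCPUModel: Icelake-Server
```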
    • Known Issue
    • Proposed
    • Yes
    • High
    • No

      Description of problem:

      Following the guidance on the web page, I created and started the VM. When I start a migration, it fails almost immediately and the VMI reports this error:
      
      reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')
      
      All nodes are configured identically. The VMI uses the ocs-storagecluster-ceph-rbd-virtualization storage class and is configured with the host-model CPU model.

      Version-Release number of selected component (if applicable):

      OpenShift version: 4.16.0-rc.4
      CNV version: 4.16.0
      HCO image: brew.registry.redhat.io/rh-osbs/iib:737387
      OCS version: 4.16.0-126.stable

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a VM (RHEL 9, u1.small) from an InstanceType and wait until it is running.
      2. Click Actions > Migrate to migrate the VM to a different node.
      3. Check the VM, the VMI, and the virt-launcher pods.
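      Step 2 can also be reproduced from the CLI by creating a VirtualMachineInstanceMigration object for the running VMI. A minimal sketch, assuming the VMI name and namespace shown in the oc describe output under Actual results; the migration object name is hypothetical, and the target node is left to the scheduler rather than specified here.

```yaml
# Hypothetical migration object name; vmiName and namespace are from this report.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-rhel-9-red-leopard-84
  namespace: qwang
spec:
  vmiName: rhel-9-red-leopard-84
```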
      

      Actual results:

      VM starts to migrate but fails immediately. 
      
      [cloud-user@ocp-psi-executor-xl ~]$ oc describe vmi -n qwang
      Name:         rhel-9-red-leopard-84
      Namespace:    qwang
      Labels:       app.kubernetes.io/name=headless
                    kubevirt.io/migrationTargetNodeName=sys-qw-416-tbf4r-worker-0-knsb5
                    kubevirt.io/nodeName=sys-qw-416-tbf4r-worker-0-vv6sf
                    migration-test=qwang
      Annotations:  kubevirt.io/cluster-instancetype-name: u1.small
                    kubevirt.io/cluster-preference-name: rhel.9
                    kubevirt.io/latest-observed-api-version: v1
                    kubevirt.io/nonroot: true
                    kubevirt.io/storage-observed-api-version: v1
                    kubevirt.io/vm-generation: 2
      API Version:  kubevirt.io/v1
      Kind:         VirtualMachineInstance
      Metadata:
        Creation Timestamp:  2024-06-19T16:08:35Z
        Finalizers:
          kubevirt.io/virtualMachineControllerFinalize
          foregroundDeleteVirtualMachine
        Generation:  20
        Owner References:
          API Version:           kubevirt.io/v1
          Block Owner Deletion:  true
          Controller:            true
          Kind:                  VirtualMachine
          Name:                  rhel-9-red-leopard-84
          UID:                   a9870103-68d8-47ed-b60b-91c47fa2f75b
        Resource Version:        3759076
        UID:                     b0a4b55b-8a4e-4f78-bc8c-4390bad9c682
      Spec:
        Architecture:  amd64
        Domain:
          Cpu:
            Cores:    1
            Model:    host-model
            Sockets:  1
            Threads:  1
          Devices:
            Disks:
              Dedicated IO Thread:  true
              Disk:
                Bus:                virtio
              Name:                 rootdisk
              Dedicated IO Thread:  true
              Disk:
                Bus:  virtio
              Name:   cloudinitdisk
            Interfaces:
              Masquerade:
              Model:  virtio
              Name:   default
            Rng:
          Features:
            Acpi:
              Enabled:  true
            Smm:
              Enabled:  true
          Firmware:
            Bootloader:
              Efi:
                Secure Boot:  true
            Uuid:             d2ce6d9c-bca1-52af-a4bc-57b0aa2d964b
          Machine:
            Type:  pc-q35-rhel9.4.0
          Memory:
            Guest:  2Gi
          Resources:
            Requests:
              Memory:       2Gi
        Eviction Strategy:  LiveMigrate
        Networks:
          Name:  default
          Pod:
        Subdomain:  headless
        Volumes:
          Data Volume:
            Name:  rhel-9-red-leopard-84-volume
          Name:    rootdisk
          Cloud Init No Cloud:
            User Data:  #cloud-config
      chpasswd:
        expire: false
      password: k4vn-qo7y-ojk3
      user: cloud-user
          Name:  cloudinitdisk
      Status:
        Active Pods:
          3bdc63f4-92f6-442a-ac3f-1f2bbefd7bcf:  sys-qw-416-tbf4r-worker-0-knsb5
          8b55126a-791e-4d75-81fd-b6491502c29b:  sys-qw-416-tbf4r-worker-0-vv6sf
        Conditions:
          Last Probe Time:       <nil>
          Last Transition Time:  2024-06-19T16:08:42Z
          Status:                True
          Type:                  Ready
          Last Probe Time:       <nil>
          Last Transition Time:  <nil>
          Message:               All of the VMI's DVs are bound and not running
          Reason:                AllDVsReady
          Status:                True
          Type:                  DataVolumesReady
          Last Probe Time:       <nil>
          Last Transition Time:  <nil>
          Status:                True
          Type:                  LiveMigratable
          Last Probe Time:       2024-06-19T16:09:03Z
          Last Transition Time:  <nil>
          Status:                True
          Type:                  AgentConnected
        Current CPU Topology:
          Cores:    1
          Sockets:  1
          Threads:  1
        Guest OS Info:
          Id:              rhel
          Kernel Release:  5.14.0-427.20.1.el9_4.x86_64
          Kernel Version:  #1 SMP PREEMPT_DYNAMIC Thu May 23 16:37:13 EDT 2024
          Machine:         x86_64
          Name:            Red Hat Enterprise Linux
          Pretty Name:     Red Hat Enterprise Linux 9.4 (Plow)
          Version:         9.4 (Plow)
          Version Id:      9.4
        Interfaces:
          Info Source:     domain, guest-agent
          Interface Name:  eth0
          Ip Address:      10.129.3.162
          Ip Addresses:
            10.129.3.162
          Mac:                             0a:58:0a:81:03:a2
          Name:                            default
          Queue Count:                     1
        Launcher Container Image Version:  registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:79090e4c013bb5922cf68899220ade1aaef24d14c9f00f5fc2d09582953f479e
        Machine:
          Type:  pc-q35-rhel9.4.0
        Memory:
          Guest At Boot:    2Gi
          Guest Current:    2Gi
          Guest Requested:  2Gi
        Migration Method:   BlockMigration
        Migration State:
          Completed:       true
          End Timestamp:   2024-06-19T16:11:04Z
          Failed:          true
          Failure Reason:  Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')
          Migration Configuration:
            Allow Auto Converge:                    false
            Allow Post Copy:                        false
            Bandwidth Per Migration:                0
            Completion Timeout Per Gi B:            800
            Node Drain Taint Key:                   kubevirt.io/drain
            Parallel Migrations Per Cluster:        5
            Parallel Outbound Migrations Per Node:  2
            Progress Timeout:                       150
            Unsafe Migration Override:              false
          Migration Policy Name:                    policy-maroon-woodpecker-42
          Migration UID:                            9ef24a84-2dfa-4b4c-aa95-8e6d561062af
          Mode:                                     PreCopy
          Source Node:                              sys-qw-416-tbf4r-worker-0-vv6sf
          Source Pod:                               virt-launcher-rhel-9-red-leopard-84-6jjnh
          Start Timestamp:                          2024-06-19T16:11:02Z
          Target Direct Migration Node Ports:
            37407:                      49152
            38925:                      49153
            46493:                      0
          Target Node:                  sys-qw-416-tbf4r-worker-0-knsb5
          Target Node Address:          10.131.0.67
          Target Node Domain Detected:  true
          Target Pod:                   virt-launcher-rhel-9-red-leopard-84-994x5
        Migration Transport:            Unix
        Node Name:                      sys-qw-416-tbf4r-worker-0-vv6sf
        Phase:                          Running
        Phase Transition Timestamps:
          Phase:                        Pending
          Phase Transition Timestamp:   2024-06-19T16:08:35Z
          Phase:                        Scheduling
          Phase Transition Timestamp:   2024-06-19T16:08:35Z
          Phase:                        Scheduled
          Phase Transition Timestamp:   2024-06-19T16:08:42Z
          Phase:                        Running
          Phase Transition Timestamp:   2024-06-19T16:08:44Z
        Qos Class:                      Burstable
        Runtime User:                   107
        Selinux Context:                system_u:object_r:container_file_t:s0:c80,c311
        Virtual Machine Revision Name:  revision-start-vm-a9870103-68d8-47ed-b60b-91c47fa2f75b-2
        Volume Status:
          Name:    cloudinitdisk
          Size:    1048576
          Target:  vdb
          Name:    rootdisk
          Persistent Volume Claim Info:
            Access Modes:
              ReadWriteMany
            Capacity:
              Storage:            30Gi
            Filesystem Overhead:  0
            Requests:
              Storage:    32212254720
            Volume Mode:  Block
          Target:         vda
      Events:
        Type     Reason            Age                            From                         Message
        ----     ------            ----                           ----                         -------
        Normal   SuccessfulCreate  66s                            disruptionbudget-controller  Created PodDisruptionBudget kubevirt-disruption-budget-kj5w6
        Normal   SuccessfulCreate  66s                            virtualmachine-controller    Created virtual machine pod virt-launcher-rhel-9-red-leopard-84-6jjnh
        Normal   Created           57s                            virt-handler                 VirtualMachineInstance defined.
        Normal   Started           57s                            virt-handler                 VirtualMachineInstance started.
        Normal   SuccessfulUpdate  <invalid>                      virtualmachine-controller    Expanded PodDisruptionBudget kubevirt-disruption-budget-kj5w6
        Normal   PreparingTarget   <invalid> (x2 over <invalid>)  virt-handler                 VirtualMachineInstance Migration Target Prepared.
        Normal   PreparingTarget   <invalid>                      virt-handler                 Migration Target is listening at 10.131.0.67, on ports: 46493,37407,38925
        Normal   Migrating         <invalid>                      virt-handler                 VirtualMachineInstance is migrating.
        Warning  Migrated          <invalid>                      virt-handler                 VirtualMachineInstance migration uid 9ef24a84-2dfa-4b4c-aa95-8e6d561062af failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: vmx-rdseed-exit')
        Normal   SuccessfulUpdate  <invalid>                      disruptionbudget-controller  shrank PodDisruptionBudget kubevirt-disruption-budget-kj5w6

      Expected results:

      VM is migrated to a different node and keeps running.

      Additional info:

      Node:
      
      sh-5.1# lscpu
      Architecture:            x86_64
        CPU op-mode(s):        32-bit, 64-bit
        Address sizes:         46 bits physical, 57 bits virtual
        Byte Order:            Little Endian
      CPU(s):                  12
        On-line CPU(s) list:   0-11
      Vendor ID:               GenuineIntel
        BIOS Vendor ID:        Red Hat
        Model name:            Intel Xeon Processor (Icelake)
          BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
          CPU family:          6
          Model:               134
          Thread(s) per core:  1
          Core(s) per socket:  1
          Socket(s):           12
          Stepping:            0
          BogoMIPS:            4589.21
          Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
                               mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl
                               xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2
                               x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
                               abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow
                               flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
                               avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni
                               avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat vnmi avx512vbmi
                               umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg
                               avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
      Virtualization features: 
        Virtualization:        VT-x
        Hypervisor vendor:     KVM
        Virtualization type:   full
      Caches (sum of all):     
        L1d:                   384 KiB (12 instances)
        L1i:                   384 KiB (12 instances)
        L2:                    48 MiB (12 instances)
        L3:                    192 MiB (12 instances)
      NUMA:                    
        NUMA node(s):          1
        NUMA node0 CPU(s):     0-11
      Vulnerabilities:         
        Gather data sampling:  Not affected
        Itlb multihit:         Not affected
        L1tf:                  Not affected
        Mds:                   Not affected
        Meltdown:              Not affected
        Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
        Retbleed:              Not affected
        Spec rstack overflow:  Not affected
        Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
        Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
        Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
        Srbds:                 Not affected
        Tsx async abort:       Mitigation; TSX disabled

            sgott@redhat.com Stuart Gott
            qwang@redhat.com Qixuan Wang
            Kedar Bidarkar