  1. OpenShift Virtualization
  2. CNV-64417

[vcpu hotplug] The hotplugged CPU is offline after a successful migration


      Description of problem:

      [vcpu hotplug] The hotplugged CPU is offline in the guest after the migration completes successfully.

      Version-Release number of selected component (if applicable):

      CNV: registry-proxy.engineering.redhat.com/rh-osbs/iib:977162
      OCP 4.19.0-ec.4

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a VM using the following YAML config.
      
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        annotations:
          kubemacpool.io/transaction-timestamp: "2025-05-30T15:05:57.204944275Z"
          kubevirt.io/latest-observed-api-version: v1
          kubevirt.io/storage-observed-api-version: v1
          vm.kubevirt.io/validations: |
            [
              {
                "name": "minimal-required-memory",
                "path": "jsonpath::.spec.domain.memory.guest",
                "rule": "integer",
                "message": "This VM requires more memory.",
                "min": 1610612736
              }
            ]
        creationTimestamp: "2025-05-30T09:44:26Z"
        finalizers:
        - kubevirt.io/virtualMachineControllerFinalize
        generation: 9
        labels:
          app: rhel97
          kubevirt.io/dynamic-credentials-support: "true"
          vm.kubevirt.io/template: rhel9-server-small
          vm.kubevirt.io/template.namespace: openshift
          vm.kubevirt.io/template.revision: "1"
          vm.kubevirt.io/template.version: v0.34.0
        name: rhel97
        namespace: default
        resourceVersion: "53900460"
        uid: 82b148ed-198b-4436-acfd-214bc9067822
      spec:
        dataVolumeTemplates:
        - apiVersion: cdi.kubevirt.io/v1beta1
          kind: DataVolume
          metadata:
            creationTimestamp: null
            name: rhel97
          spec:
            source:
              http:
                url: http://<internal_server>/libvirt-CI-resources/RHEL-9.7-x86_64-latest-ovmf.qcow2
            storage:
              accessModes:
              - ReadWriteMany
              resources:
                requests:
                  storage: 16Gi
              storageClassName: ocs-storagecluster-cephfs
              volumeMode: Filesystem
        runStrategy: RerunOnFailure
        template:
          metadata:
            annotations:
              vm.kubevirt.io/flavor: small
              vm.kubevirt.io/os: rhel9
              vm.kubevirt.io/workload: server
            creationTimestamp: null
            labels:
              kubevirt.io/domain: rhel97
              kubevirt.io/size: small
              network.kubevirt.io/headlessService: headless
          spec:
            architecture: amd64
            domain:
              cpu:
                cores: 1
                sockets: 1
                threads: 1
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: rootdisk
                - disk:
                    bus: virtio
                  name: cloudinitdisk
                interfaces:
                - macAddress: 02:0d:38:00:00:00
                  masquerade: {}
                  model: virtio
                  name: default
                networkInterfaceMultiqueue: true
                rng: {}
              features:
                acpi: {}
                smm:
                  enabled: true
              firmware:
                bootloader:
                  efi: {}
              machine:
                type: pc-q35-rhel9.6.0
              memory:
                guest: 2Gi
              resources: {}
            networks:
            - name: default
              pod: {}
            terminationGracePeriodSeconds: 180
            volumes:
            - dataVolume:
                name: rhel97
              name: rootdisk
            - cloudInitNoCloud:
                userData: |-
                  #cloud-config
                  user: cloud-user
                  password: jaom-1g5t-ohlm
                  chpasswd: { expire: False }
              name: cloudinitdisk
      2. Change the socket count to 2 to trigger vCPU hotplug.
      3. Log in to the VM after the migration completes successfully and check the CPU info.
      The new CPU is offline.
      
      [root@rhel97 ~]# lscpu  
      Architecture:             x86_64
        CPU op-mode(s):         32-bit, 64-bit
        Address sizes:          46 bits physical, 57 bits virtual
        Byte Order:             Little Endian
      CPU(s):                   2
        On-line CPU(s) list:    0
        Off-line CPU(s) list:   1
      Vendor ID:                GenuineIntel
        BIOS Vendor ID:         Red Hat
        Model name:             Intel Xeon Processor (Icelake)
          BIOS Model name:      RHEL-9.6.0 PC (Q35 + ICH9, 2009)
          CPU family:           6
          Model:                134
          Thread(s) per core:   1
          Core(s) per socket:   1
          Socket(s):            1
          Stepping:             0
          BogoMIPS:             4190.15
          Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge m
                                ca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss sys
                                call nx pdpe1gb rdtscp lm constant_tsc pebs bts rep_go
                                od nopl xtopology cpuid tsc_known_freq pni pclmulqdq d
                                tes64 vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic mov
                                be popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                                 hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd
                                 ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority
                                 ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bm
                                i2 erms invpcid avx512f avx512dq rdseed adx smap avx51
                                2ifma clflushopt clwb avx512cd sha_ni avx512bw avx512v
                                l xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat vnmi av
                                x512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmul
                                qdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rd
                                pid fsrm md_clear flush_l1d arch_capabilities
      Virtualization features:  
        Virtualization:         VT-x
        Hypervisor vendor:      KVM
        Virtualization type:    full
      Caches (sum of all):      
        L1d:                    32 KiB (1 instance)
        L1i:                    32 KiB (1 instance)
        L2:                     4 MiB (1 instance)
        L3:                     16 MiB (1 instance)
      NUMA:                     
        NUMA node(s):           1
        NUMA node0 CPU(s):      0
      Vulnerabilities:          
        Gather data sampling:   Not affected
        Itlb multihit:          Not affected
        L1tf:                   Not affected
        Mds:                    Not affected
        Meltdown:               Not affected
        Mmio stale data:        Mitigation; Clear CPU buffers; SMT Host state unknown
        Reg file data sampling: Not affected
        Retbleed:               Not affected
        Spec rstack overflow:   Not affected
        Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prct
                                l
        Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointe
                                r sanitization
        Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditiona
                                l; RSB filling; PBRSB-eIBRS Not affected; BHI SW loop,
                                 KVM SW loop
        Srbds:                  Not affected
        Tsx async abort:        Mitigation; TSX disabled
      
      4. Check the dmesg log.
      
      [   70.601402] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
      [   70.601462] clocksource:                       'kvm-clock' wd_nsec: 543445712 wd_now: 117af3fcc2 wd_last: 115a8fa9f2 mask: ffffffffffffffff
      [   70.601467] clocksource:                       'tsc' cs_nsec: 479651937 cs_now: 24b63d5eb6 cs_last: 247a57b034 mask: ffffffffffffffff
      [   70.601470] clocksource:                       'kvm-clock' (not 'tsc') is current clocksource.
      [   70.601473] tsc: Marking TSC unstable due to clocksource watchdog
      [   70.718258] ACPI: CPU1 has been hot-added
      [   70.728190] SMP alternatives: switching to SMP code
      [   70.733376] smpboot: Booting Node 0 Processor 1 APIC 0x1
      [   70.734082] kvm_intel: Inconsistent VMCS config on CPU 1
      [   70.734130] kvm: enabling virtualization on CPU1 failed
      [   70.735933] smpboot: CPU 1 is now offline
      
      
      5. Soft reboot the VM; both CPUs are then online. Then trigger vCPU hotplug again by changing the socket count to 3.
      6. Check the CPU info after the migration. This time the new CPU is hotplugged successfully.
      
      [   57.549015] ACPI: CPU2 has been hot-added
      [   57.565879] smpboot: Booting Node 0 Processor 2 APIC 0x2
      [   57.566014] TSC ADJUST compensate: CPU2 observed 130448479191 warp. Adjust: 130448479191
      [   57.566014] TSC ADJUST compensate: CPU2 observed 12 warp. Adjust: 130448479203
      [   57.567354] TSC synchronization [CPU#0 -> CPU#2]:
      [   57.567354] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
      [   57.626457] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
      [   57.626469] clocksource:                       'kvm-clock' wd_nsec: 519620295 wd_now: 3ec4536741 wd_last: 3ea55aa07a mask: ffffffffffffffff
      [   57.626473] clocksource:                       'tsc' cs_nsec: 467819394 cs_now: 1e6b2800fb cs_last: 1e30bcac51 mask: ffffffffffffffff
      [   57.626480] clocksource:                       'kvm-clock' (not 'tsc') is current clocksource.
      [   57.626483] tsc: Marking TSC unstable due to clocksource watchdog
      [   57.627411] Will online and init hotplugged CPU: 2
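
Steps 2 and 5 change the socket count in the VM spec. The report does not say how the change was applied, so the following is only a sketch under that assumption: it builds a JSON merge patch for the `spec.template.spec.domain.cpu.sockets` field shown in the YAML above. The helper name `sockets_patch` is hypothetical.

```python
import json


def sockets_patch(sockets):
    """Return a JSON merge patch that sets the VM's CPU socket count.

    The field path mirrors the VirtualMachine YAML in step 1.
    """
    if sockets < 1:
        raise ValueError("sockets must be >= 1")
    patch = {
        "spec": {
            "template": {
                "spec": {"domain": {"cpu": {"sockets": sockets}}}
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Step 2 uses 2 sockets; step 5 uses 3.
    print(sockets_patch(2))
```

One way to apply the resulting string, assuming the `oc` client is used, would be `oc patch vm rhel97 -n default --type merge -p '<json>'`; editing the VM in the console or with `oc edit` changes the same field.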

      Actual results:

      After a vCPU hotplug on a newly created VM, the new CPU is offline in the guest.
      A soft reboot of the VM fixes the offline CPU, and subsequent vCPU hotplugs succeed after that reboot.
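
The failing and passing cases can be told apart from the guest's dmesg output alone. The sketch below is not part of the original reproduction; it classifies each hot-added CPU using only the three kernel messages that appear in the excerpts in this report. The function name `hotplug_outcomes` and the state labels are hypothetical, and other kernels may log different messages.

```python
import re

# Message patterns copied from the dmesg excerpts in this report.
PATTERNS = [
    ("hot-added", re.compile(r"ACPI: CPU(\d+) has been hot-added")),
    ("offline", re.compile(r"smpboot: CPU (\d+) is now offline")),
    ("online", re.compile(r"Will online and init hotplugged CPU: (\d+)")),
]


def hotplug_outcomes(dmesg_text):
    """Map each hot-added CPU number to 'hot-added', 'offline' or 'online'."""
    outcomes = {}
    for line in dmesg_text.splitlines():
        for state, pattern in PATTERNS:
            match = pattern.search(line)
            if not match:
                continue
            cpu = int(match.group(1))
            if state == "hot-added":
                # Start tracking a CPU only once the kernel reports the hot-add.
                outcomes.setdefault(cpu, "hot-added")
            elif cpu in outcomes and outcomes[cpu] != "offline":
                # In this sketch an 'offline' verdict is final and never overwritten.
                outcomes[cpu] = state
    return outcomes
```

Fed the output of `dmesg` from the guest, this would report CPU 1 as `offline` for the first hotplug and CPU 2 as `online` for the hotplug after the soft reboot.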

      Expected results:

      A vCPU hotplug on a newly created VM should bring the new CPU online.
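
The expected result can be checked inside the guest without reading the full `lscpu` output. A minimal sketch, assuming the standard Linux sysfs files `/sys/devices/system/cpu/present` and `/sys/devices/system/cpu/online`; the helper names are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0', '0-1' or '0,2-3' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus(present_text, online_text):
    """Return the CPUs that are present but not online."""
    return parse_cpu_list(present_text) - parse_cpu_list(online_text)


def read_offline_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Read the present/online lists from sysfs and return the offline CPUs."""
    root = Path(sysfs_root)
    return offline_cpus(
        (root / "present").read_text(),
        (root / "online").read_text(),
    )


if __name__ == "__main__":
    # An empty set means every present CPU is online (the expected result).
    print(sorted(read_offline_cpus()))
```

With the `lscpu` values from step 3 (CPUs 0 and 1 present, only CPU 0 online), `offline_cpus("0-1", "0")` returns `{1}`; the expected behaviour corresponds to an empty set.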

      Additional info:

       

              tnisan@redhat.com Tal Nisan
              xiaodwan@redhat.com Xiaodai Wang
              Kedar Bidarkar Kedar Bidarkar