-
Bug
-
Resolution: Duplicate
-
Quality / Stability / Reliability
Description of problem:
[vcpu hotplug] The hotplugged CPU is offline in the guest after the migration completes successfully
Version-Release number of selected component (if applicable):
CNV: registry-proxy.engineering.redhat.com/rh-osbs/iib:977162 OCP 4.19.0-ec.4
How reproducible:
100%
Steps to Reproduce:
1. Create a VM with the following YAML config:

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      annotations:
        kubemacpool.io/transaction-timestamp: "2025-05-30T15:05:57.204944275Z"
        kubevirt.io/latest-observed-api-version: v1
        kubevirt.io/storage-observed-api-version: v1
        vm.kubevirt.io/validations: |
          [
            {
              "name": "minimal-required-memory",
              "path": "jsonpath::.spec.domain.memory.guest",
              "rule": "integer",
              "message": "This VM requires more memory.",
              "min": 1610612736
            }
          ]
      creationTimestamp: "2025-05-30T09:44:26Z"
      finalizers:
      - kubevirt.io/virtualMachineControllerFinalize
      generation: 9
      labels:
        app: rhel97
        kubevirt.io/dynamic-credentials-support: "true"
        vm.kubevirt.io/template: rhel9-server-small
        vm.kubevirt.io/template.namespace: openshift
        vm.kubevirt.io/template.revision: "1"
        vm.kubevirt.io/template.version: v0.34.0
      name: rhel97
      namespace: default
      resourceVersion: "53900460"
      uid: 82b148ed-198b-4436-acfd-214bc9067822
    spec:
      dataVolumeTemplates:
      - apiVersion: cdi.kubevirt.io/v1beta1
        kind: DataVolume
        metadata:
          creationTimestamp: null
          name: rhel97
        spec:
          source:
            http:
              url: http://<internal_server>/libvirt-CI-resources/RHEL-9.7-x86_64-latest-ovmf.qcow2
          storage:
            accessModes:
            - ReadWriteMany
            resources:
              requests:
                storage: 16Gi
            storageClassName: ocs-storagecluster-cephfs
            volumeMode: Filesystem
      runStrategy: RerunOnFailure
      template:
        metadata:
          annotations:
            vm.kubevirt.io/flavor: small
            vm.kubevirt.io/os: rhel9
            vm.kubevirt.io/workload: server
          creationTimestamp: null
          labels:
            kubevirt.io/domain: rhel97
            kubevirt.io/size: small
            network.kubevirt.io/headlessService: headless
        spec:
          architecture: amd64
          domain:
            cpu:
              cores: 1
              sockets: 1
              threads: 1
            devices:
              disks:
              - disk:
                  bus: virtio
                name: rootdisk
              - disk:
                  bus: virtio
                name: cloudinitdisk
              interfaces:
              - macAddress: 02:0d:38:00:00:00
                masquerade: {}
                model: virtio
                name: default
              networkInterfaceMultiqueue: true
              rng: {}
            features:
              acpi: {}
              smm:
                enabled: true
            firmware:
              bootloader:
                efi: {}
            machine:
              type: pc-q35-rhel9.6.0
            memory:
              guest: 2Gi
            resources: {}
          networks:
          - name: default
            pod: {}
          terminationGracePeriodSeconds: 180
          volumes:
          - dataVolume:
              name: rhel97
            name: rootdisk
          - cloudInitNoCloud:
              userData: |-
                #cloud-config
                user: cloud-user
                password: jaom-1g5t-ohlm
                chpasswd: { expire: False }
            name: cloudinitdisk

2. Change the sockets value to 2 to trigger vCPU hotplug.

3. Log in to the VM after the migration completes successfully and check the CPU info. The new CPU is offline.

    [root@rhel97 ~]# lscpu
    Architecture: x86_64
      CPU op-mode(s): 32-bit, 64-bit
      Address sizes: 46 bits physical, 57 bits virtual
      Byte Order: Little Endian
    CPU(s): 2
      On-line CPU(s) list: 0
      Off-line CPU(s) list: 1
    Vendor ID: GenuineIntel
      BIOS Vendor ID: Red Hat
      Model name: Intel Xeon Processor (Icelake)
        BIOS Model name: RHEL-9.6.0 PC (Q35 + ICH9, 2009)
        CPU family: 6
        Model: 134
        Thread(s) per core: 1
        Core(s) per socket: 1
        Socket(s): 1
        Stepping: 0
        BogoMIPS: 4190.15
        Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc pebs bts rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq dtes64 vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear flush_l1d arch_capabilities
    Virtualization features:
      Virtualization: VT-x
      Hypervisor vendor: KVM
      Virtualization type: full
    Caches (sum of all):
      L1d: 32 KiB (1 instance)
      L1i: 32 KiB (1 instance)
      L2: 4 MiB (1 instance)
      L3: 16 MiB (1 instance)
    NUMA:
      NUMA node(s): 1
      NUMA node0 CPU(s): 0
    Vulnerabilities:
      Gather data sampling: Not affected
      Itlb multihit: Not affected
      L1tf: Not affected
      Mds: Not affected
      Meltdown: Not affected
      Mmio stale data: Mitigation; Clear CPU buffers; SMT Host state unknown
      Reg file data sampling: Not affected
      Retbleed: Not affected
      Spec rstack overflow: Not affected
      Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
      Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
      Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS Not affected; BHI SW loop, KVM SW loop
      Srbds: Not affected
      Tsx async abort: Mitigation; TSX disabled

4. Check the dmesg log.

    [ 70.601402] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
    [ 70.601462] clocksource: 'kvm-clock' wd_nsec: 543445712 wd_now: 117af3fcc2 wd_last: 115a8fa9f2 mask: ffffffffffffffff
    [ 70.601467] clocksource: 'tsc' cs_nsec: 479651937 cs_now: 24b63d5eb6 cs_last: 247a57b034 mask: ffffffffffffffff
    [ 70.601470] clocksource: 'kvm-clock' (not 'tsc') is current clocksource.
    [ 70.601473] tsc: Marking TSC unstable due to clocksource watchdog
    [ 70.718258] ACPI: CPU1 has been hot-added
    [ 70.728190] SMP alternatives: switching to SMP code
    [ 70.733376] smpboot: Booting Node 0 Processor 1 APIC 0x1
    [ 70.734082] kvm_intel: Inconsistent VMCS config on CPU 1
    [ 70.734130] kvm: enabling virtualization on CPU1 failed
    [ 70.735933] smpboot: CPU 1 is now offline

5. Soft reboot the VM; both CPUs come online. Then do vCPU hotplug again by changing the sockets value to 3.

6. Check the CPU info after the migration. The new CPU is hotplugged successfully.

    [ 57.549015] ACPI: CPU2 has been hot-added
    [ 57.565879] smpboot: Booting Node 0 Processor 2 APIC 0x2
    [ 57.566014] TSC ADJUST compensate: CPU2 observed 130448479191 warp. Adjust: 130448479191
    [ 57.566014] TSC ADJUST compensate: CPU2 observed 12 warp. Adjust: 130448479203
    [ 57.567354] TSC synchronization [CPU#0 -> CPU#2]:
    [ 57.567354] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
    [ 57.626457] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
    [ 57.626469] clocksource: 'kvm-clock' wd_nsec: 519620295 wd_now: 3ec4536741 wd_last: 3ea55aa07a mask: ffffffffffffffff
    [ 57.626473] clocksource: 'tsc' cs_nsec: 467819394 cs_now: 1e6b2800fb cs_last: 1e30bcac51 mask: ffffffffffffffff
    [ 57.626480] clocksource: 'kvm-clock' (not 'tsc') is current clocksource.
    [ 57.626483] tsc: Marking TSC unstable due to clocksource watchdog
    [ 57.627411] Will online and init hotplugged CPU: 2
Actual results:
After vCPU hotplug on a newly created VM, the new CPU is offline in the guest. A soft reboot of the VM brings the CPU online, and subsequent vCPU hotplug operations succeed after that reboot.
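
To compare the failing and passing runs mechanically, the dmesg output can be scanned for the hot-add messages quoted in the steps above. The following is a minimal sketch, not part of the original report: the function name `classify_hotplug` and the state labels are hypothetical, and the patterns are taken only from the log lines in this bug, so other kernels may word them differently.

```python
import re

# Patterns copied from the dmesg lines quoted in this report (assumption:
# the wording is stable for the kernel under test).
HOT_ADDED = re.compile(r"ACPI: CPU(\d+) has been hot-added")
NOW_OFFLINE = re.compile(r"smpboot: CPU (\d+) is now offline")
WILL_ONLINE = re.compile(r"Will online and init hotplugged CPU: (\d+)")


def classify_hotplug(dmesg_text):
    """Map each hot-added CPU number to 'offline', 'onlining', or 'unknown'."""
    states = {}
    for match in HOT_ADDED.finditer(dmesg_text):
        states[int(match.group(1))] = "unknown"
    # The kernel announced that it will online the hotplugged CPU.
    for match in WILL_ONLINE.finditer(dmesg_text):
        cpu = int(match.group(1))
        if cpu in states:
            states[cpu] = "onlining"
    # An explicit "is now offline" message takes precedence over the above.
    for match in NOW_OFFLINE.finditer(dmesg_text):
        cpu = int(match.group(1))
        if cpu in states:
            states[cpu] = "offline"
    return states


if __name__ == "__main__":
    failing = (
        "[ 70.718258] ACPI: CPU1 has been hot-added\n"
        "[ 70.734130] kvm: enabling virtualization on CPU1 failed\n"
        "[ 70.735933] smpboot: CPU 1 is now offline\n"
    )
    print(classify_hotplug(failing))
```

Run against the step 4 log, the sketch reports CPU 1 as offline; against the step 6 log, it reports CPU 2 as onlining. It only reads log text and does not confirm the final CPU state, which still needs `lscpu` or sysfs.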
Expected results:
vCPU hotplug on a newly created VM should bring the new CPU online.
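
The expected result can be checked from captured `lscpu` output by looking at the `Off-line CPU(s) list` field shown in step 3. This is a rough sketch under the assumption that the field name matches the output in this report; the helper names `expand_cpu_list` and `offline_cpus` are illustrative and not part of any existing tool.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as '0,2-3' into [0, 2, 3]."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def offline_cpus(lscpu_text):
    """Return the CPUs that lscpu reports as off-line (empty list if none)."""
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        # Field name as printed in the lscpu output in this report.
        if sep and key.strip() == "Off-line CPU(s) list":
            return expand_cpu_list(value)
    return []


if __name__ == "__main__":
    sample = "CPU(s): 2\nOn-line CPU(s) list: 0\nOff-line CPU(s) list: 1\n"
    print(offline_cpus(sample))
```

An empty result after the hotplug and migration would match the expected behavior; a non-empty result, as in step 3, reproduces the bug.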
Additional info:
- clones: CNV-62851 [Tracker Bug] [vcpu hotplug] the cpu is offline after migration successfully (New)