Type: Bug
Status: CLOSED
Resolution: Duplicate
Priority: Major
Severity: High
Description of problem:
We are seeing high %steal CPU time in a VM running in OpenShift Virtualization configured with dedicated resources.
Version-Release number of selected component (if applicable):
OpenShift 4.12.14
OpenShift Virtualization 4.12.2
How reproducible:
Always
Steps to Reproduce:
I configured the environment following this article, skipping the real-time part: https://access.redhat.com/solutions/7007632
1. Label the worker MachineConfigPool with custom-kubelet=cpumanager-enabled
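For reference, the label can be applied with a command along these lines (pool name assumed to be `worker`):
```
oc label machineconfigpool worker custom-kubelet=cpumanager-enabled
```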
2. Create a KubeletConfig:
```
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
```
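To verify the static policy took effect, the kubelet's CPU Manager checkpoint can be inspected on the node (default kubelet state path assumed):
```
# On the worker node: checkpointed CPU assignments per container
cat /var/lib/kubelet/cpu_manager_state
```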
3. Create a PerformanceProfile:
```
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: "5-15"
    reserved: "0-4"
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
      - size: "1G"
        count: 3
        node: 0
  numa:
    topologyPolicy: single-numa-node
  nodeSelector:
    node-role.kubernetes.io/worker: ""
```
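As a sanity check (the exact arguments vary by version), the isolation-related kernel arguments generated by the profile should show up in the node's command line:
```
# On the node: look for the isolated CPU set in the kernel arguments
tr ' ' '\n' < /proc/cmdline | grep -E 'isolcpus|nohz_full|tuned'
```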
4. Create a VM with 2 CPUs and 2 GiB of memory. These are the relevant parts of the configuration:
```
apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  template:
    spec:
      domain:
        cpu:
          cores: 1
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
          numa:
            guestMappingPassthrough: {}
          sockets: 2
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          autoattachMemBalloon: false
          autoattachSerialConsole: true
        ioThreadsPolicy: auto
        machine:
          type: pc-q35-rhel8.6.0
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          limits:
            memory: 2Gi
          requests:
            memory: 2Gi
```
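With dedicatedCpuPlacement and matching memory requests/limits, the virt-launcher pod should land in the Guaranteed QoS class; a quick check (the pod name is a placeholder):
```
oc get pod <virt-launcher-pod> -o jsonpath='{.status.qosClass}{"\n"}'
```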
5. Run a CPU-intensive load in the guest. I tested by running two `openssl speed` commands, each pinned to a vCPU:
```
for cpu in $(seq 0 1); do taskset -c "${cpu}" openssl speed >/dev/null 2>&1 & done
```
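To confirm each worker actually stays on its vCPU, the PSR column (last processor the thread ran on) can be checked inside the guest:
```
ps -eLo pid,psr,comm | grep openssl
```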
Actual results:
Using top I see consistently high steal time in the guest, between 10% and 30%:
```
%Cpu0 : 71.1 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 28.9 st
%Cpu1 : 72.3 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 27.7 st
```
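The same accounting can be read straight from /proc/stat in the guest, where steal is the eighth value after the cpuN label:
```
# Values per line: user nice system idle iowait irq softirq steal ...
grep -E '^cpu[0-9]' /proc/stat
```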
I don't see steal time in another VM with 2 CPUs and 2 GB of memory running the same workload but configured with the defaults (no dedicatedCpuPlacement, no limits, no hugepages, etc.).
Expected results:
No steal time.
Additional info:
In the virt-launcher pod I confirm that the vCPUs are pinned to pCPUs 5 and 6:
```
<cputune>
<vcpupin vcpu='0' cpuset='5'/>
<vcpupin vcpu='1' cpuset='6'/>
<emulatorpin cpuset='7'/>
<iothreadpin iothread='1' cpuset='7'/>
</cputune>
```
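From the node, the affinity of the QEMU vCPU threads can be cross-checked; this is a sketch that assumes a single qemu-kvm process is visible:
```
# Print the CPU affinity of each vCPU thread ("CPU N/KVM")
for tid in $(ps -T -p "$(pgrep -f qemu-kvm | head -1)" -o spid= -o comm= | awk '/KVM/ {print $1}'); do
  taskset -cp "$tid"
done
```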
The cpumask for CPU 5 is 20:
```
$ python3 -c 'cpu=5; x="%x" % (1<<cpu); print(",".join(x[(i-8 if i>8 else 0):i] for i in reversed(range(len(x), 0, -8))))'
20
```
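Equivalently, since the mask fits in 64 bits here, plain shell arithmetic gives the same value:
```
$ printf '%x\n' $((1 << 5))
20
```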
On the node where the VM is running, I trace the sched_switch and workqueue_execute_start events on CPU 5:
```
cd /sys/kernel/debug/tracing/
echo 20 > tracing_cpumask
echo > set_event
echo sched_switch >> set_event
echo workqueue_execute_start >> set_event
# (wait 30 seconds)
echo > set_event
cat trace
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
CPU 0/KVM-30764 [005] d... 2766.297106: sched_switch: prev_comm=CPU 0/KVM prev_pid=30764 prev_prio=120 prev_state=R+ ==> next_comm=swapper/5 next_pid=0 next_prio=120
<idle>-0 [005] d... 2766.540965: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=CPU 0/KVM next_pid=30764 next_prio=120
CPU 0/KVM-30764 [005] d... 2767.321081: sched_switch: prev_comm=CPU 0/KVM prev_pid=30764 prev_prio=120 prev_state=R+ ==> next_comm=swapper/5 next_pid=0 next_prio=120
<idle>-0 [005] d... 2767.641028: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=CPU 0/KVM next_pid=30764 next_prio=120
CPU 0/KVM-30764 [005] d... 2768.345137: sched_switch: prev_comm=CPU 0/KVM prev_pid=30764 prev_prio=120 prev_state=R+ ==> next_comm=swapper/5 next_pid=0 next_prio=120
<idle>-0 [005] d... 2768.741041: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=CPU 0/KVM next_pid=30764 next_prio=120
CPU 0/KVM-30764 [005] d... 2769.369178: sched_switch: prev_comm=CPU 0/KVM prev_pid=30764 prev_prio=120 prev_state=R+ ==> next_comm=swapper/5 next_pid=0 next_prio=120
<idle>-0 [005] d... 2769.741063: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=CPU 0/KVM next_pid=30764 next_prio=120
```
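As a rough quantification (a sketch assuming the trace format above), the gaps during which the CPU sat idle between a preemption and the switch back to the vCPU can be extracted with awk:
```
# Print each gap during which CPU 5 was idle while the vCPU was runnable
grep sched_switch trace | awk '
  {
    # the timestamp is the field that looks like "2766.297106:"
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+:$/) ts = $i + 0
  }
  /next_comm=swapper/ { start = ts }
  /prev_comm=swapper/ && start { printf "idle gap: %.1f ms\n", (ts - start) * 1000; start = 0 }
'
```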
We can see that the CPU is switched to the idle task even though the vCPU thread is still runnable (prev_state=R+); the guest accounts this time as steal.
Duplicates:
- CNV-28792 [2203291] kubevirt should allow runtimeclass to be configured in a pod (Closed)