- Story
- Resolution: Unresolved
- Normal
- None
- CNV Virt-Node Sprint 266, CNV Virt-Node Sprint 267, CNV Virt-Node Sprint 268, CNV Virt-Node Sprint 269, CNV Virt-Node Sprint 270, CNV Virt-Node Sprint 271, CNV Virt-Node Sprint 272, CNV Virt-Node Sprint 273, CNV Virt-Node Sprint 274, CNV Virt-Node Sprint 275, CNV Virt-Node Sprint 276, CNV Virt-Node Sprint 277
- None
Goal
`/proc/sys/kernel/sched_schedstats` should be set to 1 by default by kubevirt (likely by the handler; there is no need to do this with kargs).
As a cluster admin (user requested in CNV-10588), I can look at the vCPU wait metric in order to understand whether my VM is blocked waiting for IO to return. While it waits, compute resources are wasted because data was not available for some reason. Usually we do not want this to happen.
The vCPU wait metric requires `/proc/sys/kernel/sched_schedstats` to be enabled. By default it is off, because there is a small performance penalty in certain benchmarks, see https://bugzilla.redhat.com/show_bug.cgi?id=1936540.
We fixed CNV-13219 in the past in order to enable it if needed.
However, today the UI is always displaying a widget for vCPU wait at https://console-openshift-console.apps.<cluster>/monitoring/dashboards/grafana-dashboard-kubevirt-top-consumers.
Because
- it is shown by default
- and because the performance penalty is only noticeable in synthetic benchmarks
- and because this is valuable information (we can have an alert for it)
- and because this can be enabled on a procfs (and not only kernel) level
I'm proposing to enable this setting by default via handler or some CNV owned component.
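As a rough illustration of the procfs-level toggle proposed above, the following Python sketch reads the setting and writes `1` only if it is not already on. It is not KubeVirt code: the helper name `ensure_schedstats_enabled` and the `path` parameter are hypothetical, and the real component (for example virt-handler) would need root privileges on the node to write the file.

```python
from pathlib import Path

# Path named in this issue. It is a parameter below so the sketch can be
# exercised against a scratch file instead of the real sysctl entry.
SCHEDSTATS_PATH = Path("/proc/sys/kernel/sched_schedstats")


def ensure_schedstats_enabled(path: Path = SCHEDSTATS_PATH) -> bool:
    """Write "1" to the given file unless it already reads "1".

    Returns True if a write was performed, False if the setting was
    already enabled. Hypothetical helper for illustration only.
    """
    current = path.read_text().strip()
    if current == "1":
        # Already enabled: leave the file untouched.
        return False
    # Writing the real procfs entry requires root privileges on the node.
    path.write_text("1\n")
    return True
```

Checking the current value first keeps the operation idempotent, so a component could call it on every start without rewriting a setting that is already enabled.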
User Stories
- As a cluster admin, I want to get the vCPU wait metric, so that I get a warning if my cluster is not behaving well
Notes
- With swap getting enabled, this might become even more relevant.
- virt-handler can enable this, as it's configurable via sysfs/procfs
- relates to: OCPBUGS-62301 Evaluation of platform default kernel psi argument impact and Kube Descheduler Guidance (New)
1. upstream roadmap issue | New | Unassigned
2. upstream design | New | Unassigned
3. upstream documentation | New | Unassigned
4. upgrade consideration | New | Unassigned
5. CEE/PX summary presentation | Closed | Unassigned
6. test plans in polarion | New | Unassigned
7. automated tests | New | Unassigned
8. downstream documentation merged | New | Unassigned