CNV-44186

Enable schedstats for vCPU wait metrics by default


    • CNV Virt-Node Sprint 266, CNV Virt-Node Sprint 267, CNV Virt-Node Sprint 268, CNV Virt-Node Sprint 269, CNV Virt-Node Sprint 270, CNV Virt-Node Sprint 271, CNV Virt-Node Sprint 272, CNV Virt-Node Sprint 273, CNV Virt-Node Sprint 274, CNV Virt-Node Sprint 275, CNV Virt-Node Sprint 276, CNV Virt-Node Sprint 277

      Goal

      `/proc/sys/kernel/sched_schedstats` should be set to 1 by KubeVirt by default (likely by virt-handler; there is no need to do this with kernel arguments).

      As a cluster admin (requested by a user in CNV-10588), I can look at the vCPU wait metric to understand whether my VM is blocked waiting for I/O to return. A blocked VM wastes compute resources because data was not available for some reason. Usually we do not want this to happen.

      vCPU wait metric requires `/proc/sys/kernel/sched_schedstats` to be enabled. By default it is off, because there is a small performance penalty in certain benchmarks, see https://bugzilla.redhat.com/show_bug.cgi?id=1936540.
      We fixed CNV-13219 in the past so that it can be enabled when needed.
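
A minimal sketch of how the current state of the setting could be checked on a node, assuming only that the file holds `0` or `1` as described above. The function name `schedstats_enabled` is hypothetical and not part of KubeVirt; the path is a parameter so the sketch can be tried against an ordinary file.

```python
from pathlib import Path

# Path named in this issue. Passed as a parameter so the check can be
# exercised against a regular file instead of the real procfs entry.
SCHEDSTATS_PATH = Path("/proc/sys/kernel/sched_schedstats")


def schedstats_enabled(path: Path = SCHEDSTATS_PATH) -> bool:
    """Return True if the file contains 1 (enabled), False otherwise."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat the setting as not enabled.
        return False
```

Calling `schedstats_enabled()` with no argument reads the real procfs entry on the node where it runs.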

      However, today the UI always displays a vCPU wait widget at https://console-openshift-console.apps.<cluster>/monitoring/dashboards/grafana-dashboard-kubevirt-top-consumers.

      Because

      • it is shown by default
      • and because the performance penalty is only noticeable in synthetic benchmarks
      • and because this is valuable information (we can have an alert for it)
      • and because this can be enabled at the procfs level (and not only via kernel arguments)

      I'm proposing to enable this setting by default via handler or some CNV owned component.

      User Stories

      • As a cluster admin, I want to get the vCPU wait metric, so that I get a warning if my cluster is not behaving well


      Notes

      • With swap getting enabled, this might become even more relevant.
      • virt-handler can enable this, as it's configurable via sysfs/procfs
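
The procfs-based approach in the notes above can be sketched as an idempotent write. This is an illustration only, not virt-handler's actual implementation; the function name `ensure_schedstats` is hypothetical, and writing to the real path requires root privileges on the node.

```python
from pathlib import Path

# Path named in this issue. Passed as a parameter so the sketch can be
# tried against a regular file without touching the node.
SCHEDSTATS_PATH = Path("/proc/sys/kernel/sched_schedstats")


def ensure_schedstats(path: Path = SCHEDSTATS_PATH) -> bool:
    """Write 1 to the file unless it already holds 1.

    Returns True if a write happened, False if the setting was already on.
    """
    if path.read_text().strip() == "1":
        return False
    path.write_text("1\n")
    return True
```

Checking the value before writing keeps repeated calls harmless, which suits a component that may restart or reconcile periodically.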

          1. upstream roadmap issue | Sub-task | New | Normal | Unassigned
          2. upstream design | Sub-task | New | Normal | Unassigned
          3. upstream documentation | Sub-task | New | Normal | Unassigned
          4. upgrade consideration | Sub-task | New | Normal | Unassigned
          5. test plans in polarion | Sub-task | New | Normal | Unassigned
          6. automated tests | Sub-task | New | Normal | Unassigned
          7. downstream documentation merged | Sub-task | New | Normal | Unassigned

              ffossemo@redhat.com Federico Fossemo
              fdeutsch@redhat.com Fabian Deutsch
              Kedar Bidarkar