  1. Performance and Scale for AI Platforms
  2. PSAP-945

Expose the new TuneD interface for dynamic tuning


    • 0% To Do, 0% In Progress, 100% Done

      OCP/Telco Definition of Done
      Epic Template descriptions and documentation.


      Epic Goal

      • Expose the new TuneD socket interface (1, 2) via NTO to allow dynamic per-CPU power-management tuning.
      • Provide only a PoC, with TuneD either still running in a container or running on the host via osImageURL, until the MCO in-cluster builds dependency is ready.
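
To make the socket interface in the goal above more concrete, here is a minimal client sketch. It assumes the TuneD socket accepts JSON-RPC 2.0 requests, that one request is sent per connection and ended by closing the write side, and that the socket lives at `/run/tuned/tuned.sock`. The path, the framing, and the `active_profile` method name are assumptions for illustration, not confirmed by this epic.

```python
import json
import socket

# Assumption: the socket path depends on TuneD's configuration.
TUNED_SOCKET = "/run/tuned/tuned.sock"


def build_request(method, params=None, req_id=1):
    """Encode a JSON-RPC 2.0 request as UTF-8 bytes."""
    request = {"jsonrpc": "2.0", "method": method, "id": req_id}
    if params is not None:
        request["params"] = params
    return json.dumps(request).encode("utf-8")


def call(sock, method, params=None, req_id=1):
    """Send one request on a connected socket and return the decoded reply."""
    sock.sendall(build_request(method, params, req_id))
    # Assumption: closing the write side marks the end of the request.
    sock.shutdown(socket.SHUT_WR)
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))


def call_tuned(method, params=None, path=TUNED_SOCKET):
    """Connect to the Unix domain socket at `path` and issue a single call."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return call(sock, method, params)
```

On a node where the socket is exposed, `call_tuned("active_profile")` would return the decoded reply, provided the method exists under that name.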

      Why is this important?

      • Currently, power management and other dynamic tuning are limited and performed via CRI-O annotations: cpu-c-states.crio.io, cpu-freq-governor.crio.io, irq-load-balancing.crio.io and cpu-balancing.crio.io. CRI-O is becoming overloaded with these annotations and is being extended in a way that makes it a superset of the Container Runtime Interface, which makes it very vendor-specific. We want to remedy this situation and centralize the tuning in a component that is designed for that purpose: TuneD.
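
For context, the annotation-based approach described above looks roughly like the following from a workload's point of view. This sketch builds a pod manifest as a plain dictionary. The annotation keys are the ones listed above; the values (`disable`, `performance`), the pod name, and the image are assumptions for illustration, and a real pod typically needs further settings (such as a runtime class and pinned CPUs) for the annotations to take effect.

```python
import json

# Keys are taken from the list above; the values are illustrative assumptions.
ANNOTATIONS = {
    "cpu-c-states.crio.io": "disable",
    "cpu-freq-governor.crio.io": "performance",
    "irq-load-balancing.crio.io": "disable",
    "cpu-balancing.crio.io": "disable",
}


def annotated_pod(name, image, annotations):
    """Return a minimal pod manifest carrying the given annotations."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "annotations": dict(annotations)},
        "spec": {"containers": [{"name": name, "image": image}]},
    }


# Hypothetical pod name and image.
pod = annotated_pod("tuned-demo", "registry.example.com/demo:latest", ANNOTATIONS)
print(json.dumps(pod, indent=2))
```

Each tuning knob here is a separate runtime-specific annotation, which is the sprawl this epic aims to replace with a single TuneD-facing interface.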

      Scenarios

      1. ...

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. Expose TuneD API to the Unix Domain Socket
      2. Extend TuneD API to allow hotplugging/deplugging devices from the plugin instances at runtime
      3. A hook point in CRI-O that fires when CPUs are assigned
      4. Optional/nice-to-have: MCO in-cluster builds; MCO-375, MCO-306, MCO-307
      5. Optional/nice-to-have: RFE: User-configured modification of system boot-related files

      Previous Work (Optional):

      1. NTO and OpenShift CoreOS Layering

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              jmencak Jiri Mencak
              jmencak Jiri Mencak
              Liquan Cui Liquan Cui