
    • Epic
    • Resolution: Done
    • Undefined
    • None
    • None
    • NTO
    • None
    • Containerize Tuned
      - POC of tuned running from a container
      - In case the POC has positive results: Write a design doc
    • RHOAI, Training
    • Yellow
    • False
    • False
    • None
    • 0% To Do, 33% In Progress, 67% Done

      6/2/2024: POC ongoing; the status will be clearer after initial results


      Epic Goal

      • POC: Change the current implementation of NTO, in which TuneD runs in a daemon set, to running TuneD from a container (e.g. using podman from a systemd service).
      • Write a design document (manifested as an OCP enhancement) for the proposal above.
      • Optional: promote the epic to a feature or create a follow-up for implementation.

      Why is this important?

      • With the current implementation of NTO, in which TuneD runs in a daemon set, there are several issues:
        • Pod start-up order (OCPBUGS-26401): After a node reboot, all pods restart in random order. Because there is no control over pod restart order, the TuneD pod may start after the workload pods. The workload pods then start with partial tuning, which can affect performance or even cause the workload to crash.
        • TuneD restarts / reapplication (OCPBUGS-26400): If TuneD restarts at runtime for any reason, tuning is broken.
        • Tuning changes impact running applications (OCPBUGS-28647)
      • The proposed restructure of TuneD should address these issues.
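
      As a rough illustration of how a systemd-managed podman container could address the start-up ordering issue, the sketch below renders a unit file from Python. Everything in it is an assumption made for illustration and not the POC's actual design: the function name `render_tuned_unit`, the container name, the podman flags chosen, and the image reference are all hypothetical. Note that `Before=kubelet.service` only orders unit start-up; it does not by itself guarantee that tuning has finished being applied before kubelet starts.

```python
from textwrap import dedent


def render_tuned_unit(image: str, container_name: str = "tuned") -> str:
    """Render a hypothetical systemd unit that runs TuneD via podman.

    The unit layout, flags, and names are illustrative assumptions,
    not the design chosen by the POC.
    """
    return dedent(f"""\
        [Unit]
        Description=TuneD running from a podman container (illustrative sketch)
        # Start this unit ahead of kubelet so that workload pods are not
        # scheduled before the TuneD container has been launched.
        Before=kubelet.service

        [Service]
        # Remove a stale container left over from a previous run, if any.
        ExecStartPre=-/usr/bin/podman rm --force {container_name}
        ExecStart=/usr/bin/podman run --rm --name {container_name} --privileged --network host --pid host {image}
        ExecStop=/usr/bin/podman stop {container_name}
        Restart=on-failure

        [Install]
        WantedBy=multi-user.target
        """)


if __name__ == "__main__":
    # The image reference is a placeholder, not a real NTO/TuneD image.
    print(render_tuned_unit("registry.example.com/tuned:latest"))
```

      Running TuneD outside the pod lifecycle in this way is what would decouple it from pod restart order; whether the unit should be a long-running service or a oneshot is left open here.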

      Scenarios

      1. Any node reboot should not affect the desired tuning outcome.
      2. As a cluster admin, I would like to apply small tuning adjustments without negatively affecting running workloads.

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      1. Tuned & restarts - Solution Brainstorming

      Open questions:

      1. Can this be backported to previous versions?
      2. If this materializes, will it require a feature gate?
      3. How do we keep files open, for example for the rtentsk plugin? With something new, or with TuneD as a daemon?
      4. Will a control plane upgrade (new NTO image reference) cause an immediate reboot of all workers?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              yquinn@redhat.com Yanir Quinn
              Jiri Mencak, Martin Sivak
