-
Epic
-
Resolution: Done
Containerize Tuned
-
-
RHOAI, Training
-
Yellow
0% To Do, 33% In Progress, 67% Done
Epic Goal
- POC: Change the current NTO implementation, which runs TuneD in a daemon set, to run TuneD from a container (e.g. using podman from a systemd service).
- Write a design document (manifested as an OCP enhancement) for the Proposal above.
- Optional: promote the epic to a feature, or create a follow-up for the implementation.
Why is this important?
- The current NTO implementation, which runs TuneD in a daemon set, has several issues:
- Order of pod startup (OCPBUGS-26401): After a node reboot, all pods restart in random order. Since there is no control over pod restart order, the TuneD pod may start after the workload pods. The workload pods then start with only partial tuning, which can affect performance or even cause the workload to crash.
- TuneD restarts / reapplication (OCPBUGS-26400): If TuneD restarts at runtime for any reason, tuning is broken.
- Tuning changes impact running applications (OCPBUGS-28647)
- The proposed restructuring of TuneD should address these issues.
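To make the "podman from a systemd service" idea more concrete, here is a minimal sketch that renders what such a unit might look like. It is only an illustration of how systemd ordering could address the pod startup-order issue. The image reference, the container name, the podman flags, the host mount, and the choice to order the unit before kubelet.service are all assumptions and are not taken from this epic or from NTO.

```python
"""Sketch: render a systemd unit that runs TuneD from a podman container.

Assumptions (not from the epic): the image reference, the container name,
the podman flags, and ordering the unit before kubelet.service.
"""


def render_tuned_unit(image: str, container_name: str = "tuned") -> str:
    """Return the text of a hypothetical systemd unit for containerized TuneD."""
    podman_cmd = " ".join([
        "/usr/bin/podman", "run", "--rm",
        f"--name={container_name}",
        "--privileged",      # assumed: TuneD needs to change host kernel settings
        "--network=host",
        "--pid=host",
        "--volume=/:/host",  # hypothetical mount of the host filesystem
        image,
    ])
    lines = [
        "[Unit]",
        "Description=TuneD running from a container (sketch)",
        # Start this unit before kubelet so that workload pods are not
        # started ahead of TuneD.
        "Before=kubelet.service",
        "",
        "[Service]",
        # Leading "-" tells systemd to ignore a failure of this command,
        # e.g. when no stale container exists.
        f"ExecStartPre=-/usr/bin/podman rm --force {container_name}",
        f"ExecStart={podman_cmd}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder image reference, for illustration only.
    print(render_tuned_unit("registry.example.com/nto/tuned:latest"))
```

Note that Before= only orders unit startup. Whether tuning is fully applied before kubelet starts would also depend on the service type and on readiness signalling, which this sketch does not address.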
Scenarios
- Node reboots should not affect the desired tuning outcome.
- As a cluster admin, I would like to apply small tuning adjustments without negatively affecting running workloads.
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
Open questions:
- Can this be backported to previous versions?
- If this materializes, will it require a feature gate?
- How do we keep files open, for example for the rtentsk plugin? Something new, or TuneD as a daemon?
- Will a control plane upgrade (new NTO image reference) cause an immediate reboot of all workers?
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is related to
- PSAP-1271 Investigate IPIs happening on TuneD daemon reload (Closed)
- RHEL-26157 RFE: TuneD "profiles" subdirectory or make the profile path configurable (Closed)