  1. Performance and Scale for AI Platforms
  2. PSAP-944

4.13: Address the technical debt carried from previous NTO releases


    • Epic
    • Resolution: Done
    • Major
    • openshift-4.13
    • None
    • NTO
    • 4.13: Address the technical debt carried from previous NTO releases
    • False
    • None
    • False
    • Not Selected
    • To Do
    • 0% To Do, 0% In Progress, 100% Done

      OCP/Telco Definition of Done
      Epic Template descriptions and documentation.

      Epic Goal

      • Add a way to do at least one of the following: 1) block TuneD FDP releases based on NTO CI; 2) periodically test upstream TuneD against the NTO and PAO e2e tests; 3) improve upstream pre-merge TuneD test coverage; 4) create an on-demand automated workflow for running NTO / PAO tests with the latest upstream TuneD.
      • Address the issue of potential continuous reboots when machines with different CPU counts are in the same MCP. See: https://github.com/openshift/enhancements/pull/1213#discussion_r1082275685, https://issues.redhat.com/browse/OCPBUGS-646
      • TuneD should report sysctls that are set in profiles but overridden (re-applied) when the system sysctl configuration files are processed.
      • Bump Ginkgo to v2 as v1 is unsupported now.
      • HyperShift: Make guest-cluster CRs as "read-only" as possible
      • Review the NTO API. Optionally allow increasing verbosity for openshift-tuned and potentially decouple the configuration of the operator from that of the operands.
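
      The sysctl-reporting goal above can be illustrated with a minimal sketch. This is not TuneD's implementation: the function names, the input shapes, and the simplified parsing (plain `key = value` lines, `#` and `;` comments, an optional leading `-`) are assumptions made for illustration only.

```python
def parse_sysctl_conf(text):
    """Parse 'key = value' lines from sysctl.conf-style text (simplified)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignored in this sketch
        # A leading '-' only marks "ignore failures"; drop it from the key.
        key = key.strip().lstrip("-").strip()
        settings[key] = value.strip()
    return settings


def overridden_sysctls(profile, system_files):
    """Report profile sysctls that system configuration sets differently.

    profile: dict of sysctl name -> value requested by the profile.
    system_files: list of (source_name, file_text), in processing order,
        so later files win over earlier ones.
    Returns {name: (profile_value, system_value, source_name)}.
    """
    effective = {}
    for source, text in system_files:
        for key, value in parse_sysctl_conf(text).items():
            effective[key] = (value, source)

    report = {}
    for key, wanted in profile.items():
        if key in effective and effective[key][0] != wanted:
            system_value, source = effective[key]
            report[key] = (wanted, system_value, source)
    return report
```

      Given a profile that requests `vm.swappiness` of `10` and a system file that sets it to `60`, the returned report names that sysctl, both values, and the file that won. A real check would also need to handle details this sketch skips, such as `/`-separated key names and the full file precedence order.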

      Why is this important?

      • Starting with OCP 4.10, TuneD no longer ships as part of the operator repository due to the "bill of materials" (see PSAP-512 for more detail). Since the two components were separated, they are no longer tested as one unit, so changes in one can break the other. While we have nightly NTO testing, it only catches a problem after the fact rather than before we switch to a new FDP release. This needs to be addressed.
      • Ginkgo v1 is unsupported.


      1. ...

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            jmencak Jiri Mencak
            jmencak Jiri Mencak
            Liquan Cui Liquan Cui