Performance and Scale for AI Platforms / PSAP-741

Make Node Tuning Operator (including PAO controllers) optional


    • Make NTO optional
    • False
    • None
    • False
    • Red
    • To Do
    • OCPSTRAT-147 - OpenShift Optional Capabilities (Phase 3)
    • Impediment
    • 0% To Do, 0% In Progress, 100% Done
    • Telco 5G Core

      Epic Goal

      • Investigate if NTO should be made optional as part of the composable OpenShift initiative
      • Performance / scale regression testing of a cluster without the default tunings
      • Implement the desired changes in NTO to make it optional
        • See the example of required [changes to manifests from samples-operator|https://github.com/openshift/cluster-samples-operator/pull/414].
        • Look for simple workarounds for applying default tunings. The importance of this would depend on the results of the regression testing.
            • A MachineConfig to lay down a file in /etc/sysctl.d. Note that some default tunings aren't sysctls, for instance [selinux]
              avc_cache_threshold=8192 and the scheduler settings, so this would not cover them.
            • Baked into RHCOS image? (probably not simple enough)
      • QE testing to test the new behavior
      • Docs update to document the new functionality
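
The MachineConfig workaround mentioned above could look roughly like the following sketch. It builds a MachineConfig manifest that writes a drop-in file under /etc/sysctl.d. The helper name `sysctl_machineconfig`, the file name `99-default-tunings.conf`, the Ignition version, and the sysctl keys and values in the usage example are all assumptions chosen for illustration; they are not the actual NTO default tunings, and non-sysctl tunings (the [selinux] and scheduler settings) are not covered.

```python
import base64
import json


def sysctl_machineconfig(name, role, sysctls,
                         path="/etc/sysctl.d/99-default-tunings.conf"):
    """Build a MachineConfig dict that writes `sysctls` to a sysctl.d drop-in.

    `path` and the Ignition version below are illustrative assumptions.
    """
    # Render "key = value" lines in a stable order.
    body = "".join(f"{key} = {value}\n" for key, value in sorted(sysctls.items()))
    # Ignition accepts file contents as a base64 data URL.
    encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [
                        {
                            "path": path,
                            "mode": 0o644,  # serialized as decimal 420 in JSON
                            "overwrite": True,
                            "contents": {
                                "source": "data:text/plain;charset=utf-8;base64,"
                                + encoded
                            },
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values only, not the real NTO defaults.
    manifest = sysctl_machineconfig(
        "99-worker-default-tunings",
        "worker",
        {"vm.max_map_count": 262144, "kernel.pid_max": 131072},
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest. One such MachineConfig would be needed per machine role.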

      Why is this important?

      • As part of the composable OpenShift initiative, certain core operators are being asked to decide whether to declare themselves optional at cluster install. This gives customers the ability to leave out operators (those that declare themselves optional) that they are unlikely to use, resulting in a much smaller resource footprint. Customers will be able to enable optional operators post-install, but will not be able to disable them afterwards.


      1. Disable NTO during cluster creation
      2. Enable NTO after cluster creation
      3. NTO must tolerate ALL potentially optional operators being disabled
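
A minimal sketch of the "enable after cluster creation" scenario, assuming the optional-capabilities mechanism exposes the enabled set through `spec.capabilities.additionalEnabledCapabilities` on the ClusterVersion object. The capability name `NodeTuning` and both helper functions are assumptions made for illustration; this ticket does not define the name. The second helper models the one-way rule stated above: capabilities can be enabled post-install but not disabled.

```python
import json


def enable_capability_patch(currently_enabled, capability):
    """Return a JSON merge patch body that adds `capability` to the enabled list.

    A merge patch replaces the whole list, so the capabilities that are
    already enabled are carried over.
    """
    enabled = list(currently_enabled)
    if capability not in enabled:
        enabled.append(capability)
    return {"spec": {"capabilities": {"additionalEnabledCapabilities": enabled}}}


def is_allowed_transition(before, after):
    """Enabling is allowed, disabling is not: `after` must contain all of `before`."""
    return set(before) <= set(after)


if __name__ == "__main__":
    # "NodeTuning" is a hypothetical capability name used for illustration.
    patch = enable_capability_patch(["marketplace"], "NodeTuning")
    print(json.dumps(patch))
```

The printed JSON is the kind of body that could be passed to `oc patch clusterversion version --type merge -p '<json>'`, under the assumptions above.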

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. Collaborate with the OCP Perf Scale team to run the OCP Perf/Scale release tests without the NTO default tunings to quantify the performance impact (if any)

      Previous Work (Optional):

      Open questions:

      1. If the performance impact of not having NTO lay down the default tunings on the cluster is significant, we may choose not to go ahead with making NTO optional

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            dagray@redhat.com David Gray
            akamra8979 Ashish Kamra
            Jiri Mencak, Liquan Cui, Martin Sivak
            Liquan Cui Liquan Cui
            Andrew Taylor Andrew Taylor
            Jiri Mencak Jiri Mencak
            Jiri Mencak Jiri Mencak