Performance and Scale for AI Platforms / PSAP-741

Make Node Tuning Operator (including PAO controllers) optional


    • Strategic Product Work
    • Make NTO optional
    • OCPSTRAT-147 OpenShift Optional Capabilities (Phase 3)
    • 0% To Do, 0% In Progress, 100% Done

      Epic Goal

      • Investigate if NTO should be made optional as part of the composable OpenShift initiative
      • Performance/scale regression testing of a cluster without the default tunings
      • Implement the desired changes in NTO to make it optional
        • See the example of the required manifest changes in cluster-samples-operator (https://github.com/openshift/cluster-samples-operator/pull/414).
        • Look for simple workarounds for applying the default tunings. How important this is depends on the results of the regression testing.
            • A MachineConfig that lays down a file in /etc/sysctl.d. Note that some default tunings are not sysctls, for instance the [selinux] avc_cache_threshold=8192 setting and the scheduler settings.
            • Baking the tunings into the RHCOS image? (probably not simple enough)
      • QE testing of the new behavior
      • Documentation update for the new functionality
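      The MachineConfig workaround above could look roughly like the following minimal sketch, which renders a sysctl.d file and wraps it in a MachineConfig that writes it via Ignition (spec version 3.2.0, contents as a percent-encoded data URL). The sysctl keys and values, the object name, and the file path are hypothetical placeholders, not the actual NTO default tunings. Settings that are not sysctls, such as avc_cache_threshold, cannot be covered this way.

```python
import json
from urllib.parse import quote

# Hypothetical example values only; these are NOT the actual NTO default tunings.
EXAMPLE_SYSCTLS = {
    "fs.inotify.max_user_watches": "65536",
    "vm.max_map_count": "262144",
}


def sysctl_conf(settings: dict[str, str]) -> str:
    """Render settings in sysctl.d 'key = value' format, one per line, sorted by key."""
    return "".join(f"{k} = {v}\n" for k, v in sorted(settings.items()))


def machine_config(name: str, role: str, path: str, contents: str) -> dict:
    """Build a MachineConfig that writes `contents` to `path` on nodes of `role`."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [
                        {
                            "path": path,
                            "mode": 0o644,  # serialized as decimal 420
                            "overwrite": True,
                            # Ignition accepts RFC 2397 data URLs; percent-encode the payload.
                            "contents": {"source": "data:," + quote(contents)},
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    mc = machine_config(
        name="99-worker-example-sysctl",  # hypothetical name
        role="worker",
        path="/etc/sysctl.d/99-example-tunings.conf",  # hypothetical path
        contents=sysctl_conf(EXAMPLE_SYSCTLS),
    )
    # JSON is valid YAML, so the output can be saved and applied like any manifest.
    print(json.dumps(mc, indent=2))
```

      A separate MachineConfig per role (worker, master) would be needed, and every change triggers a node reboot through the Machine Config Operator, which is part of why this is only a fallback.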

      Why is this important?

      • As part of the composable OpenShift initiative, certain core operators are being asked to decide whether to declare themselves optional at cluster install time. The goal is to give customers the ability to remove operators that declare themselves optional and that they are unlikely to use, giving them a much smaller resource footprint. Customers will be able to enable optional operators after installation (but not disable them).

      Scenarios

      1. Disable NTO during cluster creation
      2. Enable NTO after cluster creation
      3. NTO must tolerate ALL potentially optional operators being disabled
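      Scenarios 1 and 2 map onto the cluster capabilities mechanism: an install-config.yaml capabilities stanza that leaves NTO out, and a merge patch to the ClusterVersion object that adds it later. The following is a minimal sketch of both payloads. The capability name "NodeTuning" is an assumption (the name used by later OpenShift releases), not something decided in this epic. Scenario 3 is a behavioral requirement and is not covered here.

```python
import json

# Assumed capability name for NTO; not confirmed by this epic.
NTO_CAPABILITY = "NodeTuning"


def install_config_capabilities(enabled: list[str]) -> dict:
    """Scenario 1: install-config.yaml capabilities stanza that starts from an
    empty baseline and enables only the listed capabilities (NTO omitted)."""
    return {
        "capabilities": {
            "baselineCapabilitySet": "None",
            "additionalEnabledCapabilities": sorted(enabled),
        }
    }


def enable_capability_patch(current: list[str], capability: str) -> dict:
    """Scenario 2: JSON merge patch for the ClusterVersion 'version' object.
    A merge patch replaces lists wholesale, so the currently enabled
    capabilities are kept and `capability` is added (set union, sorted)."""
    return {
        "spec": {
            "capabilities": {
                "additionalEnabledCapabilities": sorted(set(current) | {capability})
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(install_config_capabilities(["openshift-samples"]), indent=2))
    print(json.dumps(enable_capability_patch(["openshift-samples"], NTO_CAPABILITY)))
```

      The second payload could be passed to `oc patch clusterversion version --type merge -p '<payload>'`. Because enabled capabilities cannot be disabled again, the patch only ever grows the list.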

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. Collaborate with the OCP Perf/Scale team to run the OCP Perf/Scale release tests without the NTO default tunings to quantify the performance impact (if any)

      Previous Work (Optional):

      Open questions:

      1. If the performance impact of NTO not laying down the default tunings on the cluster is significant, we may decide not to make NTO optional

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

          There are no Sub-Tasks for this issue.

              David Gray (dagray@redhat.com)
              Ashish Kamra (akamra8979)
              Jiri Mencak, Liquan Cui, Martin Sivak
              Liquan Cui
              Andrew Taylor
              Jiri Mencak
