  1. OpenShift SDN
  2. SDN-3840

Better pinning of the networking stack

Details

    • Epic
    • Resolution: Done
    • Critical
    • openshift-4.14
    • None
    • None
    • dynamic-pinning-net
    • False
    • None
    • False
    • Kernel networking must be a supported use case; OVS must get the necessary number of CPUs, either automatically or by configuration; all logic must support overrides for emergencies and manual tweaks
    • Green
    • To Do
    • TELCOSTRAT-37 - Efficient CPU resources allocation
    • 88%
    • dev-ready, doc-ready, po-ready, px-ready, qe-ready
    • 2023-02-21: Dev: [YELLOW] waiting for OVN-K changes, NTO POC PR posted
    • Telco 5G Core
    • ---
    • 0
    • 0

    Description

      Epic Goal

      • Determine how to increase the traffic-handling capacity of kernel networking workloads on clusters that do not dedicate all CPUs to guaranteed workloads.

      Why is this important?

      • Telco Core has only a handful of guaranteed pods but many burstable kernel networking services. These clusters therefore need CPU partitioning, yet the networking stack must still handle high traffic.

      Scenarios

      1. High traffic over kernel networking and OVS while a small guaranteed pod runs on the node. The reserved set uses the smallest number of CPU threads (4/8).

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • Kernel networking must be a supported use case
      • OVS must get the necessary number of CPUs, either automatically or by configuration
      • All logic must support overrides for emergencies and manual tweaks

      Dependencies (internal and external)

      1. cri-o / OCI hooks
      2. systemd / OVS slice configuration
      3. kernel / RPS mask tunables
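
The third dependency refers to RPS mask tunables. As a rough illustration of what such a mask looks like, the sketch below converts a CPU list such as "0-3,8" into the hexadecimal bitmap format the kernel accepts in /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. The function names parse_cpu_list and cpus_to_rps_mask are hypothetical and chosen only for this example. This is not the logic used by set-rps-mask.sh or by any component tracked in this epic.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a kernel-style CPU list such as "0-3,8" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid CPU range: {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_to_rps_mask(cpus: set[int]) -> str:
    """Format CPU ids as a hex bitmap in comma-separated 32-bit words.

    The most significant word comes first, which is the layout the kernel
    uses for CPU bitmaps written to rps_cpus.
    """
    bitmap = 0
    for cpu in cpus:
        bitmap |= 1 << cpu  # one bit per CPU id
    words = []
    while True:
        words.append(f"{bitmap & 0xFFFFFFFF:08x}")
        bitmap >>= 32
        if bitmap == 0:
            break
    return ",".join(reversed(words))


if __name__ == "__main__":
    # Hypothetical example: steer receive processing to CPUs 0-3.
    print(cpus_to_rps_mask(parse_cpu_list("0-3")))
```

Which CPUs belong in the mask (reserved, isolated, or a dynamically chosen set) is exactly the policy question this epic addresses, so the input list here is only a placeholder.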

      Previous Work (Optional):

      1. IRQ balancing https://github.com/cri-o/cri-o/blob/9ed9393df13cee1bb056be0f2068ed972e5cc05d/internal/runtimehandlerhooks/high_performance_hooks.go#L76
      2. RPS mask https://github.com/openshift/cluster-node-tuning-operator/blob/master/assets/performanceprofile/scripts/set-rps-mask.sh
      3. https://issues.redhat.com/browse/CNF-1360
      4. Dynamic OVS pinning: https://docs.google.com/document/d/18BtBkB3tHldt-zLLqNWA94JSd7aDUGnd1XwmYElde4E/edit

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

          People

            rravaiol@redhat.com Riccardo Ravaioli
            msivak@redhat.com Martin Sivak
            Ross Brattain
            Ronan Hennessy
