
ovs-vswitchd is using isolated cpu pool instead of reserved pool


    • -
    • Important
    • No
    • CNF Network Sprint 256
    • 1
    • False
      * Previously, the Open vSwitch (OVS) pinning procedure set the CPU affinity of the main thread, but other CPU threads did not pick up this affinity if they had already been created. As a consequence, some OVS threads did not run on the correct CPU set, which might interfere with the performance of pods with a Quality of Service (QoS) class of `Guaranteed`. With this update, the OVS pinning procedure updates the affinity of all the OVS threads, ensuring that all OVS threads run on the correct CPU set. (link:https://issues.redhat.com/browse/OCPBUGS-35347[*OCPBUGS-35347*])
    • Bug Fix
    • Done
      2024-07-03: 4.17 fix merged. Waiting for QE validation.

      2024-07-03: Upstream fix merged; downstream 4.17 PR open.

      2024-09-28: Upstream fix under review from SDN team

      2024-06-19: Still trying to confirm the issue and the specifics of the report

      Description of problem:

      System daemons on OCP/RHCOS nodes, such as ovs-vswitchd (revalidator process), run on vCPUs from the isolated pool that CPU Manager has already allocated to CNF workloads. This causes intermittent performance problems for those workloads and overloads the affected vCPUs. Note: NCP 23.11 uses CPU Manager with the static policy and Topology Manager set to "single-numa-node". Dedicated isolated and reserved vCPU pools are also defined.

      Version-Release number of selected component (if applicable):

      4.14.22

      How reproducible:

      Intermittent in the customer environment.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      ovs-vswitchd is using isolated CPUs

      Expected results:

      ovs-vswitchd uses only reserved CPUs
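
One way to compare the actual result with the expected one on a node is to read the allowed CPU list of every ovs-vswitchd thread from `/proc` and flag any thread whose affinity extends beyond the reserved pool. The following is a minimal sketch, not a supported tool: the helper names, the sample thread IDs, and the reserved set `0-1` are hypothetical, and the PID would have to be supplied by the user (for example from `pidof ovs-vswitchd`). It reports the affinity mask only, not where a thread was actually scheduled.

```python
import os


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a kernel-style CPU list such as '0-1,32-33' into a set of ints."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_thread_affinities(pid: int) -> dict[str, str]:
    """Map 'tid/comm' to the Cpus_allowed_list of each thread (Linux only)."""
    result: dict[str, str] = {}
    for tid in sorted(os.listdir(f"/proc/{pid}/task"), key=int):
        base = f"/proc/{pid}/task/{tid}"
        try:
            with open(f"{base}/comm") as f:
                name = f.read().strip()
            with open(f"{base}/status") as f:
                for line in f:
                    if line.startswith("Cpus_allowed_list:"):
                        result[f"{tid}/{name}"] = line.split(":", 1)[1].strip()
                        break
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were reading it
    return result


def threads_outside_reserved(affinities: dict[str, str],
                             reserved_spec: str) -> dict[str, list[int]]:
    """Return threads whose allowed CPUs include CPUs outside the reserved set."""
    reserved = parse_cpu_list(reserved_spec)
    stray: dict[str, list[int]] = {}
    for thread, allowed_spec in affinities.items():
        extra = parse_cpu_list(allowed_spec) - reserved
        if extra:
            stray[thread] = sorted(extra)
    return stray


# Hypothetical sample data; on a node, use read_thread_affinities(pid) instead.
sample = {
    "1201/ovs-vswitchd": "0-1",
    "1245/revalidator12": "0-3",
}
stray = threads_outside_reserved(sample, "0-1")
print(stray)  # {'1245/revalidator12': [2, 3]}
```

An empty result would match the expected behaviour; any entry names a thread that is still allowed to run on non-reserved CPUs.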

      Additional info:

      We want to understand whether the customer is hitting this bug:
      
        https://issues.redhat.com/browse/OCPBUGS-32407
      
      That bug was fixed in 4.14.25. The customer cluster is on 4.14.22. The customer is also asking whether a private fix is possible, since they cannot update at the moment.
      
      All case files have been yanked on both the US and EU instances of Supportshell. If case updates or attachments are not accessible, please let me know.
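
The release note for this issue describes the fix as updating the affinity of every OVS thread rather than only the main thread. A minimal sketch of that idea is shown here, assuming a Linux host and sufficient privileges to change another process's affinity. It is not the actual pinning code shipped in OpenShift, and `thread_ids` and `pin_all_threads` are hypothetical names. On Linux, `os.sched_setaffinity` acts on a single thread ID, which is why pinning only the main thread leaves already-created threads on their old CPU set.

```python
import os


def thread_ids(pid: int) -> list[int]:
    """List the thread IDs of a process from /proc/<pid>/task (Linux only)."""
    return sorted(int(tid) for tid in os.listdir(f"/proc/{pid}/task"))


def pin_all_threads(pid: int, cpus: set[int]) -> list[int]:
    """Set the CPU affinity of every existing thread of `pid` to `cpus`.

    Returns the thread IDs that were pinned. Threads that exit between
    listing and pinning are skipped.
    """
    pinned: list[int] = []
    for tid in thread_ids(pid):
        try:
            os.sched_setaffinity(tid, cpus)  # applies to this thread only
        except ProcessLookupError:
            continue
        pinned.append(tid)
    return pinned
```

For example, `pin_all_threads(pid, {0, 1})` would restrict every current thread of the process to CPUs 0 and 1, where `{0, 1}` stands in for the reserved set of the node.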

              apanatto@redhat.com Andrea Panattoni
              rh-ee-kyildiri Kursad Yildirim
              Mallapadi Niranjan Mallapadi Niranjan