  1. OpenShift Bugs
  2. OCPBUGS-55825

OVN-Kubernetes flows caused OVS CPU Regression in 4.19.0 RC - component readiness


    • Quality / Stability / Reliability
    • False
    • None
    • Critical
    • Yes
    • None
    • Approved
    • CORENET Sprint 270
    • 1
    • In Progress
    • Release Note Not Required
    • None
    • None
    • None
    • None
    • None

      We did indeed have an OVS CPU regression, caused by this fix: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5229 which went downstream via https://github.com/openshift/ovn-kubernetes/pull/2516 . It was a bug fix for Virt customers: https://issues.redhat.com/browse/CORENET-5389 . FYI rhn-support-asood rravaiol@redhat.com mduarted@redhat.com phoracek@redhat.com : since the fix is being reverted, we will need to address that bug properly again.

      OVS CPU before this fix went in: https://drive.google.com/file/d/1SH-FZsHbCrY27Kh5uOkpDRsFPP9okEjw/view?usp=sharing

      After this fix went in: https://drive.google.com/file/d/1ZLe9qLfrU0G0etSAoH8KFvT8Lf_cQAyC/view?usp=sharing

      Unfortunately this is a release-blocking regression for 4.19.0, so we will revert this one commit. Otherwise we would need to revert the whole merge that brought it in: https://github.com/openshift/ovn-kubernetes/pull/2516 which contains many more commits, including 2 complete Virt features for release-4.19.

      Action Plan to prevent things like this from happening again:

      1. Hold a small team retro (Thursday corenet team meeting, for corenet members)
      2. File tech debt items for QE tests that catch CPU/MEM regressions on 6 node clusters
      3. Have developers also add tests upstream in our repo
      4. Have CI flags that tell us when our own components go red. This is something we should know ourselves; we should not rely on the TRT or Scale/Perf teams to tell us about such bugs, especially on a cluster as small as 6 nodes
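
As a rough illustration of items 2 and 4, a CI gate for this kind of regression could be as simple as comparing mean OVS CPU usage before and after a change. This is only a sketch: the function name `cpu_regressed`, the 10% tolerance, and the sample values are all hypothetical and not taken from this bug or from any existing test suite. How the samples are collected is left open.

```python
from statistics import mean


def cpu_regressed(baseline, candidate, tolerance=0.10):
    """Return True when the candidate mean exceeds the baseline mean
    by more than `tolerance` (a relative increase).

    baseline, candidate: non-empty lists of CPU usage samples, for example
    cores used by the OVS daemon, gathered however your CI collects them.
    tolerance: allowed relative increase; 0.10 is an arbitrary placeholder.
    """
    if not baseline or not candidate:
        raise ValueError("both sample lists must be non-empty")
    base = mean(baseline)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return (mean(candidate) - base) / base > tolerance


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not measurements from this bug.
    before = [0.20, 0.22, 0.21]
    after = [0.35, 0.36, 0.34]
    print("regression detected:", cpu_regressed(before, after))
```

A real gate would probably want more than a mean comparison (percentiles, multiple runs, noise handling), but even a check this simple, run on a 6 node cluster, would turn our own component red before another team has to report it.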

      Revert PR Upstream: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5229 pdiak@redhat.com let's get this in ASAP.

      Relevant debugging threads:

      1. TRT and Corenet: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1746529764895749?thread_ts=1745591284.239329&cid=C01CQA76KMX
      2. Corenet and OVS: https://redhat-internal.slack.com/archives/CDCP2LA9L/p1746521531649959?thread_ts=1746463246.242769&cid=CDCP2LA9L 

      Thanks to rh-ee-fbabcock rhn-engineering-dgoodwin pdiak@redhat.com trozet@redhat.com imaximet@redhat.com for helping with this.

              sseethar Surya Seetharaman
              sseethar Surya Seetharaman
              None
              None
              Anurag Saxena Anurag Saxena
              None