-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.19, 4.20
-
Quality / Stability / Reliability
-
False
-
-
None
-
Critical
-
Yes
-
None
-
Approved
-
CORENET Sprint 270
-
1
-
In Progress
-
Release Note Not Required
We confirmed that the OVS CPU regression was caused by this fix: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5229 which went downstream via https://github.com/openshift/ovn-kubernetes/pull/2516 . It was a bug fix for Virt customers: https://issues.redhat.com/browse/CORENET-5389 . FYI rhn-support-asood rravaiol@redhat.com mduarted@redhat.com phoracek@redhat.com : since the fix is being reverted, the original bug will need to be fixed again, this time without regressing OVS CPU.
OVS CPU before this fix went in: https://drive.google.com/file/d/1SH-FZsHbCrY27Kh5uOkpDRsFPP9okEjw/view?usp=sharing
After this fix went in: https://drive.google.com/file/d/1ZLe9qLfrU0G0etSAoH8KFvT8Lf_cQAyC/view?usp=sharing
Unfortunately this is a release-blocking regression for 4.19.0, so we will revert this one commit. Otherwise we would need to revert the whole merge that brought it in: https://github.com/openshift/ovn-kubernetes/pull/2516 which contains many more commits, including 2 complete Virt features delivered in release-4.19.
Action Plan to prevent things like this from happening again:
- Hold a small team retro (Thursday corenet team meeting, for corenet members)
- Create tech debt items for QE tests that catch CPU/MEM regressions on 6-node clusters
- Have developers also add tests upstream in our repo
- Have CI flags that tell us when our own components go red. We should be catching this ourselves and should not rely on the TRT or Scale/Perf teams to report such bugs, especially on a 6-node cluster
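As a rough illustration of the kind of check the CPU/MEM regression items above could lead to, here is a minimal sketch in Python. It is not an existing test in our repo or in CI: the function name, the 20% tolerance, and the sample values are all hypothetical, and it assumes OVS CPU samples for a baseline build and a candidate build have already been collected by some other means.

```python
from statistics import mean


def cpu_regressed(baseline, candidate, tolerance=0.20):
    """Return True if the candidate's mean CPU usage exceeds the
    baseline's mean by more than `tolerance` (a fraction, 0.20 = 20%).

    `baseline` and `candidate` are lists of CPU usage samples in the
    same unit (for example, percent of one core). Hypothetical helper.
    """
    if not baseline or not candidate:
        raise ValueError("both sample lists must be non-empty")
    return mean(candidate) > mean(baseline) * (1.0 + tolerance)


# Illustrative numbers only, not measurements from this bug.
baseline_samples = [10.0, 11.0, 9.0]
candidate_samples = [15.0, 16.0, 14.0]

if cpu_regressed(baseline_samples, candidate_samples):
    print("OVS CPU regression suspected: flag the job")
```

A mean comparison with a fixed tolerance is the simplest possible gate; a real check would likely need more samples and some handling of run-to-run noise.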
Revert PR Upstream: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5229 pdiak@redhat.com let's get this in ASAP.
Relevant debugging threads:
- TRT and Corenet: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1746529764895749?thread_ts=1745591284.239329&cid=C01CQA76KMX
- Corenet and OVS: https://redhat-internal.slack.com/archives/CDCP2LA9L/p1746521531649959?thread_ts=1746463246.242769&cid=CDCP2LA9L
Thanks to rh-ee-fbabcock rhn-engineering-dgoodwin pdiak@redhat.com trozet@redhat.com imaximet@redhat.com for helping here
- clones: OCPBUGS-55824 OVN-Kubernetes flows caused OVS CPU Regression in 4.19.0 RC - component readiness (Verified)
- depends on: OCPBUGS-55824 OVN-Kubernetes flows caused OVS CPU Regression in 4.19.0 RC - component readiness (Verified)
- is blocked by: OCPBUGS-55824 OVN-Kubernetes flows caused OVS CPU Regression in 4.19.0 RC - component readiness (Verified)
- links to: RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update