- Bug
- Resolution: Done
- Normal
- None
- 4.14.0
- Important
- No
- Proposed
- False
The above graph shows that shortly after Aug 11, disruption spiked severely for both new and reused connections to all ingress-related backends.
Expanding the Most Recent Job Runs panel on the above link shows that all the bad results come from periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade. This job sees 100-600s of disruption, whereas the normal non-multi-arch job typically sees 0-1s.
Two sample jobs:
Expanding the spyglass chart, we see the disruption happens at roughly the same time for all backends and lasts minutes.
Filtering the spyglass chart with the search string "disrupt|OVN|ovn" suggests that OVN may be struggling during this window, as there are alerts around the same time. Curiously, only the ingress-related backends show disruption; the apiservers all seem OK.
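
For anyone who wants to apply the same filter outside spyglass, here is a minimal Python sketch of what the search string selects. The sample messages and the `filter_intervals` helper are hypothetical and invented for illustration only; they are not taken from the job artifacts.

```python
import re

# Same pattern as the spyglass search string quoted above.
PATTERN = re.compile(r"disrupt|OVN|ovn")


def filter_intervals(lines):
    """Return only the lines that match the search pattern."""
    return [line for line in lines if PATTERN.search(line)]


# Hypothetical interval messages (not real job output).
sample = [
    "ingress-to-console-new-connections started disruption",
    "alert/HypotheticalOVNAlert firing",
    "kube-apiserver-new-connections available",
    "pod/ovnkube-node-example restarted",
]

matches = filter_intervals(sample)
for line in matches:
    print(line)
```

In Python's `re` the pattern is case-sensitive, which is why it lists both "OVN" and "ovn"; a mixed-case spelling such as "Ovn" would not be selected by this sketch.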