Type: Story
Resolution: Done
Priority: Normal
This regression appears to only affect these four backends:
openshift-api-new-connections
oauth-api-new-connections
cache-openshift-api-new-connections
cache-oauth-api-new-connections
All four experience significant disruption (over 5 minutes), and no other backends show the same pattern.
For whatever reason, kube-api does not seem to be affected, nor does sdn, nor do micro upgrades.
What do openshift-api and oauth-api have in common?
The problem appears to have started on Sep 6. It is quite rare and only really visible at the daily 95th and 99th percentiles; the next lowest percentiles we check, the 75th and 50th, do not show this regression. When it happens, the pattern is fairly consistent.
This graph shows the problem appearing on September 6th; viewing the most recent runs, you can see the small number of times this bug has appeared since.
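For a rough sense of what that daily percentile check looks like, here is a minimal sketch in Python/pandas. The CSV file and the backend, date, and disruption_seconds column names are illustrative assumptions, not the actual TRT tooling or data schema.

```python
# Minimal sketch of the daily percentile check described above.
# The CSV path and column names (backend, date, disruption_seconds) are
# assumptions for illustration, not the real TRT data source.
import pandas as pd

AFFECTED = {
    "openshift-api-new-connections",
    "oauth-api-new-connections",
    "cache-openshift-api-new-connections",
    "cache-oauth-api-new-connections",
}

df = pd.read_csv("disruption_samples.csv", parse_dates=["date"])

# Daily 95th/99th percentile disruption per backend; the regression is only
# visible at these percentiles, not at the 50th or 75th.
daily = (
    df.groupby(["backend", df["date"].dt.date])["disruption_seconds"]
      .quantile([0.95, 0.99])
      .unstack()
      .rename(columns={0.95: "p95", 0.99: "p99"})
)

# Flag backend/day combinations where p95 exceeds five minutes of disruption.
spikes = daily[daily["p95"] > 300]
print(spikes[spikes.index.get_level_values("backend").isin(AFFECTED)])
```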
Some sample runs:
Unfortunately, I think I found two hits back in 4.13, but only two.
Filtering down to just one backend to eliminate duplicates, we have seven 4.14 jobs that have experienced this since Sep 6 (see the sketch after this list):
Sep 6 - 3 jobs
Sep 17 - 1 job
Sep 19 - 2 jobs
Sep 20 - 1 job
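As a sketch of the dedupe-and-count step behind that tally, using the same hypothetical CSV layout as above plus an assumed job_run identifier column:

```python
# Sketch of the dedupe/count behind the tally above. The DataFrame layout and
# the job_run column are assumptions for illustration, not the real tooling.
import pandas as pd

df = pd.read_csv("disruption_samples.csv", parse_dates=["date"])

# Restrict to a single backend so the same job run is not counted four times,
# then keep runs since Sep 6 that saw more than five minutes of disruption.
hits = df[
    (df["backend"] == "openshift-api-new-connections")
    & (df["date"] >= "2023-09-06")
    & (df["disruption_seconds"] > 300)
]

# Count distinct affected job runs per day; the total should match the seven
# 4.14 jobs listed above.
per_day = (
    hits.assign(day=hits["date"].dt.date)
        .groupby("day")["job_run"]
        .nunique()
)
print(per_day)
print("total:", per_day.sum())
```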
A failed payload from Sep 6 gives insight into the changes that were landing around that time: https://sippy.dptools.openshift.org/sippy-ng/release/4.14/tags/4.14.0-0.nightly-2023-09-06-235710/pull_requests
There are a number of ovn-related changes among the 33 PRs in that payload.
relates to: TRT-1261 Investigate gcp-ovn-rt-upgrade pass rates (Closed)