- Bug
- Resolution: Done
- Normal
- None
- 4.19.0
- Important
- Yes
- CORENET Sprint 269
- 1
- Proposed
- False
TRT has detected an apparent disruption regression in 4.19 micro upgrades on AWS. I'm not certain how widespread it is; there's a lot going on, but it's quite visible here.
The problem looks to have begun around Jan 17th; prior to this, the AWS micro-upgrade P95 was consistently 0. Since then it has jumped as high as 6s, depending on the day and the lookback window used. It seems to impact about 1 in 20 jobs, which is why we see it around the 95th percentile and above; it's less clear below that.
We do not see the same problem in 4.18 at this time.
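As a rough illustration of why an issue hitting roughly 1 in 20 runs surfaces around the 95th percentile and above, here is a minimal sketch. The per-run disruption totals below are hypothetical placeholder values, not taken from real job data, and the nearest-rank percentile helper is just for this example:

```python
# Hypothetical per-run disruption totals (seconds) for 100 upgrade job runs:
# roughly 1 in 20 runs sees a several-second band of disruption, the rest see none.
durations = [0.0] * 94 + [2.0, 3.0, 4.0, 5.0, 5.5, 6.0]

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# P50 and P90 stay at 0 because far fewer than 10% of runs are affected,
# while P95 and P99 pick up the multi-second disruption.
for pct in (50, 90, 95, 99):
    print(f"P{pct}: {percentile(durations, pct):.1f}s")
```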
For this specific bug, the pattern I'm seeing is a band of disruption to new connections for all apiservers during the network operator's progressing phase of the upgrade, prior to node updates rolling out.
Sample job runs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1886515783572393984
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1886126427712000000
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1884621809362407424
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1883866043437289472
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1882431829286326272
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade/1882431831811297280
These runs show clear bands of API disruption while the network operator is progressing. I don't think this was happening prior to the 17th of Jan.
Examining job runs, there are a few patterns, but the most prominent is the one described above.
- duplicates: OCPBUGS-51154 Increased openshift-api Disruption (Closed)