Bug
Resolution: Unresolved
4.21
Moderate
Description of problem:
Image registry disruption test fails from time to time:
[Monitor:image-registry-availability][sig-imageregistry] disruption/image-registry connection/new should be available throughout the test
{ namespace/openshift-image-registry backend-disruption-name/image-registry-new-connections connection/new disruption/openshift-tests route/test-disruption-new was unreachable during disruption: for at least 19s (maxAllowed=6s):
Prow job: link.
As investigated in this Slack thread, the issue may be related to the "[sig-node] NoExecuteTaintManager Multiple Pods [Serial] only evicts pods without tolerations from tainted nodes" test. That test adds a taint to 2 nodes, which may happen to be the nodes where the image-registry pods are scheduled. Because image-registry runs 2 replicas in HA setups, evicting both pods at the same time may result in longer disruptions.
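The suspected failure condition can be sketched as a small check: the disruption is expected to be long only when every image-registry replica sits on a tainted node. This is a minimal illustration, not code from openshift/origin. The function name and the node names are hypothetical, and it assumes the registry pods do not tolerate the taint added by the test.

```python
from typing import Iterable, Set


def all_replicas_evicted(replica_nodes: Iterable[str], tainted_nodes: Set[str]) -> bool:
    """Return True when every replica runs on a tainted node.

    Assumes the pods carry no toleration for the NoExecute taint,
    so any pod on a tainted node is evicted.
    """
    nodes = list(replica_nodes)
    # With no replicas there is nothing to evict, so report False.
    return bool(nodes) and all(node in tainted_nodes for node in nodes)


# Hypothetical node names, for illustration only.
registry_pods = ["worker-a", "worker-b"]  # 2 image-registry replicas (HA setup)

# The test taints exactly the two nodes hosting the registry: both pods are evicted.
print(all_replicas_evicted(registry_pods, {"worker-a", "worker-b"}))  # True

# The test taints only one registry node: one replica keeps serving.
print(all_replicas_evicted(registry_pods, {"worker-a", "worker-c"}))  # False
```

Only the first case leaves the registry without a running replica, which would be consistent with the failure being intermittent.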
Version-Release number of selected component (if applicable):
4.21
How reproducible:
Not always
Steps to Reproduce:
1. Run the openshift/origin serial test suite multiple times.
Actual results:
Image registry disruption lasts longer than the allowed 6 seconds.
Expected results:
Image registry disruption should not last longer than the allowed 6 seconds.
Additional info:
Other components can also be impacted by the NoExecuteTaintManager test. The metrics-server disruption test flakes too (details in the Slack thread).