- Bug
- Resolution: Done
- Normal
- 4.18.0
- Quality / Stability / Reliability
- False
- 2
- Important
- Rejected
- NE Sprint 260, NE Sprint 261, NE Sprint 262, NE Sprint 263, NE Sprint 264, NE Sprint 265, NI&D Sprint 267
- 7
Description of problem:
We noticed an increase in DNS failures for SNO upgrades. This appears to be a regression, since the same error rate was not present in 4.17 or 4.16. We are now passing at 92-93% on SNO in 4.18.
Version-Release number of selected component (if applicable):
How reproducible:
This error occurs frequently enough in our CI micro runs.
Job Failure Sample: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-29136-ci-4.18-e2e-aws-upgrade-ovn-single-node/1842063899902349312
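For context, DNS availability in jobs like the one above is measured by repeatedly resolving a name during the upgrade and counting the failure windows. A minimal illustrative probe is sketched below; this is not the actual CI monitor code, and the hostname and function name are hypothetical.

```python
import socket
import time

def probe_dns(hostname="kubernetes.default.svc.cluster.local",
              interval=1.0, duration=10.0):
    """Resolve `hostname` every `interval` seconds for `duration` seconds
    and return (successes, failures). Hypothetical stand-in for the
    disruption monitoring these CI jobs perform during an upgrade."""
    successes = failures = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            socket.getaddrinfo(hostname, None)
            successes += 1
        except socket.gaierror:
            failures += 1
        time.sleep(interval)
    return successes, failures
```

A 92-93% pass rate in the jobs corresponds to these probes observing DNS resolution failures during a larger fraction of upgrade windows than in 4.17/4.16.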
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Note that logs appear intermixed in Loki: comparing the timestamp at which Loki received a line with the timestamp the pod wrote into the line shows that some lines are ingested late. This appears to be a side effect of running log collection on SNO during upgrades.
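When reading these logs, sorting by the timestamp embedded in the pod's log line rather than by Loki ingestion time restores the true order. A small sketch, assuming entries carry the raw log line with an ISO 8601 timestamp prefix (the entry field name here is hypothetical):

```python
def order_by_pod_timestamp(entries):
    """Sort log entries by the timestamp the pod wrote at the start of
    each line (assumed ISO 8601, which sorts lexicographically), rather
    than by Loki ingestion time, so late-ingested lines land in their
    true position. The "line" field name is an assumption."""
    def pod_ts(entry):
        # Take the leading timestamp token from the raw log line.
        return entry["line"].split(" ", 1)[0]
    return sorted(entries, key=pod_ts)
```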
- is related to
  - OCPBUGS-59159 DNS availability for SNO is at ~98.8 instead of the required 99% (Verified)
  - OCPBUGS-44970 Loki on SNO throws excessive restarts while waiting for DNS deployment (Closed)
  - OCPBUGS-48630 Metal jobs often unable to mirror images prior to testing (Closed)
- is triggered by
  - OCPBUGS-43059 SNO Connection Error During Upgrades (Closed)
- relates to
  - OCPBUGS-45071 SNO upgrade can fail on [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers (New)
  - OCPBUGS-44970 Loki on SNO throws excessive restarts while waiting for DNS deployment (Closed)