- Bug
- Resolution: Done
- Normal
- 4.18.0
- Quality / Stability / Reliability
- False
- 2
- Important
- Rejected
- NE Sprint 260, NE Sprint 261, NE Sprint 262, NE Sprint 263, NE Sprint 264, NE Sprint 265, NI&D Sprint 267
- 7
Description of problem:
We noticed an increase in DNS failures for SNO upgrades. This appears to be a regression, since the same error rate was not present in 4.17 or 4.16. We are now passing at 92-93% on SNO in 4.18.
Version-Release number of selected component (if applicable):
How reproducible:
This error occurs frequently enough in our CI micro runs.
Job Failure Sample: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-29136-ci-4.18-e2e-aws-upgrade-ovn-single-node/1842063899902349312
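For context, DNS availability in jobs like the one above is measured by repeatedly resolving a name during the upgrade and counting the failure windows. A minimal illustrative probe is sketched below; this is not the actual CI monitor code, and the hostname and function name are hypothetical.

```python
import socket
import time

def probe_dns(hostname="kubernetes.default.svc.cluster.local",
              interval=1.0, duration=10.0):
    """Resolve `hostname` every `interval` seconds for `duration` seconds
    and return (successes, failures). Hypothetical stand-in for the
    disruption monitoring these CI jobs perform during an upgrade."""
    successes = failures = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            socket.getaddrinfo(hostname, None)
            successes += 1
        except socket.gaierror:
            failures += 1
        time.sleep(interval)
    return successes, failures
```

A 92-93% pass rate in the jobs corresponds to these probes observing DNS resolution failures during a larger fraction of upgrade windows than in 4.17/4.16.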
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Note that logs appear intermixed in Loki: comparing the timestamp at which Loki received a line with the timestamp the pod wrote into the line shows that some lines are ingested late. This appears to be a side effect of running log collection on SNO during upgrades.
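When reading these logs, sorting by the timestamp embedded in the pod's log line rather than by Loki ingestion time restores the true order. A small sketch, assuming entries carry the raw log line with an ISO 8601 timestamp prefix (the entry field name here is hypothetical):

```python
def order_by_pod_timestamp(entries):
    """Sort log entries by the timestamp the pod wrote at the start of
    each line (assumed ISO 8601, which sorts lexicographically), rather
    than by Loki ingestion time, so late-ingested lines land in their
    true position. The "line" field name is an assumption."""
    def pod_ts(entry):
        # Take the leading timestamp token from the raw log line.
        return entry["line"].split(" ", 1)[0]
    return sorted(entries, key=pod_ts)
```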
- is related to
  - OCPBUGS-59159 DNS availability for SNO is at ~98.8 instead of the required 99% (Verified)
  - OCPBUGS-44970 Loki on SNO throws excessive restarts while waiting for DNS deployment (Closed)
  - OCPBUGS-48630 Metal jobs often unable to mirror images prior to testing (Closed)
- is triggered by
  - OCPBUGS-43059 SNO Connection Error During Upgrades (Closed)
- relates to
  - OCPBUGS-45071 SNO upgrade can fail on [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers (New)
  - OCPBUGS-44970 Loki on SNO throws excessive restarts while waiting for DNS deployment (Closed)