Type: Bug
Resolution: Won't Do
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.15, 4.16
Component/s: Networking / DNS
Labels:
- ne-triaged
- stale

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
0
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
NI&D Sprint 268
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem

CI suites running on test-platform build clusters are having trouble reliably resolving DNS for cluster-under-test resources, causing CI failures. The symptoms seem to be distributed among the build clusters over the past day:

$ curl -s 'https://search.dptools.openshift.org/search?maxAge=24h&type=build-log&context=0&search=dial+tcp:+lookup+api.*on+172.30.0.10:53:+no+such+host&search=Using+namespace' | jq -r 'to_entries[].value | select(length > 1)["Using namespace"][].context[]' | sed 's/.*\(build[0-9]*\).*/\1/' | sort | uniq -c
      2 build01
      4 build02
     10 build03
      2 build04
     11 build05
      8 build09
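
The tally step of the pipeline above can also be run on context lines that were saved earlier, without re-querying the search service. A minimal sketch, assuming the lines are already available as plain text on stdin; the function name `tally_build_clusters` is hypothetical and the sample input in the comment is made up, not taken from CI logs:

```shell
#!/bin/sh
# Hypothetical helper: read lines on stdin, pull out the last
# "build<digits>" token on each line, and count hits per build cluster.
# Lines without such a token are skipped (sed -n ... p).
tally_build_clusters() {
  sed -n 's/.*\(build[0-9][0-9]*\).*/\1/p' | sort | uniq -c
}

# Example invocation with made-up input:
#   printf '%s\n' 'ns on build05' 'ns on build01' | tally_build_clusters
```

Unlike the original `sed` expression, this variant requires at least one digit after `build` and drops non-matching lines instead of passing them through unchanged.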

Those build clusters are mostly on 4.15 and 4.16. I'm not sure whether this is an in-cluster DNS component issue, an SDN/OVN networking issue, an issue with DNS external to the cluster, or something else. Debugging assistance welcome.

Version-Release number of selected component (if applicable)

The version of the cluster-under-test does not seem relevant, but the build clusters seeing the issue are mostly recent 4.15 and 4.16.

How reproducible

A few dozen hits per day out of thousands of CI runs, so the rate is low, but it is still high enough to cause Component Readiness issues.

Steps to Reproduce

Unclear.
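
One possible way to look for the intermittent failure from inside a pod on a build cluster is to repeat a lookup many times and count the failures. A minimal sketch, assuming `getent` is available in the pod image; the function name `probe_dns` and its optional third argument are hypothetical, and this has not been confirmed to reproduce the issue:

```shell
#!/bin/sh
# Hypothetical probe: resolve NAME a fixed number of times and report
# how many attempts failed.
# Usage: probe_dns NAME ATTEMPTS [RESOLVER_COMMAND]
# RESOLVER_COMMAND defaults to "getent hosts" (an assumption about the image).
probe_dns() {
  name=$1
  attempts=$2
  resolver=${3:-getent hosts}
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # $resolver is intentionally unquoted so "getent hosts" splits into words.
    if ! $resolver "$name" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures/$attempts lookups failed for $name"
}
```

Note that `getent hosts` resolves through libc, while the logged `lookup ... on 172.30.0.10:53: no such host` message has the shape of an error from Go's resolver, so a clean run of this probe would not rule the problem out.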

Actual results

Occasional DNS-resolution attempts for cluster-under-test resources fail, causing the CI run to fail, presumably because of some kind of DNS instability biting the test pod running on the build cluster.

Expected results

Reliable DNS for CI pods running on build clusters.

Additional info

Recent changes in managed-cluster-config#2158 and release#54210 have returned build clusters to stock dns-default tolerations, but that does not seem to have resolved the issue.

Screenshot 2024-07-16 3.37.50 PM.png
2024/07/16 11:08 PM
81 kB
W. Trevor King

is related to

OCPBUGS-39580 clusteroperator/console: unexpected state transitions during e2e test run

ASSIGNED

Assignee: Miciah Masters

Reporter: W. Trevor King

QA Contact: Hongan Li

Need Info From: None

Votes: 0

Watchers: 8

Created: 2024/07/16 10:55 PM

Updated: 2025/10/11 7:42 AM

Resolved: 2025/03/21 12:46 AM
