
Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.19
Component/s: Networking / router
Labels:

Severity:
Low
Regression:
None
Sprint:
NE Sprint 265, NI&D Sprint 266
sprint_count:
2
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Text:
N/A
Release Note Type:
Release Note Not Required
Release Note Status:
In Progress
Target Version:

4.19.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The e2e-ibmcloud-operator presubmit job for the cluster-ingress-operator repo, introduced in https://github.com/openshift/release/pull/56785, always fails because of DNS resolution problems. Note that this job has `always_run: false` and `optional: true`, so it only runs when someone comments /test e2e-ibmcloud-operator on a PR. These failures do not block any PRs from merging. Example failure.

The issue is that IBM Cloud has DNS propagation issues, similar to the AWS DNS issues (OCPBUGS-14966, now closed), except:

There is no way to adjust the IBMCloud DNS SOA TTL because IBMCloud DNS is managed by a third party (possibly Cloudflare; see the Slack reference).
Our AWS E2E tests run on AWS test runner clusters, and our IBMCloud E2E tests run on those same AWS test runner clusters, where DNS resolution of IBM Cloud DNS names is less reliable.

The PR https://github.com/openshift/cluster-ingress-operator/pull/1164 was an attempt to fix the issue by resolving the DNS name from inside the cluster and by allowing a couple-minute "warmup" interval to avoid negative caching. I found (via https://github.com/openshift/cluster-ingress-operator/pull/1132) that the SOA TTL is ~30 minutes, so if you trigger negative caching, you have to wait ~30 minutes for the IBM DNS resolver to refresh the DNS name.

However, I found that if you wait ~7 minutes for the DNS record to propagate without querying the DNS name, it resolves after that 7-minute wait (I call this the "warmup" period).

The tests affected are any tests that use a DNS name (wildcard or load balancer record):

TestManagedDNSToUnmanagedDNSIngressController
TestUnmanagedDNSToManagedDNSIngressController
TestUnmanagedDNSToManagedDNSInternalIngressController
TestConnectTimeout

The two paths I can think of are:

Continue https://github.com/openshift/cluster-ingress-operator/pull/1164 and increase the warmup time to 7+ minutes
Or skip these tests on IBM Cloud (accepting that we can't use IBMCloud DNS records in testing)
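The second option could be as small as a platform check at the top of each affected test. The following is a minimal sketch, assuming the test already knows the cluster's platform type as a string. `shouldSkipOnPlatform` is hypothetical. The `"IBMCloud"` constant is redeclared here to mirror OpenShift's `configv1.IBMCloudPlatformType` value, rather than being imported.

```go
package main

import "fmt"

// ibmCloudPlatform mirrors the "IBMCloud" value of OpenShift's
// configv1.IBMCloudPlatformType; it is redeclared here (an assumption) so the
// sketch has no external dependencies.
const ibmCloudPlatform = "IBMCloud"

// shouldSkipOnPlatform (hypothetical helper) reports whether one of the
// DNS-dependent tests listed in OCPBUGS-48780 should be skipped.
func shouldSkipOnPlatform(platform, testName string) bool {
	switch testName {
	case "TestManagedDNSToUnmanagedDNSIngressController",
		"TestUnmanagedDNSToManagedDNSIngressController",
		"TestUnmanagedDNSToManagedDNSInternalIngressController",
		"TestConnectTimeout":
		return platform == ibmCloudPlatform
	}
	return false
}

func main() {
	fmt.Println(shouldSkipOnPlatform("IBMCloud", "TestConnectTimeout")) // true
	fmt.Println(shouldSkipOnPlatform("AWS", "TestConnectTimeout"))      // false
}
```

Inside a test, the result would feed `t.Skip(...)`. Keeping the list in one place makes it obvious which coverage is lost on IBM Cloud.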

Version-Release number of selected component (if applicable):

4.19

How reproducible:

90-100%

Steps to Reproduce:

    1. Run /test e2e-ibmcloud-operator

Actual results:

    Tests are flaky

Expected results:

    Tests should work reliably

Additional info:

relates to

OCPBUGS-42045 Ingress Operator is lacking IBM cloud E2E testing

Verified

OCPBUGS-14966 Route 53 DNS Record are taking a long time to propagate to CI clusters

Closed

links to

openshift/cluster-ingress-operator#1164: OCPBUGS-48780: Fix IBMCloud DNS Propagation Issues in E2E

RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update

Assignee: Grant Spence

Reporter: Grant Spence

QA Contact: Ishmam Amin

Votes: 0

Watchers: 6

Created: 2025/01/23 1:54 AM

Updated: 2025/02/17 6:14 PM
