This is a clone of issue OCPBUGS-36222. The following is the description of the original issue:
—
Description of problem:
The AWS Cluster API Provider (CAPA) runs a required check to resolve the DNS Name for load balancers it creates. If the CAPA controller (in this case, running in the installer) cannot resolve the DNS record, CAPA will not report infrastructure ready. We are seeing in some cases, that installations running on local hosts (we have not seen this problem in CI) will not be able to resolve the LB DNS name record and the install will fail like this:
DEBUG I0625 17:05:45.939796 7645 awscluster_controller.go:295] "Waiting on API server ELB DNS name to resolve" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="openshift-cluster-api-guests/umohnani-4-16test-5ndjw" namespace="openshift-cluster-api-guests" name="umohnani-4-16test-5ndjw" reconcileID="553beb3d-9b53-4d83-b417-9c70e00e277e" cluster="openshift-cluster-api-guests/umohnani-4-16test-5ndjw" DEBUG Collecting applied cluster api manifests... ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure was not ready within 15m0s: client rate limiter Wait returned an error: context deadline exceeded
We do not know why some hosts cannot resolve these records, but it could be something like issues with the local DNS resolver cache, DNS records are slow to propagate in AWS, etc.
Version-Release number of selected component (if applicable):
4.16, 4.17
How reproducible:
Not reproducible / unknown -- this seems to be dependent on specific hosts and we have not determined why some hosts face this issue while others do not.
Steps to Reproduce:
n/a
Actual results:
Install fails because CAPA cannot resolve LB DNS name
Expected results:
As the DNS record does exist, install should be able to proceed.
Additional info:
Slack thread:
https://redhat-internal.slack.com/archives/C68TNFWA2/p1719351032090749
- clones
-
OCPBUGS-36222 AWS Installs Fail when Installer Host cannot resolve LB DNS Name
- Verified
- is blocked by
-
OCPBUGS-36222 AWS Installs Fail when Installer Host cannot resolve LB DNS Name
- Verified
- links to
-
RHBA-2024:10137 OpenShift Container Platform 4.17.z bug fix update