Bug
Resolution: Unresolved
Critical
None
4.16.0, 4.17.0
Description of problem:
The AWS Cluster API Provider (CAPA) runs a required check that resolves the DNS name of each load balancer it creates. If the CAPA controller (in this case, running in the installer) cannot resolve the DNS record, CAPA does not report the infrastructure as ready. In some cases, installations run from local hosts (we have not seen this problem in CI) cannot resolve the LB DNS record, and the install fails like this:
DEBUG I0625 17:05:45.939796 7645 awscluster_controller.go:295] "Waiting on API server ELB DNS name to resolve" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="openshift-cluster-api-guests/umohnani-4-16test-5ndjw" namespace="openshift-cluster-api-guests" name="umohnani-4-16test-5ndjw" reconcileID="553beb3d-9b53-4d83-b417-9c70e00e277e" cluster="openshift-cluster-api-guests/umohnani-4-16test-5ndjw"
DEBUG Collecting applied cluster api manifests...
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure was not ready within 15m0s: client rate limiter Wait returned an error: context deadline exceeded
We do not know why some hosts cannot resolve these records. Possible causes include a problem with the local DNS resolver cache or slow propagation of DNS records in AWS, among others.
Version-Release number of selected component (if applicable):
4.16, 4.17
How reproducible:
Not reproducible / unknown -- this seems to be dependent on specific hosts and we have not determined why some hosts face this issue while others do not.
Steps to Reproduce:
n/a
Actual results:
Install fails because CAPA cannot resolve the LB DNS name.
Expected results:
As the DNS record does exist, install should be able to proceed.
Additional info:
Slack thread:
https://redhat-internal.slack.com/archives/C68TNFWA2/p1719351032090749
- blocks
  OCPBUGS-44231 AWS Installs Fail when Installer Host cannot resolve LB DNS Name (Closed)
- is cloned by
  OCPBUGS-44231 AWS Installs Fail when Installer Host cannot resolve LB DNS Name (Closed)
- is duplicated by
  OCPBUGS-35898 [CAPI] Installation in us-gov-west-1 region failed because api lb DNS resolver timeout (Closed)
  OCPBUGS-38893 OpenShift installation fails in the new AWS ap-southeast-5 region (Closed)
- is related to
  OCPBUGS-35898 [CAPI] Installation in us-gov-west-1 region failed because api lb DNS resolver timeout (Closed)
  CORS-3422 [capa][log] Load Balancer controller must wait for LB in active state to DNS lookup (Closed)
- links to
  RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update