OCPBUGS-36222

AWS Installs Fail when Installer Host cannot resolve LB DNS Name


    • No
    • Installer Sprint 256
    • 1
    • Rejected
    • False

      Description of problem:

      The Cluster API Provider AWS (CAPA) runs a required check to resolve the DNS name of each load balancer it creates. If the CAPA controller (in this case, running in the installer) cannot resolve the DNS record, CAPA will not report the infrastructure as ready. In some cases, installations run from local hosts (we have not seen this problem in CI) cannot resolve the LB DNS name record, and the install fails like this:

          DEBUG I0625 17:05:45.939796    7645 awscluster_controller.go:295] "Waiting on API server ELB DNS name to resolve" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="openshift-cluster-api-guests/umohnani-4-16test-5ndjw" namespace="openshift-cluster-api-guests" name="umohnani-4-16test-5ndjw" reconcileID="553beb3d-9b53-4d83-b417-9c70e00e277e" cluster="openshift-cluster-api-guests/umohnani-4-16test-5ndjw"
          DEBUG Collecting applied cluster api manifests...
          ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure was not ready within 15m0s: client rate limiter Wait returned an error: context deadline exceeded

      We do not know why some hosts cannot resolve these records. Possible causes include a stale local DNS resolver cache or slow propagation of DNS records in AWS.

       

      Version-Release number of selected component (if applicable):

          4.16, 4.17

      How reproducible:

          Not reproducible / unknown -- this seems to be dependent on specific hosts and we have not determined why some hosts face this issue while others do not.

      Steps to Reproduce:

      n/a    

      Actual results:

      The install fails because CAPA cannot resolve the LB DNS name.

      Expected results:

          As the DNS record does exist, install should be able to proceed.

      Additional info:

      Slack thread:

      https://redhat-internal.slack.com/archives/C68TNFWA2/p1719351032090749

            rdossant Rafael Fonseca dos Santos
            padillon Patrick Dillon
            Gaoyun Pei