[OCPBUGS-29391] AWS HyperShift clusters' nodes cannot join cluster with custom domain name in DHCP Option Set

    • Low
    • No
    • Hypershift Sprint 250, Hypershift Sprint 251, Hypershift Sprint 252
    • 3
    • False
    • Previously, an AWS policy issue prevented the Cluster API Provider for AWS from retrieving the necessary domain information. As a consequence, installing an AWS hosted cluster with a custom domain failed. With this update, the policy issue is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-29391[*OCPBUGS-29391*])
    • Bug Fix
    • Done

      Description of problem:

      AWS HyperShift clusters' nodes cannot join cluster with custom domain name in DHCP Option Set

      Version-Release number of selected component (if applicable):

      Any

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a VPC for a HyperShift/ROSA HCP cluster in AWS
      2. Replace the VPC's DHCP Option Set with one that sets a custom domain name (example.com, or any domain of your choice)
      3. Attempt to install a HyperShift/ROSA HCP cluster with a nodepool
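
      Step 2 above can be sketched in Python roughly as follows. This is a minimal sketch, not the reporter's actual procedure: it assumes a boto3 EC2 client is passed in, and the helper name replace_dhcp_options, the VPC ID, and the region in the usage comment are hypothetical placeholders. The domain-name-servers entry (AmazonProvidedDNS) is also an assumption; the report only mentions the custom domain name.

```python
def replace_dhcp_options(ec2_client, vpc_id, domain_name="example.com"):
    """Create a DHCP option set with a custom domain name and attach it to a VPC.

    ec2_client is expected to behave like boto3.client("ec2").
    Returns the ID of the newly created DHCP option set.
    """
    response = ec2_client.create_dhcp_options(
        DhcpConfigurations=[
            # The custom domain name that triggers the behavior in this bug.
            {"Key": "domain-name", "Values": [domain_name]},
            # Assumption: keep using the Amazon-provided resolver.
            {"Key": "domain-name-servers", "Values": ["AmazonProvidedDNS"]},
        ]
    )
    dhcp_options_id = response["DhcpOptions"]["DhcpOptionsId"]

    # Associating the new set replaces whichever set the VPC used before.
    ec2_client.associate_dhcp_options(DhcpOptionsId=dhcp_options_id, VpcId=vpc_id)
    return dhcp_options_id


# Usage (requires AWS credentials; the VPC ID is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-2")
#   replace_dhcp_options(ec2, "vpc-0123456789abcdef0", "example.com")
```

      Passing the client in as a parameter keeps the helper independent of credentials and region configuration, so the same function can be exercised against a stub client.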

      Actual results:

      All EC2 instances fail to become nodes. They generate CSRs based on the default domain name: ec2.internal for us-east-1, or ${region}.compute.internal for other regions (e.g. us-east-2.compute.internal).
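
      The default domain rule described above can be written as a small helper, which may be handy when checking which name a CSR was issued for. This is an illustrative sketch: the function names are hypothetical, and the ip-a-b-c-d host prefix assumes AWS's IP-based instance naming, which the report itself does not spell out.

```python
def default_private_domain(region: str) -> str:
    """Return the default EC2 private DNS domain for a region."""
    # us-east-1 is the special case; every other region embeds its name.
    if region == "us-east-1":
        return "ec2.internal"
    return f"{region}.compute.internal"


def default_private_dns_name(private_ip: str, region: str) -> str:
    """Build the default private DNS name for an instance.

    Assumes IP-based naming: dots in the IPv4 address become dashes.
    """
    host = "ip-" + private_ip.replace(".", "-")
    return f"{host}.{default_private_domain(region)}"


print(default_private_domain("us-east-1"))                # ec2.internal
print(default_private_domain("us-east-2"))                # us-east-2.compute.internal
print(default_private_dns_name("10.0.1.23", "us-east-2"))  # ip-10-0-1-23.us-east-2.compute.internal
```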

      Expected results:

      Either the instances become nodes, or we document that custom domain names in DHCP Option Sets are not supported with HyperShift at this time. There is currently no pressing need for this feature, though customers do use it successfully in ROSA Classic/OCP.

      Additional info:

      This is currently a known gap in cluster-api-provider-aws (CAPA): https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1691


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:0041

            Juan Manuel Parrilla Madrid added a comment -

            Hey rh-ee-adejong, the details:

            Jira issue for the backport: https://issues.redhat.com/browse/OCPBUGS-34856
            Downstream PR: https://github.com/openshift/cluster-api-provider-aws/pull/516 (merged)
            Fix included in accepted release 4.14.0-0.nightly-2024-06-07-015529

            For now we are not 100% sure which 4.14 release will include the fix, but everything points to 4.14.30, because 4.14.29 was built yesterday and does not contain the patch (https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.14.29).

            Juan Manuel Parrilla Madrid added a comment -

            Hey rh-ee-bchandra, there is no reason not to backport it; I just followed the Jira details. I will proceed with the backport to 4.14.

            OpenShift Jira Bot added a comment -

            Hi jparrill@redhat.com,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            Juan Manuel Parrilla Madrid added a comment -

            Hey jaharrin, this is the bug I mentioned in the Slack thread in the #sd-rosa-hcp channel.

            Juan Manuel Parrilla Madrid added a comment -

            Hey mshen.openshift, I've been reviewing a couple of approaches to solve the issue with cewong@redhat.com, but it looks like they were not successful: when you add a domain-name to the DHCP Options Set, it only modifies resolv.conf, not the node hostname in AWS or the machine's PrivateDNSName.

            In Cesar's tests, there are no CSRs to be approved, but the nodes also do not show up. Could you help us understand the issue a little better?

            Juan Manuel Parrilla Madrid added a comment -

            The PR was sent to the community for review: https://kubernetes.slack.com/archives/CD6U2V71N/p1709631881787179

            Wenqi He added a comment -

            Bumped this bug to Critical since we are receiving more and more cases/tickets from CX

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              mshen.openshift Michael Shen (Inactive)
              Jie Zhao Jie Zhao
              Laura Hinson Laura Hinson
              Votes: 0
              Watchers: 16