OpenShift Bugs: OCPBUGS-31840

Failed to provision private HC on AWS


    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • 4.16.0
    • HyperShift
    • Critical
    • No
    • Hypershift Sprint 252, Hypershift Sprint 253
    • 2
    • Proposed
    • False

      Description of problem:

      Provisioning a private hosted cluster (HC) on AWS fails: the AWS VMs are created but never join the HC as nodes.

      How reproducible:

      Always. 

      Steps to Reproduce:

      Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/. 

      Additional info:

      From the management cluster (MC):
      $ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
      admin-kubeconfig
          server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
      aws-pod-identity-webhook-kubeconfig
          server: https://kube-apiserver:6443
      bootstrap-kubeconfig
          server: https://api.fxie-hcp-1.hypershift.local:443
      cloud-credential-operator-kubeconfig
          server: https://kube-apiserver:6443
      dns-operator-kubeconfig
          server: https://kube-apiserver:6443
      fxie-hcp-1-2bsct-kubeconfig
          server: https://kube-apiserver:6443
      ingress-operator-kubeconfig
          server: https://kube-apiserver:6443
      kube-controller-manager-kubeconfig
          server: https://kube-apiserver:6443
      kube-scheduler-kubeconfig
          server: https://kube-apiserver:6443
      localhost-kubeconfig
          server: https://localhost:6443
      service-network-admin-kubeconfig
          server: https://kube-apiserver:6443

      The bootstrap-kubeconfig points at KAS port 443, which is incorrect: the port should be 6443 because the KAS is exposed through a load balancer. The kubelet on each HC node therefore uses the same incorrect port. As a result, the AWS VMs are provisioned but cannot join the HC as nodes.
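
      The port mismatch described above can be spotted mechanically. The following is a minimal sketch, not part of the original report: `server_port` and `check_server_ports` are hypothetical helper names, and the parsing assumes plain `server: <url>` lines with no IPv6 literal hosts.

```shell
# Hypothetical helper: print the port of a server URL.
# Assumes no IPv6 literal host; falls back to 443 (the HTTPS default)
# when the URL carries no explicit port.
server_port() {
  sp_hostport="${1#*://}"           # strip the scheme
  sp_hostport="${sp_hostport%%/*}"  # strip any path
  case "$sp_hostport" in
    *:*) echo "${sp_hostport##*:}" ;;
    *)   echo 443 ;;
  esac
}

# Hypothetical check: read kubeconfig text on stdin and print one
# "MISMATCH <url>" line for every "server:" entry whose port differs
# from the expected port given as the first argument.
check_server_ports() {
  csp_expected="$1"
  while IFS= read -r csp_line || [ -n "$csp_line" ]; do
    case "$csp_line" in
      *"server: "*) ;;   # keep only lines that carry a server URL
      *) continue ;;
    esac
    csp_url="${csp_line##*server: }"  # drop everything up to the URL
    csp_url="${csp_url%% *}"          # drop trailing whitespace, if any
    if [ "$(server_port "$csp_url")" != "$csp_expected" ]; then
      echo "MISMATCH $csp_url"
    fi
  done
}
```

      With these functions defined, the output of `oc extract secret/bootstrap-kubeconfig -n clusters-fxie-hcp-1 --to - 2>/dev/null` could be piped into `check_server_ports 6443`. Given the kubeconfig contents listed above, only the bootstrap entry would be reported.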

      From a bastion:
      [ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
      Ncat: Version 7.50 ( https://nmap.org/ncat )
      Ncat: Connection timed out.
      [ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
      Ncat: Version 7.50 ( https://nmap.org/ncat )
      Ncat: Connected to 10.0.143.91:6443.
      Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

      In addition, the CNO passes the same wrong KAS port to the network components on the HC.

      The HAProxy configuration on the VMs has the same problem: the remote_apiserver backend targets port 443.

      frontend local_apiserver
        bind 172.20.0.1:6443
        log global
        mode tcp
        option tcplog
        default_backend remote_apiserver
      
      backend remote_apiserver
        mode tcp
        log global
        option httpchk GET /version
        option log-health-checks
        default-server inter 10s fall 3 rise 3
        server controlplane api.fxie-hcp-1.hypershift.local:443 
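
      To confirm which port the HAProxy backend targets on a given VM, the `server` lines can be pulled out of the configuration. This is a rough sketch under stated assumptions: `haproxy_server_ports` is a hypothetical helper name, the configuration is supplied on stdin (its path on the VM is not given in this report), and addresses are assumed to be `host:port` without IPv6 literals.

```shell
# Hypothetical helper: print "<name> <host> <port>" for each "server" line
# of an HAProxy configuration read from stdin.
# "default-server" lines are skipped because their first field differs.
haproxy_server_ports() {
  awk '$1 == "server" && NF >= 3 {
    n = split($3, parts, ":")          # split "host:port" on the colon
    if (n >= 2) print $2, parts[1], parts[n]
  }'
}
```

      For the configuration shown above, this would print `controlplane api.fxie-hcp-1.hypershift.local 443`, matching the unreachable port seen from the bastion.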

            fxierh Feilian Xie