Bug
Resolution: Unresolved
Critical
None
4.16.0
Description of problem:
Provisioning a private HC on AWS fails.
How reproducible:
Always.
Steps to Reproduce:
Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/.
Additional info:
From the MC:
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
    server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
    server: https://kube-apiserver:6443
bootstrap-kubeconfig
    server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
    server: https://kube-apiserver:6443
dns-operator-kubeconfig
    server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
    server: https://kube-apiserver:6443
ingress-operator-kubeconfig
    server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
    server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
    server: https://kube-apiserver:6443
localhost-kubeconfig
    server: https://localhost:6443
service-network-admin-kubeconfig
    server: https://kube-apiserver:6443
The bootstrap-kubeconfig points at port 443, which is the wrong KAS port: it should be 6443, since the KAS is exposed through an LB. The kubelet on each HC node picks up the same incorrect port. As a result, the AWS VMs are provisioned but cannot join the HC as nodes.
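The mismatch can also be flagged mechanically by pulling the port out of each server: URL. The following is a minimal sketch, not part of HyperShift: the helper names server_port and check_server and the EXPECTED_PORT variable are hypothetical, and 6443 is assumed to be the expected port based on the observation above.

```shell
# Hypothetical helper: print the port of a kubeconfig "server:" URL.
# Assumes an https URL with a hostname or IPv4 host (no IPv6 literals);
# falls back to 443 when no explicit port is present.
server_port() {
  hostport="${1#*://}"        # strip the scheme
  hostport="${hostport%%/*}"  # strip any path
  case "$hostport" in
    *:*) echo "${hostport##*:}" ;;
    *)   echo 443 ;;
  esac
}

# Assumption: the KAS is expected on 6443, as described in this report.
EXPECTED_PORT=6443

# Hypothetical check: report whether a server URL uses the expected port.
# Returns non-zero on a mismatch.
check_server() {
  port="$(server_port "$1")"
  if [ "$port" = "$EXPECTED_PORT" ]; then
    echo "ok $1"
  else
    echo "MISMATCH $1 (port $port, expected $EXPECTED_PORT)"
    return 1
  fi
}

# Possible usage against the MC, reusing the namespace from this report:
#   url="$(oc extract secret/bootstrap-kubeconfig -n clusters-fxie-hcp-1 --to - 2>/dev/null | awk '/server:/ {print $2}')"
#   check_server "$url"
```

Keeping the port parsing separate from the oc call means the check can be exercised without access to a cluster.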
From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
In addition, the CNO passes the same wrong KAS port to the network components on the HC.
The HAProxy configuration on the VMs has the same problem:
frontend local_apiserver
  bind 172.20.0.1:6443
  log global
  mode tcp
  option tcplog
  default_backend remote_apiserver
backend remote_apiserver
  mode tcp
  log global
  option httpchk GET /version
  option log-health-checks
  default-server inter 10s fall 3 rise 3
  server controlplane api.fxie-hcp-1.hypershift.local:443
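To confirm which port the HAProxy backend targets on a given VM, the server lines can be extracted from the configuration. This is a rough sketch: the helper names backend_targets and backend_ports are hypothetical, and the location of the config file on the VM is not stated in this report, so it is left as a placeholder variable.

```shell
# Hypothetical helper: read a HAProxy config on stdin and print the
# address[:port] of every "server" line. "default-server" lines are
# not matched because their first field differs.
backend_targets() {
  awk '$1 == "server" { print $3 }'
}

# Hypothetical helper: print only the port of each backend target.
# Assumes every target carries an explicit port and is not an IPv6 literal.
backend_ports() {
  backend_targets | awk -F: '{ print $NF }'
}

# Possible usage on a VM (HAPROXY_CFG is a placeholder for the config path):
#   backend_ports < "$HAPROXY_CFG"
```

With the configuration shown above, the backend target is api.fxie-hcp-1.hypershift.local:443, which is the same port that timed out from the bastion.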