-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.16.0
-
No
-
Rejected
-
False
-
Release Note Not Required
-
In Progress
Description of problem:
Private HC provision failed on AWS.
How reproducible:
Always.
Steps to Reproduce:
Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/:
RELEASE_IMAGE=registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-20-005211
HO_IMAGE=quay.io/hypershift/hypershift-operator:latest
BUCKET_NAME=fxie-hcp-bucket
REGION=us-east-2
AWS_CREDS="$HOME/.aws/credentials"
CLUSTER_NAME=fxie-hcp-1
BASE_DOMAIN=qe.devcluster.openshift.com
EXT_DNS_DOMAIN=hypershift-ext.qe.devcluster.openshift.com
PULL_SECRET="/Users/fxie/Projects/hypershift/.dockerconfigjson"
hypershift install --oidc-storage-provider-s3-bucket-name $BUCKET_NAME --oidc-storage-provider-s3-credentials $AWS_CREDS --oidc-storage-provider-s3-region $REGION --private-platform AWS --aws-private-creds $AWS_CREDS --aws-private-region=$REGION --wait-until-available --hypershift-image $HO_IMAGE
hypershift create cluster aws --pull-secret=$PULL_SECRET --aws-creds=$AWS_CREDS --name=$CLUSTER_NAME --base-domain=$BASE_DOMAIN --node-pool-replicas=2 --region=$REGION --endpoint-access=Private --release-image=$RELEASE_IMAGE --generate-ssh
Additional info:
From the MC:
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
server: https://kube-apiserver:6443
bootstrap-kubeconfig
server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
server: https://kube-apiserver:6443
dns-operator-kubeconfig
server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
server: https://kube-apiserver:6443
ingress-operator-kubeconfig
server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
server: https://kube-apiserver:6443
localhost-kubeconfig
server: https://localhost:6443
service-network-admin-kubeconfig
server: https://kube-apiserver:6443
The bootstrap-kubeconfig uses an incorrect KAS port, 443. It should be 6443 because the KAS is exposed through an LB. The kubelet on each HC node picks up the same incorrect port, so the AWS VMs are provisioned but cannot join the HC as nodes.
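The port comparison described above can be automated. The following is a minimal sketch, not part of the original report: flag_unexpected_ports is a hypothetical helper name, and it assumes its input has the same shape as the listing above (a secret name line followed by a "server:" line with an explicit port).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the report): reads alternating
# "<secret-name>" / "server: <url>" lines on stdin and prints every
# secret whose server port differs from the expected port given as $1.
flag_unexpected_ports() {
  local expected="$1"
  awk -v expected="$expected" '
    $1 != "server:" { name = $1; next }   # remember the most recent secret name
    {
      n = split($2, parts, ":")           # the port is the last colon-separated field
      if (parts[n] != expected) print name, $2
    }'
}
```

Piping the output of the loop above into `flag_unexpected_ports 6443` would print only the bootstrap-kubeconfig entry for the listing shown here.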
From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
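The two manual probes above can be wrapped in a small function for repeated checks. This is a sketch under assumptions: probe_ports is a hypothetical name, the 5-second timeout is an arbitrary choice, and it relies on an nc build that supports -z and -w (Ncat and OpenBSD netcat both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the report): try a TCP connect to each
# given port on a host and report whether the connection succeeded.
probe_ports() {
  local host="$1"; shift
  local port
  for port in "$@"; do
    # -z: connect without sending data; -w 5: give up after 5 seconds (assumed value)
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
      echo "$port reachable"
    else
      echo "$port unreachable"
    fi
  done
}
```

From the bastion it would be invoked as `probe_ports api.fxie-hcp-1.hypershift.local 443 6443`.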
In addition, the CNO passes the wrong KAS port to the network components on the HC.
The HAProxy configuration on the VMs has the same problem:
frontend local_apiserver
  bind 172.20.0.1:6443
  log global
  mode tcp
  option tcplog
  default_backend remote_apiserver
backend remote_apiserver
  mode tcp
  log global
  option httpchk GET /version
  option log-health-checks
  default-server inter 10s fall 3 rise 3
  server controlplane api.fxie-hcp-1.hypershift.local:443
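To confirm which backend port a node's HAProxy is using, the server targets can be pulled out of the configuration. The following is a minimal sketch: haproxy_backend_targets is a hypothetical helper name, and the location of the configuration file on the VM is not given in this report, so the function reads the configuration from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the report): print the host:port target
# of every "server" line in an HAProxy configuration read from stdin.
# Lines such as "default-server ..." are skipped because their first
# field is not exactly "server".
haproxy_backend_targets() {
  awk '$1 == "server" { print $3 }'
}
```

Fed the configuration above, it would print api.fxie-hcp-1.hypershift.local:443, which shows the backend pointing at port 443 rather than 6443.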
- clones
-
OCPBUGS-31840 Failed to provision private HC on AWS
- Closed
- depends on
-
OCPBUGS-31840 Failed to provision private HC on AWS
- Closed
- is cloned by
-
OCPBUGS-42214 Failed to provision private HC on AWS
- Closed
- is depended on by
-
OCPBUGS-42214 Failed to provision private HC on AWS
- Closed
- links to
-
RHBA-2024:4316 OpenShift Container Platform 4.16.z bug fix update