Loading...

XML

Word

Printable

Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.16.z
Affects Version/s: 4.16.0
Component/s: HyperShift
Labels:
- hypershift-qe

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:

4.16.z
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
In Progress
Release Note Type:
Release Note Not Required
Release Note Text:

Hide
* Deploying a self-managed private hosted cluster on AWS fails because the `bootstrap-kubeconfig` file uses an incorrect KAS port. As a result, the AWS instances are provisioned but cannot join the hosted cluster as nodes. (link:https://issues.redhat.com/browse/OCPBUGS-36220[*~~OCPBUGS-36220~~*])

Show
* Deploying a self-managed private hosted cluster on AWS fails because the `bootstrap-kubeconfig` file uses an incorrect KAS port. As a result, the AWS instances are provisioned but cannot join the hosted cluster as nodes. (link: https://issues.redhat.com/browse/OCPBUGS-36220 [* OCPBUGS-36220 *])

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Private HC provision failed on AWS.

How reproducible:

Always.

Steps to Reproduce:

Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/:

RELEASE_IMAGE=registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-20-005211
HO_IMAGE=quay.io/hypershift/hypershift-operator:latest
BUCKET_NAME=fxie-hcp-bucket
REGION=us-east-2
AWS_CREDS="$HOME/.aws/credentials"
CLUSTER_NAME=fxie-hcp-1
BASE_DOMAIN=qe.devcluster.openshift.com
EXT_DNS_DOMAIN=hypershift-ext.qe.devcluster.openshift.com
PULL_SECRET="/Users/fxie/Projects/hypershift/.dockerconfigjson"

hypershift install --oidc-storage-provider-s3-bucket-name $BUCKET_NAME --oidc-storage-provider-s3-credentials $AWS_CREDS --oidc-storage-provider-s3-region $REGION --private-platform AWS --aws-private-creds $AWS_CREDS --aws-private-region=$REGION --wait-until-available --hypershift-image $HO_IMAGE

hypershift create cluster aws --pull-secret=$PULL_SECRET --aws-creds=$AWS_CREDS --name=$CLUSTER_NAME --base-domain=$BASE_DOMAIN --node-pool-replicas=2 --region=$REGION --endpoint-access=Private --release-image=$RELEASE_IMAGE --generate-ssh

Additional info:

From the MC:
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
    server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
    server: https://kube-apiserver:6443
bootstrap-kubeconfig
    server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
    server: https://kube-apiserver:6443
dns-operator-kubeconfig
    server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
    server: https://kube-apiserver:6443
ingress-operator-kubeconfig
    server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
    server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
    server: https://kube-apiserver:6443
localhost-kubeconfig
    server: https://localhost:6443
service-network-admin-kubeconfig
    server: https://kube-apiserver:6443

The bootstrap-kubeconfig uses an incorrect KAS port (should be 6443 since the KAS is exposed through LB), causing kubelet on each HC node to use the same incorrect port. As a result AWS VMs are provisioned but cannot join the HC as nodes.

From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

Besides, the CNO also passes the wrong KAS port to Network components on the HC.

Same for HA proxy configuration on the VMs:

frontend local_apiserver
  bind 172.20.0.1:6443
  log global
  mode tcp
  option tcplog
  default_backend remote_apiserver

backend remote_apiserver
  mode tcp
  log global
  option httpchk GET /version
  option log-health-checks
  default-server inter 10s fall 3 rise 3
  server controlplane api.fxie-hcp-1.hypershift.local:443

clones

OCPBUGS-31840 Failed to provision private HC on AWS

Closed

depends on

OCPBUGS-31840 Failed to provision private HC on AWS

Closed

is cloned by

OCPBUGS-42214 Failed to provision private HC on AWS

Closed

is depended on by

OCPBUGS-42214 Failed to provision private HC on AWS

Closed

links to

openshift/hypershift#4275: [release-4.16] OCPBUGS-36220: Make guest cluster components use the correct KAS port

RHBA-2024:4316 OpenShift Container Platform 4.16.z bug fix update

(1 links to)

Assignee:: Feilian Xie (Inactive)

Reporter:: Feilian Xie (Inactive)

Need Info From:: None

Contributors:: None

QA Contact:: Feilian Xie (Inactive)

Doc Contact:: None

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Created:: 2024/06/26 5:08 PM

Updated:: 2025/07/22 11:33 AM

Resolved:: 2024/07/09 10:51 AM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates

Hide