Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.16.0, 4.17.0
Component/s: HyperShift
Labels:
- hypershift-qe

Severity:
Critical
Regression:
No
Sprint:
Hypershift Sprint 252, Hypershift Sprint 253, Hypershift Sprint 254, Hypershift Sprint 255
sprint_count:
4
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:
None
Release Note Text:
* Deploying a self-managed private hosted cluster on AWS fails because the `bootstrap-kubeconfig` file uses an incorrect KAS port. As a result, the AWS instances are provisioned but cannot join the hosted cluster as nodes. (link:https://issues.redhat.com/browse/OCPBUGS-31840[*OCPBUGS-31840*])
Release Note Type:
Known Issue
Release Note Status:
Done
Target Version:

4.17.0
Target Backport Versions:

4.14.z, 4.15.z, 4.16.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Provisioning a private hosted cluster (HC) on AWS fails.

How reproducible:

Always.

Steps to Reproduce:

Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/:

RELEASE_IMAGE=registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-20-005211
HO_IMAGE=quay.io/hypershift/hypershift-operator:latest
BUCKET_NAME=fxie-hcp-bucket
REGION=us-east-2
AWS_CREDS="$HOME/.aws/credentials"
CLUSTER_NAME=fxie-hcp-1
BASE_DOMAIN=qe.devcluster.openshift.com
EXT_DNS_DOMAIN=hypershift-ext.qe.devcluster.openshift.com
PULL_SECRET="/Users/fxie/Projects/hypershift/.dockerconfigjson"

hypershift install --oidc-storage-provider-s3-bucket-name $BUCKET_NAME --oidc-storage-provider-s3-credentials $AWS_CREDS --oidc-storage-provider-s3-region $REGION --private-platform AWS --aws-private-creds $AWS_CREDS --aws-private-region=$REGION --wait-until-available --hypershift-image $HO_IMAGE

hypershift create cluster aws --pull-secret=$PULL_SECRET --aws-creds=$AWS_CREDS --name=$CLUSTER_NAME --base-domain=$BASE_DOMAIN --node-pool-replicas=2 --region=$REGION --endpoint-access=Private --release-image=$RELEASE_IMAGE --generate-ssh

Additional info:

From the management cluster (MC):
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
    server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
    server: https://kube-apiserver:6443
bootstrap-kubeconfig
    server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
    server: https://kube-apiserver:6443
dns-operator-kubeconfig
    server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
    server: https://kube-apiserver:6443
ingress-operator-kubeconfig
    server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
    server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
    server: https://kube-apiserver:6443
localhost-kubeconfig
    server: https://localhost:6443
service-network-admin-kubeconfig
    server: https://kube-apiserver:6443

The bootstrap-kubeconfig uses an incorrect KAS port (443). It should be 6443 because the KAS is exposed through a load balancer (LB). The kubelet on each HC node therefore uses the same incorrect port, and as a result the AWS VMs are provisioned but cannot join the HC as nodes.
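To spot the wrong port without reading each URL by eye, the `server:` lines printed by the loop above can be parsed. The sketch below is a hypothetical POSIX shell helper (`kas_port` is not part of `oc` or HyperShift); it assumes a hostname or IPv4 host (not a bracketed IPv6 literal) and treats a URL with no explicit port as the HTTPS default, 443.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or HyperShift): print the port of the
# API server URL on a kubeconfig "server:" line (or of a bare https URL).
# Assumes a hostname or IPv4 host, not a bracketed IPv6 literal.
kas_port() {
  kas_url="${1#*"server: "}"   # drop indentation and the "server: " key, if present
  kas_url="${kas_url#https://}" # drop the scheme
  kas_url="${kas_url%%/*}"      # drop any path after host[:port]
  case "$kas_url" in
    *:*) echo "${kas_url##*:}" ;; # explicit port after the last colon
    *)   echo 443 ;;              # no explicit port: HTTPS default
  esac
}

# Example (requires access to the management cluster):
#   oc extract secret/bootstrap-kubeconfig -n clusters-fxie-hcp-1 --to - 2>/dev/null \
#     | grep -i 'server:' | while read -r line; do kas_port "$line"; done
```

With the output reported above, the bootstrap-kubeconfig line yields 443 while every other kubeconfig yields 6443.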

From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

In addition, the Cluster Network Operator (CNO) passes the wrong KAS port to the network components on the HC.

The HAProxy configuration on the VMs has the same problem:

frontend local_apiserver
  bind 172.20.0.1:6443
  log global
  mode tcp
  option tcplog
  default_backend remote_apiserver

backend remote_apiserver
  mode tcp
  log global
  option httpchk GET /version
  option log-health-checks
  default-server inter 10s fall 3 rise 3
  server controlplane api.fxie-hcp-1.hypershift.local:443
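The same check can be applied to the HAProxy configuration. The hypothetical `backend_server_ports` function below is a minimal sketch, assuming the config is passed on stdin (the file's location on the node is not given in this report); it prints each `server` entry's name and port, so the 443 in the `remote_apiserver` backend stands out.

```shell
#!/bin/sh
# Hypothetical helper (not an HAProxy tool): read an HAProxy config on stdin
# and print "<server-name> <port>" for every "server" line.
backend_server_ports() {
  # awk splits on whitespace, so leading indentation is ignored and
  # "default-server" lines do not match. $3 is "<host>:<port>"; the port
  # is the text after the last colon.
  awk '$1 == "server" { n = split($3, parts, ":"); print $2, parts[n] }'
}

# Example (haproxy.cfg is a placeholder name for the node's HAProxy config):
#   backend_server_ports < haproxy.cfg
```

For the configuration shown above this prints `controlplane 443`, whereas the report expects the backend to target port 6443.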

is cloned by

OCPBUGS-36220 Failed to provision private HC on AWS

Closed

is depended on by

OCPBUGS-36220 Failed to provision private HC on AWS

Closed

is related to

OCPBUGS-42181 HO from main fails to create private cluster with KAS type LB on 4.15 and earlier

Closed

links to

openshift/hypershift#3849: OCPBUGS-31840: Make guest cluster components use the correct KAS port

RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update

Assignee:: Feilian Xie (Inactive)

Reporter:: Feilian Xie (Inactive)

QA Contact:: Feilian Xie (Inactive)

Doc Contact:: Laura Hinson

Votes:: 0

Watchers:: 9

Created:: 2024/04/06 5:05 PM

Updated:: 2024/10/01 5:38 PM

Resolved:: 2024/10/01 5:38 PM
