Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.16.z
Affects Version/s: 4.16.0
Component/s: HyperShift
Labels:
- hypershift-qe
- no-qe

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:

4.15.z
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
Done
Release Note Type:
Bug Fix
Release Note Text:

* Previously, the {ai-full} did not reload new data from the Assisted Service when the {ai-full} checked control plane nodes for readiness and a conflict existed with a write operation from the {ai-full} controller. This conflict prevented the {ai-full} from detecting a node that was marked by the {ai-full} controller as `Ready` because the {ai-full} relied on older information. With this release, the {ai-full} can receive the newest information from the Assisted Service, so that the {ai-full} can accurately detect the status of each node. (link:https://issues.redhat.com/browse/OCPBUGS-38003[*OCPBUGS-38003*])


Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Private HC provision failed on AWS.

How reproducible:

Always.

Steps to Reproduce:

Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/:

RELEASE_IMAGE=registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-20-005211
HO_IMAGE=quay.io/hypershift/hypershift-operator:latest
BUCKET_NAME=fxie-hcp-bucket
REGION=us-east-2
AWS_CREDS="$HOME/.aws/credentials"
CLUSTER_NAME=fxie-hcp-1
BASE_DOMAIN=qe.devcluster.openshift.com
EXT_DNS_DOMAIN=hypershift-ext.qe.devcluster.openshift.com
PULL_SECRET="/Users/fxie/Projects/hypershift/.dockerconfigjson"

hypershift install --oidc-storage-provider-s3-bucket-name $BUCKET_NAME --oidc-storage-provider-s3-credentials $AWS_CREDS --oidc-storage-provider-s3-region $REGION --private-platform AWS --aws-private-creds $AWS_CREDS --aws-private-region=$REGION --wait-until-available --hypershift-image $HO_IMAGE

hypershift create cluster aws --pull-secret=$PULL_SECRET --aws-creds=$AWS_CREDS --name=$CLUSTER_NAME --base-domain=$BASE_DOMAIN --node-pool-replicas=2 --region=$REGION --endpoint-access=Private --release-image=$RELEASE_IMAGE --generate-ssh

Additional info:

From the MC:
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
    server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
    server: https://kube-apiserver:6443
bootstrap-kubeconfig
    server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
    server: https://kube-apiserver:6443
dns-operator-kubeconfig
    server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
    server: https://kube-apiserver:6443
ingress-operator-kubeconfig
    server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
    server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
    server: https://kube-apiserver:6443
localhost-kubeconfig
    server: https://localhost:6443
service-network-admin-kubeconfig
    server: https://kube-apiserver:6443

The bootstrap-kubeconfig uses an incorrect KAS port, 443. The port should be 6443 because the KAS is exposed through a load balancer. The kubelet on each HC node picks up the same incorrect port, so the AWS VMs are provisioned but cannot join the HC as nodes.
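The port comparison described above can be scripted. The following is a minimal sketch, not part of HyperShift or `oc`: the helper names `server_port` and `has_expected_port` are hypothetical, and the sketch assumes each `server:` line has the same shape as in the output above (scheme, host name, optional port, no IPv6 literal).

```shell
# Print the explicit port from a kubeconfig "server:" line.
# Prints nothing when the URL carries no port.
# Assumes a host name rather than an IPv6 literal.
server_port() (
  url=${1#*server:}                              # drop everything through "server:"
  url=$(printf '%s' "$url" | tr -d '[:space:]')  # strip surrounding whitespace
  hostport=${url#*://}                           # drop the scheme
  hostport=${hostport%%/*}                       # drop any path
  case $hostport in
    *:*) printf '%s\n' "${hostport##*:}" ;;      # text after the last colon
  esac
)

# Succeed only when the line's port equals the expected port.
has_expected_port() (
  [ "$(server_port "$1")" = "$2" ]
)
```

With the output shown in this report, `has_expected_port '    server: https://api.fxie-hcp-1.hypershift.local:443' 6443` would fail, while the same check on any of the `https://kube-apiserver:6443` lines would succeed.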

From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

In addition, the CNO passes the wrong KAS port to the network components on the HC.

The HAProxy configuration on the VMs has the same problem:

frontend local_apiserver
  bind 172.20.0.1:6443
  log global
  mode tcp
  option tcplog
  default_backend remote_apiserver

backend remote_apiserver
  mode tcp
  log global
  option httpchk GET /version
  option log-health-checks
  default-server inter 10s fall 3 rise 3
  server controlplane api.fxie-hcp-1.hypershift.local:443
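To spot the wrong backend port quickly on a node, the `server` lines can be pulled out of the HAProxy configuration. This is a rough sketch: the function name `backend_servers` is hypothetical, and it assumes the standard `server <name> <address>[:port]` layout seen in the configuration above. The location of the configuration file on the VM is not given in this report, so the function reads from standard input or from whatever file path is passed to it.

```shell
# Print the name and address[:port] of every HAProxy "server" line.
# Reads the configuration from stdin, or from the files given as arguments.
# "default-server" lines are skipped because their first field differs.
backend_servers() {
  awk '$1 == "server" && NF >= 3 { print $2, $3 }' "$@"
}
```

For the configuration in this report, the function would print `controlplane api.fxie-hcp-1.hypershift.local:443`, which makes the 443 port visible at a glance.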

blocks

HOSTEDCP-1569 Use HO/e2e from main in CI tests for all releases

Closed

OCPBUGS-42221 Failed to provision private HC on AWS

Closed

OCPBUGS-42181 HO from main fails to create private cluster with KAS type LB on 4.15 and earlier

Closed

clones

OCPBUGS-36220 Failed to provision private HC on AWS

Closed

depends on

OCPBUGS-36220 Failed to provision private HC on AWS

Closed

is cloned by

OCPBUGS-42221 Failed to provision private HC on AWS

Closed

is duplicated by

OCPBUGS-42181 HO from main fails to create private cluster with KAS type LB on 4.15 and earlier

Closed

links to

openshift/hypershift#4749: [release-4.15] OCPBUGS-42214: Make guest cluster components use the correct KAS port

RHSA-2024:7179 OpenShift Container Platform 4.15.z security update


Assignee: Seth Jennings

Reporter: Feilian Xie (Inactive)

Need Info From: None

Contributors: None

QA Contact: Feilian Xie (Inactive)

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2024/09/19 2:07 PM

Updated: 2025/07/20 1:33 PM

Resolved: 2024/10/02 5:49 AM
