Bug
Resolution: Unresolved
4.18
Moderate
Description of problem:
A CAPI machine is stuck in the Pending phase on an AWS STS cluster.
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2024-09-12-073027
How reproducible:
Always
Steps to Reproduce:
1. Install an AWS STS cluster. We use the automated template ipi-on-aws/versioned-installer-sts-ci with feature_set: "TechPreviewNoUpgrade".

    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.18.0-0.nightly-2024-09-12-073027   True        False         30m     Cluster version is 4.18.0-0.nightly-2024-09-12-073027

2. Create the cluster.

    liuhuali@Lius-MacBook-Pro huali-test % oc create -f my-cluster.yaml
    cluster.cluster.x-k8s.io/huliu-aws913a-vwnp9 created
    liuhuali@Lius-MacBook-Pro huali-test % oc get cluster
    NAME                  CLUSTERCLASS   PHASE         AGE   VERSION
    huliu-aws913a-vwnp9                  Provisioned   4s
    liuhuali@Lius-MacBook-Pro huali-test % oc get awscluster
    NAME                  CLUSTER               READY   VPC   BASTION IP
    huliu-aws913a-vwnp9   huliu-aws913a-vwnp9   true
    liuhuali@Lius-MacBook-Pro huali-test % cat my-cluster.yaml
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: huliu-aws913a-vwnp9
      namespace: openshift-cluster-api
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSCluster
        name: huliu-aws913a-vwnp9
        namespace: openshift-cluster-api

3. Create the AWSMachineTemplate.

    liuhuali@Lius-MacBook-Pro huali-test % oc create -f awsmachinetemplate618.yaml
    awsmachinetemplate.infrastructure.cluster.x-k8s.io/aws-machinetemplate created
    liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
    NAME                  AGE
    aws-machinetemplate   55s
    liuhuali@Lius-MacBook-Pro huali-test % cat awsmachinetemplate618.yaml
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSMachineTemplate
    metadata:
      name: aws-machinetemplate
      namespace: openshift-cluster-api
    spec:
      template:
        spec:
          uncompressedUserData: true
          iamInstanceProfile: huliu-aws913a-vwnp9-worker-profile
          instanceType: m6i.xlarge
          failureDomain: us-east-2a
          ignition:
            storageType: UnencryptedUserData
            version: "3.2"
          ami:
            id: ami-0bb13f743630d1cb5
          additionalSecurityGroups:
          - filters:
            - name: tag:Name
              values:
              - huliu-aws913a-vwnp9-node
          - filters:
            - name: tag:Name
              values:
              - huliu-aws913a-vwnp9-lb
          subnet:
            filters:
            - name: tag:Name
              values:
              - huliu-aws913a-vwnp9-subnet-private-us-east-2a

4. Create the CAPI MachineSet.

    liuhuali@Lius-MacBook-Pro huali-test % oc create -f capimachineset.yaml
    machineset.cluster.x-k8s.io/capi-machineset-51071 created
    liuhuali@Lius-MacBook-Pro huali-test % cat capimachineset.yaml
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: huliu-aws913a-vwnp9
      name: capi-machineset-51071
      namespace: openshift-cluster-api
    spec:
      clusterName: huliu-aws913a-vwnp9
      deletePolicy: Random
      replicas: 1
      selector:
        matchLabels:
          cluster.x-k8s.io/cluster-name: huliu-aws913a-vwnp9
          machine.openshift.io/cluster-api-cluster: huliu-aws913a-vwnp9
      template:
        metadata:
          labels:
            cluster.x-k8s.io/cluster-name: huliu-aws913a-vwnp9
            machine.openshift.io/cluster-api-cluster: huliu-aws913a-vwnp9
        spec:
          bootstrap:
            dataSecretName: worker-user-data
          clusterName: huliu-aws913a-vwnp9
          infrastructureRef:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
            kind: AWSMachineTemplate
            name: aws-machinetemplate

5. The machine is stuck in Pending.

    liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
    NAME                          CLUSTER               NODENAME   PROVIDERID   PHASE     AGE   VERSION
    capi-machineset-51071-j89k4   huliu-aws913a-vwnp9                           Pending   30m
    liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-7fc8c64c9f-gff2d
    ...
    E0913 06:39:37.734443 1 controller.go:329] "Reconciler error" err="error getting infra provider cluster or control plane object: failed to create aws session: Failed to create a new AWS session: CredentialRequiresARNError: credential type web_identity_token_file requires role_arn, profile default" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset-51071-j89k4" namespace="openshift-cluster-api" name="capi-machineset-51071-j89k4" reconcileID="f0a59244-98fb-418f-bb78-0052c36b7feb"
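Triage note: the CredentialRequiresARNError text in the log above says that the "default" profile sets web_identity_token_file but has no role_arn. The following is a minimal sketch for checking the contents of an AWS shared credentials file for that condition. The function name, the sample file contents, the token path, and the role ARN are all hypothetical; nothing here was taken from the affected cluster, and which secret the CAPA controller actually reads has not been confirmed.

```python
import configparser


def profiles_missing_role_arn(credentials_text):
    """Return profile names that set web_identity_token_file but no role_arn.

    credentials_text is the INI-style content of an AWS shared credentials
    file (hypothetical input; pass in whatever the controller is mounted with).
    """
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    missing = []
    for section in parser.sections():
        options = parser[section]
        has_token_file = "web_identity_token_file" in options
        has_role_arn = bool(options.get("role_arn", "").strip())
        if has_token_file and not has_role_arn:
            missing.append(section)
    return missing


# Hypothetical sample contents, for illustration only.
SAMPLE_BROKEN = """\
[default]
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
"""

SAMPLE_OK = """\
[default]
role_arn = arn:aws:iam::123456789012:role/example-role
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
"""

if __name__ == "__main__":
    print(profiles_missing_role_arn(SAMPLE_BROKEN))
    print(profiles_missing_role_arn(SAMPLE_OK))
```

If the check reports the "default" profile for the credentials the controller uses, that would be consistent with the error above; it does not by itself identify why role_arn is absent on STS clusters.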
Actual results:
CAPI machine stuck in Pending
Expected results:
The CAPI machine should reach the Running phase.
Additional info:
must gather: https://drive.google.com/file/d/13tIY_Aq9PkZQFSKKQ37p5hZvZ_aoIk3b/view?usp=sharing