Bug
Resolution: Unresolved
4.18
Quality / Stability / Reliability
Moderate
Description of problem:
A CAPI machine stays stuck in the Pending phase on an AWS STS cluster.
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2024-09-12-073027
How reproducible:
Always
Steps to Reproduce:
1. Install an AWS STS cluster. We use the automated template ipi-on-aws/versioned-installer-sts-ci with feature_set: "TechPreviewNoUpgrade".
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.18.0-0.nightly-2024-09-12-073027   True        False         30m     Cluster version is 4.18.0-0.nightly-2024-09-12-073027
2. Create the Cluster.
liuhuali@Lius-MacBook-Pro huali-test % oc create -f my-cluster.yaml
cluster.cluster.x-k8s.io/huliu-aws913a-vwnp9 created
liuhuali@Lius-MacBook-Pro huali-test % oc get cluster
NAME                  CLUSTERCLASS   PHASE         AGE   VERSION
huliu-aws913a-vwnp9                  Provisioned   4s
liuhuali@Lius-MacBook-Pro huali-test % oc get awscluster
NAME                  CLUSTER               READY   VPC   BASTION IP
huliu-aws913a-vwnp9   huliu-aws913a-vwnp9   true
liuhuali@Lius-MacBook-Pro huali-test % cat my-cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: huliu-aws913a-vwnp9
  namespace: openshift-cluster-api
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: huliu-aws913a-vwnp9
    namespace: openshift-cluster-api
3. Create the AWSMachineTemplate.
liuhuali@Lius-MacBook-Pro huali-test % oc create -f awsmachinetemplate618.yaml
awsmachinetemplate.infrastructure.cluster.x-k8s.io/aws-machinetemplate created
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
NAME                  AGE
aws-machinetemplate   55s
liuhuali@Lius-MacBook-Pro huali-test % cat awsmachinetemplate618.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: aws-machinetemplate
  namespace: openshift-cluster-api
spec:
  template:
    spec:
      uncompressedUserData: true
      iamInstanceProfile: huliu-aws913a-vwnp9-worker-profile
      instanceType: m6i.xlarge
      failureDomain: us-east-2a
      ignition:
        storageType: UnencryptedUserData
        version: "3.2"
      ami:
        id: ami-0bb13f743630d1cb5
      additionalSecurityGroups:
      - filters:
        - name: tag:Name
          values:
          - huliu-aws913a-vwnp9-node
      - filters:
        - name: tag:Name
          values:
          - huliu-aws913a-vwnp9-lb
      subnet:
        filters:
        - name: tag:Name
          values:
          - huliu-aws913a-vwnp9-subnet-private-us-east-2a
4. Create the CAPI MachineSet.
liuhuali@Lius-MacBook-Pro huali-test % oc create -f capimachineset.yaml
machineset.cluster.x-k8s.io/capi-machineset-51071 created
liuhuali@Lius-MacBook-Pro huali-test % cat capimachineset.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: huliu-aws913a-vwnp9
  name: capi-machineset-51071
  namespace: openshift-cluster-api
spec:
  clusterName: huliu-aws913a-vwnp9
  deletePolicy: Random
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: huliu-aws913a-vwnp9
      machine.openshift.io/cluster-api-cluster: huliu-aws913a-vwnp9
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: huliu-aws913a-vwnp9
        machine.openshift.io/cluster-api-cluster: huliu-aws913a-vwnp9
    spec:
      bootstrap:
        dataSecretName: worker-user-data
      clusterName: huliu-aws913a-vwnp9
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: aws-machinetemplate
5. The machine is stuck in Pending, and the capa-controller-manager log shows an AWS session error.
liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
NAME                          CLUSTER               NODENAME   PROVIDERID   PHASE     AGE   VERSION
capi-machineset-51071-j89k4   huliu-aws913a-vwnp9                           Pending   30m
liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-7fc8c64c9f-gff2d
...
E0913 06:39:37.734443 1 controller.go:329] "Reconciler error" err="error getting infra provider cluster or control plane object: failed to create aws session: Failed to create a new AWS session: CredentialRequiresARNError: credential type web_identity_token_file requires role_arn, profile default" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset-51071-j89k4" namespace="openshift-cluster-api" name="capi-machineset-51071-j89k4" reconcileID="f0a59244-98fb-418f-bb78-0052c36b7feb"
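The AWS SDK returns CredentialRequiresARNError when the selected shared-config profile sets web_identity_token_file but has no role_arn. The following is a minimal diagnostic sketch, not a confirmed root cause: the helper name profiles_missing_role_arn is hypothetical, and it assumes the credentials file used by the CAPA controller has already been extracted to a local file in the standard AWS INI format.

```shell
# Hypothetical helper (not part of oc or the AWS CLI): print the name of every
# profile in an AWS shared config/credentials file that sets
# web_identity_token_file but does not set role_arn.
# Reads the file given as the first argument, or stdin if none is given.
profiles_missing_role_arn() {
  awk '
    # Print the previous profile if it had a token file but no role ARN.
    function report() {
      if (profile != "" && has_token && !has_arn) print profile
    }
    # A section header such as [default] starts a new profile.
    /^[ \t]*\[.*\][ \t]*$/ {
      report()
      profile = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", profile)
      has_token = 0
      has_arn = 0
      next
    }
    /^[ \t]*web_identity_token_file[ \t]*=/ { has_token = 1 }
    /^[ \t]*role_arn[ \t]*=/ { has_arn = 1 }
    END { report() }
  ' "$@"
}
```

For example, after saving the decoded credentials from the secret mounted into capa-controller-manager to creds.ini (the secret name and its data key are not shown in this report, so they have to be looked up on the cluster), running profiles_missing_role_arn creds.ini prints "default" if the default profile matches the condition in the error above, and prints nothing otherwise.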
Actual results:
The CAPI machine is stuck in Pending.
Expected results:
The CAPI machine should reach the Running phase.
Additional info:
must-gather: https://drive.google.com/file/d/13tIY_Aq9PkZQFSKKQ37p5hZvZ_aoIk3b/view?usp=sharing