-
Bug
-
Resolution: Unresolved
4.17
-
Quality / Stability / Reliability
Moderate
Description of problem:
A CAPI machine in a Local Zone is stuck in Pending; the CAPA controller logs report “parameter groupId is invalid. The value cannot be empty”
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-09-26-185948
How reproducible:
always
Steps to Reproduce:
1. Install an AWS Local Zone or Wavelength Zone cluster. We have automated templates:
versioned-installer-customer_vpc-ovn-local_zone
versioned-installer-customer_vpc-ovn-local_zone-ci
versioned-installer-customer_vpc-ovn-local_zone_day2
versioned-installer-customer_vpc-ovn-wavelength_zone
versioned-installer-customer_vpc-ovn-wavelength_zone_day2
with feature_set: "TechPreviewNoUpgrade"
Here I used the Prow job aws-ipi-localzone-byo-subnet-ovn-day2-f28-destructive and then enabled the feature gate.
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.17.0-0.nightly-2024-09-26-185948 True False 130m Cluster version is 4.17.0-0.nightly-2024-09-26-185948
2. Create the CAPI Cluster
liuhuali@Lius-MacBook-Pro huali-test % oc create -f my-cluster926.yaml
cluster.cluster.x-k8s.io/ci-op-rwbcqck1-1e8db-4wgn9 created
liuhuali@Lius-MacBook-Pro huali-test % cat my-cluster926.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ci-op-rwbcqck1-1e8db-4wgn9
  namespace: openshift-cluster-api
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: ci-op-rwbcqck1-1e8db-4wgn9
    namespace: openshift-cluster-api
3. Edit the AWSCluster to add the subnet under network
network:
  subnets:
  - id: subnet-00b4125d7daff7135
    isPublic: true
4. Create the AWSMachineTemplate
liuhuali@Lius-MacBook-Pro huali-test % oc create -f awsmachinetemplate926.yaml
awsmachinetemplate.infrastructure.cluster.x-k8s.io/aws-machinetemplate created
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
NAME AGE
aws-machinetemplate 8s
liuhuali@Lius-MacBook-Pro huali-test % cat awsmachinetemplate926.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: aws-machinetemplate
  namespace: openshift-cluster-api
spec:
  template:
    spec:
      uncompressedUserData: true
      iamInstanceProfile: ci-op-rwbcqck1-1e8db-4wgn9-worker-profile
      instanceType: c5.2xlarge
      failureDomain: ap-northeast-1-tpe-1a
      ignition:
        storageType: UnencryptedUserData
        version: "3.2"
      ami:
        id: ami-0d7d4b329e5403cfb
      additionalSecurityGroups:
      - filters:
        - name: tag:Name
          values:
          - ci-op-rwbcqck1-1e8db-4wgn9-node
      - filters:
        - name: tag:Name
          values:
          - ci-op-rwbcqck1-1e8db-4wgn9-lb
      subnet:
        id: subnet-00b4125d7daff7135
      publicIP: true
5. Create the CAPI MachineSet
liuhuali@Lius-MacBook-Pro huali-test % oc create -f capimachineset926.yaml
machineset.cluster.x-k8s.io/capi-machineset created
liuhuali@Lius-MacBook-Pro huali-test % cat capimachineset926.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ci-op-rwbcqck1-1e8db-4wgn9
  name: capi-machineset
  namespace: openshift-cluster-api
spec:
  clusterName: ci-op-rwbcqck1-1e8db-4wgn9
  deletePolicy: Newest
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: ci-op-rwbcqck1-1e8db-4wgn9
      machine.openshift.io/cluster-api-cluster: ci-op-rwbcqck1-1e8db-4wgn9
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: ci-op-rwbcqck1-1e8db-4wgn9
        machine.openshift.io/cluster-api-cluster: ci-op-rwbcqck1-1e8db-4wgn9
    spec:
      bootstrap:
        dataSecretName: worker-user-data
      clusterName: ci-op-rwbcqck1-1e8db-4wgn9
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: aws-machinetemplate
6. The machine is stuck in Pending
liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
capi-machineset-fwkl5 ci-op-rwbcqck1-1e8db-4wgn9 Pending 15m
liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-757bc857fc-dk2lt
…
I0927 07:51:40.189091 1 awsmachine_controller.go:710] "Creating EC2 instance"
E0927 07:51:40.537149 1 awsmachine_controller.go:529] "unable to create instance" err=<
failed to create AWSMachine instance: failed to run instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
status code: 400, request id: b998d3c9-960e-497e-af7b-631b20e37c38
>
E0927 07:51:40.554513 1 controller.go:329] "Reconciler error" err=<
failed to create AWSMachine instance: failed to run instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
status code: 400, request id: b998d3c9-960e-497e-af7b-631b20e37c38
> controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset-fwkl5" namespace="openshift-cluster-api" name="capi-machineset-fwkl5" reconcileID="3e65be90-0974-4102-9ec5-37716639c39b"
I am not sure what groupId refers to; I did not find it in the AWSCluster or AWSMachineTemplate CRDs.
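A possible reading of the error, offered as an assumption and not a confirmed root cause: groupId looks like the security group ID parameter of the EC2 RunInstances call, so the request probably carried an empty security group ID. The sketch below only illustrates that kind of check. The function name and the sample IDs are hypothetical and are not taken from CAPA or from this cluster.

```python
from typing import List


def empty_group_id_positions(group_ids: List[str]) -> List[int]:
    """Return the indexes of security group IDs that are empty or whitespace-only."""
    return [i for i, gid in enumerate(group_ids) if not gid.strip()]


# Hypothetical resolved list: two security groups resolved to real-looking IDs,
# and one entry resolved to an empty string (the suspected cause of the error).
resolved = ["sg-0123456789abcdef0", "sg-0fedcba9876543210", ""]
print(empty_group_id_positions(resolved))
```

If an empty entry like this reaches the RunInstances request, EC2 would be expected to reject it with an InvalidParameterValue error similar to the one in the log above.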
Actual results:
The CAPI machine in the Local Zone is stuck in Pending
Expected results:
The CAPI machine in the Local Zone should reach the Running phase
Additional info:
Must-gather https://drive.google.com/file/d/1jiNMfB1FfGdDjHoS05zQS7nFGQOpNBaP/view?usp=sharing