OpenShift Bugs / OCPBUGS-42530

[AWS] CAPI machine in Local Zones or Wavelength stuck in Pending when publicIP: true


    • Moderate
    • None
    • False

      Description of problem:

          A CAPI machine in a Local Zone is stuck in Pending. The CAPA logs report “Value () for parameter groupId is invalid. The value cannot be empty”.

      Version-Release number of selected component (if applicable):

           4.17.0-0.nightly-2024-09-26-185948

      How reproducible:

          always

      Steps to Reproduce:

          1. Install an AWS Local Zone or Wavelength Zone cluster. We have automated templates:
      versioned-installer-customer_vpc-ovn-local_zone
      versioned-installer-customer_vpc-ovn-local_zone-ci
      versioned-installer-customer_vpc-ovn-local_zone_day2
      versioned-installer-customer_vpc-ovn-wavelength_zone
      versioned-installer-customer_vpc-ovn-wavelength_zone_day2
      with feature_set: "TechPreviewNoUpgrade"
      
      Here I used the Prow job aws-ipi-localzone-byo-subnet-ovn-day2-f28-destructive and then enabled the feature gate.
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion 
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.17.0-0.nightly-2024-09-26-185948   True        False         130m    Cluster version is 4.17.0-0.nightly-2024-09-26-185948
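The command used to enable the feature gate is not shown in this report. A minimal sketch, assuming the gate in question is the `TechPreviewNoUpgrade` feature set named in the template options and that it is applied by patching the cluster-scoped `FeatureGate` object named `cluster`. The helper name `featuregate_patch` is hypothetical. Note that enabling `TechPreviewNoUpgrade` cannot be undone and prevents minor version updates.

```shell
# Hypothetical helper: print a JSON merge patch that selects a feature set.
featuregate_patch() {
  printf '{"spec":{"featureSet":"%s"}}\n' "$1"
}

# Show the patch that would be sent.
featuregate_patch TechPreviewNoUpgrade

# Against a live cluster (commented out so the sketch has no side effects):
# oc patch featuregate cluster --type=merge -p "$(featuregate_patch TechPreviewNoUpgrade)"
```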
      
          2. Create the Cluster resource
      
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f my-cluster926.yaml 
      cluster.cluster.x-k8s.io/ci-op-rwbcqck1-1e8db-4wgn9 created
           
      liuhuali@Lius-MacBook-Pro huali-test % cat my-cluster926.yaml 
      apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      metadata:
        name: ci-op-rwbcqck1-1e8db-4wgn9
        namespace: openshift-cluster-api
      spec:
        infrastructureRef:
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
          kind: AWSCluster
          name: ci-op-rwbcqck1-1e8db-4wgn9
          namespace: openshift-cluster-api 
      
          3. Edit the AWSCluster to add subnets under network
      
        network:
          subnets:
          - id: subnet-00b4125d7daff7135
            isPublic: true     
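The same change can be applied without opening an editor. This is a sketch, assuming a JSON merge patch against `spec.network.subnets` is acceptable for this object. `subnet_patch` is a hypothetical helper, and the subnet ID and cluster name are the ones used in this report. A merge patch replaces the whole `subnets` list, so any entries already present would have to be included in the patch as well.

```shell
# Hypothetical helper: print a merge patch that sets spec.network.subnets
# to a single public subnet with the given ID.
subnet_patch() {
  printf '{"spec":{"network":{"subnets":[{"id":"%s","isPublic":true}]}}}\n' "$1"
}

# Show the patch that would be sent.
subnet_patch subnet-00b4125d7daff7135

# Against a live cluster (commented out so the sketch has no side effects):
# oc patch awscluster ci-op-rwbcqck1-1e8db-4wgn9 -n openshift-cluster-api \
#   --type=merge -p "$(subnet_patch subnet-00b4125d7daff7135)"
```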
      
          4. Create the AWSMachineTemplate
      
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f awsmachinetemplate926.yaml
      awsmachinetemplate.infrastructure.cluster.x-k8s.io/aws-machinetemplate created
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
      NAME                  AGE
      aws-machinetemplate   8s
      
      
      liuhuali@Lius-MacBook-Pro huali-test % cat awsmachinetemplate926.yaml
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      metadata:
        name: aws-machinetemplate
        namespace: openshift-cluster-api
      spec:
        template:
          spec:
            uncompressedUserData: true
            iamInstanceProfile: ci-op-rwbcqck1-1e8db-4wgn9-worker-profile
            instanceType: c5.2xlarge
            failureDomain: ap-northeast-1-tpe-1a
            ignition:
              storageType: UnencryptedUserData
              version: "3.2"
            ami:
              id: ami-0d7d4b329e5403cfb
            additionalSecurityGroups:
            - filters:
              - name: tag:Name
                values:
                - ci-op-rwbcqck1-1e8db-4wgn9-node
            - filters:
              - name: tag:Name
                values:
                - ci-op-rwbcqck1-1e8db-4wgn9-lb
            subnet:
              id: subnet-00b4125d7daff7135
            publicIP: true
      
          5. Create the CAPI MachineSet
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f capimachineset926.yaml  
      machineset.cluster.x-k8s.io/capi-machineset created
      
      
      liuhuali@Lius-MacBook-Pro huali-test % cat capimachineset926.yaml    
      apiVersion: cluster.x-k8s.io/v1beta1
      kind: MachineSet
      metadata:
        labels:
          cluster.x-k8s.io/cluster-name: ci-op-rwbcqck1-1e8db-4wgn9
        name: capi-machineset
        namespace: openshift-cluster-api
      spec:
        clusterName: ci-op-rwbcqck1-1e8db-4wgn9
        deletePolicy: Newest
        replicas: 1
        selector:
          matchLabels:
            cluster.x-k8s.io/cluster-name: ci-op-rwbcqck1-1e8db-4wgn9
            machine.openshift.io/cluster-api-cluster: ci-op-rwbcqck1-1e8db-4wgn9
        template:
          metadata:
            labels:
              cluster.x-k8s.io/cluster-name: ci-op-rwbcqck1-1e8db-4wgn9
              machine.openshift.io/cluster-api-cluster: ci-op-rwbcqck1-1e8db-4wgn9
          spec:
            bootstrap:
              dataSecretName: worker-user-data
            clusterName: ci-op-rwbcqck1-1e8db-4wgn9
            infrastructureRef:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: aws-machinetemplate
      
          6. The machine is stuck in Pending
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c                                          
      NAME                    CLUSTER                      NODENAME   PROVIDERID   PHASE     AGE   VERSION
      capi-machineset-fwkl5   ci-op-rwbcqck1-1e8db-4wgn9                           Pending   15m   
      liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-757bc857fc-dk2lt  
      …
      I0927 07:51:40.189091       1 awsmachine_controller.go:710] "Creating EC2 instance"
      E0927 07:51:40.537149       1 awsmachine_controller.go:529] "unable to create instance" err=<
      	failed to create AWSMachine instance: failed to run instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
      		status code: 400, request id: b998d3c9-960e-497e-af7b-631b20e37c38
       >
      E0927 07:51:40.554513       1 controller.go:329] "Reconciler error" err=<
      	failed to create AWSMachine instance: failed to run instance: InvalidParameterValue: Value () for parameter groupId is invalid. The value cannot be empty
      		status code: 400, request id: b998d3c9-960e-497e-af7b-631b20e37c38
       > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset-fwkl5" namespace="openshift-cluster-api" name="capi-machineset-fwkl5" reconcileID="3e65be90-0974-4102-9ec5-37716639c39b"
      
      
      I am not sure what groupId means; I did not find it in the AWSCluster or AWSMachineTemplate CRDs.
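For triage, `groupId` in this EC2 error appears to be the security group ID passed in the instance launch request, so one thing worth checking is whether the `tag:Name` filters under `additionalSecurityGroups` resolve to any security group. The sketch below is not a confirmed diagnosis. The helper names `unique_ec2_errors` and `sg_name_filter` are hypothetical, and the commented `oc` and `aws` invocations assume the CAPA deployment is named `capa-controller-manager` in `openshift-cluster-api` and that AWS CLI credentials for the cluster's account are configured.

```shell
# Hypothetical helper: read CAPA logs on stdin and print each distinct
# EC2 "failed to run instance" error message once.
unique_ec2_errors() {
  grep -o 'failed to run instance: .*' |
    sed 's/^failed to run instance: //' |
    sort -u
}

# Hypothetical helper: build an EC2 filter that matches on the Name tag.
sg_name_filter() {
  printf 'Name=tag:Name,Values=%s\n' "$1"
}

# Against a live cluster and AWS account (commented out; needs access):
# oc logs -n openshift-cluster-api deployment/capa-controller-manager | unique_ec2_errors
# aws ec2 describe-security-groups \
#   --filters "$(sg_name_filter ci-op-rwbcqck1-1e8db-4wgn9-node)" \
#   --query 'SecurityGroups[].GroupId' --output text
```

An empty result from the `aws` query would suggest the filter matched nothing. A non-empty result would point the investigation elsewhere, for example at how `publicIP: true` changes the launch request.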

      Actual results:

      The CAPI machine in the Local Zone is stuck in Pending.

      Expected results:

      The CAPI machine in the Local Zone should reach the Running phase.

      Additional info:

          Must-gather https://drive.google.com/file/d/1jiNMfB1FfGdDjHoS05zQS7nFGQOpNBaP/view?usp=sharing 

            rh-ee-nbrubake Nolan Brubaker
            huliu@redhat.com Huali Liu
            Huali Liu Huali Liu