  1. OpenShift Bugs
  2. OCPBUGS-31018

CAPI machines stuck in Pending on AWS


Details

    • Bug
    • Resolution: Obsolete
    • Undefined
    • None
    • 4.16
    • None
    • Important
    • Yes
    • CLOUD Sprint 251
    • 1
    • Proposed
    • False

    Description

      Description of problem:

      CAPI machines stuck in Pending on AWS

      Version-Release number of selected component (if applicable):

      4.16.0-0.nightly-2024-03-13-061822
      It worked when I tested earlier on 4.16.0-0.nightly-2024-03-09-163353; see https://issues.redhat.com/browse/OCPCLOUD-2441

      How reproducible:

      Always 

      Steps to Reproduce:

      1. Create an AWS TechPreview cluster. We use the automated template: ipi-on-aws/versioned-installer-techpreview-ci
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-03-13-061822   True        False         17m     Cluster version is 4.16.0-0.nightly-2024-03-13-061822     
      
      2. Create the Cluster, AWSCluster, AWSMachineTemplate, and CAPI MachineSet:
      liuhuali@Lius-MacBook-Pro huali-test % oc get cluster
      NAME                  CLUSTERCLASS   PHASE         AGE   VERSION
      huliu-aws319a-fg8jx                  Provisioned   21m   
      liuhuali@Lius-MacBook-Pro huali-test % oc get awscluster
      NAME                  CLUSTER               READY   VPC   BASTION IP
      huliu-aws319a-fg8jx   huliu-aws319a-fg8jx   true          
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
      NAME                  AGE
      aws-machinetemplate   21m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machineset.cluster.x-k8s.io
      NAME                    CLUSTER               REPLICAS   READY   AVAILABLE   AGE   VERSION
      capi-machineset-51071   huliu-aws319a-fg8jx   1                              21m   
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
      NAME                          CLUSTER               NODENAME   PROVIDERID   PHASE     AGE   VERSION
      capi-machineset-51071-4dpbh   huliu-aws319a-fg8jx                           Pending   21m   
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io capi-machineset-51071-4dpbh -oyaml
      apiVersion: cluster.x-k8s.io/v1beta1
      kind: Machine
      metadata:
        creationTimestamp: "2024-03-19T01:47:14Z"
        finalizers:
        - machine.cluster.x-k8s.io
        generation: 1
        labels:
          cluster.x-k8s.io/cluster-name: huliu-aws319a-fg8jx
          cluster.x-k8s.io/set-name: capi-machineset-51071
          machine.openshift.io/cluster-api-cluster: huliu-aws319a-fg8jx
        name: capi-machineset-51071-4dpbh
        namespace: openshift-cluster-api
        ownerReferences:
        - apiVersion: cluster.x-k8s.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: MachineSet
          name: capi-machineset-51071
          uid: 0697b819-a881-4549-bfdb-5df69043301c
        resourceVersion: "47110"
        uid: 85ff9834-1458-42a7-b2ee-33bac2026810
      spec:
        bootstrap:
          dataSecretName: worker-user-data
        clusterName: huliu-aws319a-fg8jx
        infrastructureRef:
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
          kind: AWSMachine
          name: aws-machinetemplate-cq5gk
          namespace: openshift-cluster-api
          uid: 89e0acfa-0e36-445d-9104-91fc67450c25
        nodeDeletionTimeout: 10s
      status:
        conditions:
        - lastTransitionTime: "2024-03-19T01:47:34Z"
          message: 0 of 2 completed
          reason: InstanceProvisionFailed
          severity: Error
          status: "False"
          type: Ready
        - lastTransitionTime: "2024-03-19T01:47:34Z"
          message: 0 of 2 completed
          reason: InstanceProvisionFailed
          severity: Error
          status: "False"
          type: InfrastructureReady
        - lastTransitionTime: "2024-03-19T01:47:14Z"
          reason: WaitingForNodeRef
          severity: Info
          status: "False"
          type: NodeHealthy
        lastUpdated: "2024-03-19T01:47:14Z"
        observedGeneration: 1
        phase: Pending
      liuhuali@Lius-MacBook-Pro huali-test % 
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod
      NAME                                       READY   STATUS    RESTARTS      AGE
      capa-controller-manager-686c794b55-qv97p   1/1     Running   7 (51m ago)   73m
      capi-controller-manager-6bfff8b86d-7hqwk   1/1     Running   7 (51m ago)   73m
      cluster-capi-operator-6446bb5f97-9gqk8     1/1     Running   3 (58m ago)   75m
      liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-686c794b55-qv97p
      …
      I0319 02:04:52.654851       1 awscontrolleridentity_controller.go:87] "IdentityRef is nil, skipping reconciliation" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="openshift-cluster-api/huliu-aws319a-fg8jx" namespace="openshift-cluster-api" name="huliu-aws319a-fg8jx" reconcileID="3bdb043c-0563-4a18-b148-0f14ac0c9570" cluster="openshift-cluster-api/huliu-aws319a-fg8jx"
      I0319 02:04:52.729568       1 awsmachine_controller.go:680] "Creating EC2 instance"
      E0319 02:04:52.729637       1 awsmachine_controller.go:520] "unable to create instance" err="failed to resolve userdata: creating userdata object: requested object creation but bucket management is not enabled"
      E0319 02:04:52.730014       1 controller.go:329] "Reconciler error" err="failed to resolve userdata: creating userdata object: requested object creation but bucket management is not enabled" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/aws-machinetemplate-cq5gk" namespace="openshift-cluster-api" name="aws-machinetemplate-cq5gk" reconcileID="860c860a-188f-4563-a86e-37d489552e65"
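
The controller log above repeats the same failure on every reconcile, so it can help to reduce it to the distinct error messages. The following is a minimal sketch, assuming the log lines use the klog header layout seen above (severity letter, MMDD date, time, PID, `file:line]`). The names `KLOG_RE`, `ERR_RE`, and `unique_errors` are hypothetical helpers introduced for illustration and are not part of any OpenShift or CAPA tooling. The sample lines are copied from this report, with the trailing key/value pairs of the last line shortened.

```python
import re

# klog header: severity letter (I/W/E/F), MMDD, HH:MM:SS.micros, PID, file:line]
KLOG_RE = re.compile(
    r'^\s*(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ '
    r'(?P<src>[\w.\-]+:\d+)\] (?P<rest>.*)$'
)
# Value of the structured err="..." field, allowing escaped quotes inside it.
ERR_RE = re.compile(r'err="((?:[^"\\]|\\.)*)"')


def unique_errors(lines):
    """Return distinct err="..." values from error-severity lines, in first-seen order."""
    seen = []
    for line in lines:
        header = KLOG_RE.match(line)
        if not header or header.group("sev") != "E":
            continue  # skip info/warning lines and anything that is not klog output
        err = ERR_RE.search(header.group("rest"))
        if err and err.group(1) not in seen:
            seen.append(err.group(1))
    return seen


# Sample lines taken from the capa-controller-manager log in this report.
SAMPLE = [
    'I0319 02:04:52.729568       1 awsmachine_controller.go:680] "Creating EC2 instance"',
    'E0319 02:04:52.729637       1 awsmachine_controller.go:520] "unable to create instance" '
    'err="failed to resolve userdata: creating userdata object: requested object creation '
    'but bucket management is not enabled"',
    'E0319 02:04:52.730014       1 controller.go:329] "Reconciler error" '
    'err="failed to resolve userdata: creating userdata object: requested object creation '
    'but bucket management is not enabled" controller="awsmachine"',
]

if __name__ == "__main__":
    for message in unique_errors(SAMPLE):
        print(message)
```

To use it against a live cluster, the output of `oc logs capa-controller-manager-686c794b55-qv97p` could be saved to a file and its lines passed to `unique_errors`.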
          

      Actual results:

          The CAPI Machine is stuck in the Pending phase.

      Expected results:

          The CAPI Machine should reach the Running phase.
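
The reason a Machine stays in Pending is recorded in its status conditions, as in the YAML output in the steps above. The following is a minimal sketch, assuming the Machine has been fetched as JSON (for example with `oc get machines.cluster.x-k8s.io <name> -o json`) and parsed into a dictionary. The function name `failing_conditions` is hypothetical, and the `MACHINE` sample reproduces only the status fields shown in this report.

```python
def failing_conditions(machine):
    """Return (type, reason, message) for conditions with status "False" and severity "Error"."""
    conditions = machine.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason"), c.get("message", ""))
        for c in conditions
        if c.get("status") == "False" and c.get("severity") == "Error"
    ]


# Status copied from the Machine capi-machineset-51071-4dpbh in this report.
MACHINE = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {"type": "Ready", "status": "False", "severity": "Error",
             "reason": "InstanceProvisionFailed", "message": "0 of 2 completed"},
            {"type": "InfrastructureReady", "status": "False", "severity": "Error",
             "reason": "InstanceProvisionFailed", "message": "0 of 2 completed"},
            {"type": "NodeHealthy", "status": "False", "severity": "Info",
             "reason": "WaitingForNodeRef"},
        ],
    }
}

if __name__ == "__main__":
    for cond_type, reason, message in failing_conditions(MACHINE):
        print(f"{cond_type}: {reason} ({message})")
```

The NodeHealthy condition is filtered out because its severity is Info: it only reflects that no node has joined yet, which follows from the instance never being created.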

      Additional info:

          must-gather: https://drive.google.com/file/d/1LS8z8an10rggCHuSx_kWTkP0mhYxSmsA/view?usp=sharing 

          People

            ddonati@redhat.com Damiano Donati
            huliu@redhat.com Huali Liu
            Zhaohua Sun