OCPBUGS-33956 (OpenShift Bugs)

[AWS] CAPI machine stuck in Pending and CAPA log shows a panic when publicIP: true is set


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • 4.16
    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

          [AWS] A CAPI machine is stuck in Pending and the CAPA controller log shows a nil pointer panic when publicIP: true is set in the AWSMachineTemplate.

      Version-Release number of selected component (if applicable):

          4.16.0-0.nightly-2024-05-07-025557

      How reproducible:

          always

      Steps to Reproduce:

          1. Install an AWS 4.16 Tech Preview cluster. We use the automated template ipi-on-aws/versioned-installer-ovn-ci with feature_set: "TechPreviewNoUpgrade".
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-05-07-025557   True        False         20m     Cluster version is 4.16.0-0.nightly-2024-05-07-025557     
      
          2. Create a CAPI machine with publicIP: true
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine.cluster.x-k8s.io  
      NAME                          CLUSTER               NODENAME   PROVIDERID   PHASE     AGE   VERSION
      capi-machineset-51071-cwdb5   huliu-aws520b-bp6bq                           Pending   10m   
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate        
      NAME                  AGE
      aws-machinetemplate   11m
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate aws-machinetemplate -oyaml
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta2","kind":"AWSMachineTemplate","metadata":{"annotations":{},"name":"aws-machinetemplate","namespace":"openshift-cluster-api"},"spec":{"template":{"spec":{"additionalSecurityGroups":[{"filters":[{"name":"tag:Name","values":["huliu-aws520b-bp6bq-worker-sg"]}]}],"ami":{"id":"ami-0ae9b509738034a2c"},"failureDomain":"us-east-2c","iamInstanceProfile":"huliu-aws520b-bp6bq-worker-profile","ignition":{"storageType":"UnencryptedUserData","version":"3.2"},"instanceType":"m6i.xlarge","publicIP":true,"subnet":{"filters":[{"name":"tag:Name","values":["huliu-aws520b-bp6bq-private-us-east-2c"]}]},"uncompressedUserData":true}}}}
        creationTimestamp: "2024-05-20T09:53:44Z"
        generation: 1
        name: aws-machinetemplate
        namespace: openshift-cluster-api
        ownerReferences:
        - apiVersion: cluster.x-k8s.io/v1beta1
          kind: Cluster
          name: huliu-aws520b-bp6bq
          uid: 7a718e08-b245-4cfb-85f8-5db0ef99e2fe
        resourceVersion: "246389"
        uid: c0d90468-392b-449d-b9fb-0c58eac73d7a
      spec:
        template:
          spec:
            additionalSecurityGroups:
            - filters:
              - name: tag:Name
                values:
                - huliu-aws520b-bp6bq-worker-sg
            ami:
              id: ami-0ae9b509738034a2c
            iamInstanceProfile: huliu-aws520b-bp6bq-worker-profile
            ignition:
              storageType: UnencryptedUserData
              version: "3.2"
            instanceType: m6i.xlarge
            publicIP: true
            subnet:
              filters:
              - name: tag:Name
                values:
                - huliu-aws520b-bp6bq-private-us-east-2c
            uncompressedUserData: true
      liuhuali@Lius-MacBook-Pro huali-test % 
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod 
      NAME                                       READY   STATUS    RESTARTS        AGE
      capa-controller-manager-6d695d96c9-tnppx   1/1     Running   1 (7h21m ago)   7h26m
      capi-controller-manager-68456c4d64-c7dsq   1/1     Running   1 (7h21m ago)   7h26m
      cluster-capi-operator-5d6b7d67b4-2lvl6     1/1     Running   1 (7h24m ago)   7h27m
      liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-6d695d96c9-tnppx 
      …
      I0520 10:01:47.104698       1 awsmachine_controller.go:680] "Creating EC2 instance"
      E0520 10:01:47.204987       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
      goroutine 439 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2f1c1c0?, 0x5a601b0})
      	/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
      	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:108 +0xb2
      panic({0x2f1c1c0?, 0x5a601b0?})
      	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
      sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).findSubnet(0xc0031f2930, 0xc0044142a0)
      	/build/pkg/cloud/services/ec2/instances.go:353 +0x1e2d
      sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).CreateInstance(0xc0031f2930, 0xc0044142a0, {0xc002a1dc00, 0x6cf, 0x6cf}, {0xc003f4b2e0, 0x8})
      	/build/pkg/cloud/services/ec2/instances.go:179 +0xa25
      sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).createInstance(0xc0044142a0?, {0x3cc1d08, 0xc0031f2930}, 0xc0044142a0, {0x3cc4c28, 0xc002e02d80}, {0x3ca3aa0, 0xc0004eec60})
      	/build/controllers/awsmachine_controller.go:687 +0xaf
      sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileNormal(0xc001376cf0, {0x3cd1148?, 0xc002e02d80?}, 0xc0044142a0, {0x3cc4c28, 0xc002e02d80}, {0x3cd1148, 0xc002e02d80}, {0x3cd0b48, 0xc002e02d80}, ...)
      	/build/controllers/awsmachine_controller.go:518 +0x345
      sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).Reconcile(0xc001376cf0, {0x3ca3778, 0xc003080b10}, {{{0xc003d17620?, 0x0?}, {0xc002ad99e0?, 0xc001122d08?}}})
      	/build/controllers/awsmachine_controller.go:236 +0xac9
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x3ca7d60?, {0x3ca3778?, 0xc003080b10?}, {{{0xc003d17620?, 0xb?}, {0xc002ad99e0?, 0x0?}}})
      	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006841e0, {0x3ca37b0, 0xc001458690}, {0x30cf3e0?, 0xc003d6f3a0?})
      	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006841e0, {0x3ca37b0, 0xc001458690})
      	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1af
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 262
      	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565
      E0520 10:01:47.205036       1 controller.go:329] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/aws-machinetemplate-ptr2z" namespace="openshift-cluster-api" name="aws-machinetemplate-ptr2z" reconcileID="332c196f-96e6-4225-bafb-9f132a285ef1"
      liuhuali@Lius-MacBook-Pro huali-test % 
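
For triage, the interesting frame in the trace above is the first one that belongs to the provider itself rather than to the Go runtime or controller-runtime. The following is a minimal sketch of pulling that frame out of a log excerpt. The helper name firstFrameWithPrefix and the Frame type are made up for this illustration; the sample lines are copied from the log above.

```go
package main

import (
	"fmt"
	"strings"
)

// Frame is one function/location pair from a Go stack trace.
// (Hypothetical helper type, used only in this sketch.)
type Frame struct {
	Func     string
	Location string
}

// firstFrameWithPrefix returns the first frame whose function line starts
// with prefix. In a Go trace the location line follows the function line.
func firstFrameWithPrefix(trace, prefix string) (Frame, bool) {
	lines := strings.Split(trace, "\n")
	for i := 0; i+1 < len(lines); i++ {
		fn := strings.TrimSpace(lines[i])
		if !strings.HasPrefix(fn, prefix) {
			continue
		}
		// Drop the argument list: "pkg.(*T).m(0x1, 0x2)" -> "pkg.(*T).m".
		if idx := strings.LastIndex(fn, "("); idx > 0 {
			fn = fn[:idx]
		}
		loc := strings.TrimSpace(lines[i+1])
		// Drop the offset: "file.go:353 +0x1e2d" -> "file.go:353".
		if idx := strings.Index(loc, " +"); idx >= 0 {
			loc = loc[:idx]
		}
		return Frame{Func: fn, Location: loc}, true
	}
	return Frame{}, false
}

func main() {
	// Excerpt copied from the capa-controller-manager log in this report.
	trace := "panic({0x2f1c1c0?, 0x5a601b0?})\n" +
		"\t/usr/lib/golang/src/runtime/panic.go:914 +0x21f\n" +
		"sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).findSubnet(0xc0031f2930, 0xc0044142a0)\n" +
		"\t/build/pkg/cloud/services/ec2/instances.go:353 +0x1e2d\n"

	if f, ok := firstFrameWithPrefix(trace, "sigs.k8s.io/cluster-api-provider-aws/"); ok {
		fmt.Printf("%s at %s\n", f.Func, f.Location)
	}
}
```

On this excerpt the sketch reports findSubnet at instances.go:353, which is where the nil dereference occurred. The log does not show which value was nil.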
          

      Actual results:

          Machine stuck in Pending
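
The log suggests why the machine never leaves Pending: the panic is recovered and reported as a "Reconciler error" (note the "[recovered]" suffix), so the instance is never created and each retry would presumably hit the same nil dereference. The sketch below reproduces that general pattern in isolation. The types machineSpec and subnet and the functions subnetID and reconcile are hypothetical stand-ins, not CAPA code, and the nil field is an assumption chosen only to trigger the same runtime error.

```go
package main

import "fmt"

// subnet and machineSpec are hypothetical stand-ins, not CAPA types.
type subnet struct{ ID string }

type machineSpec struct {
	PublicIP bool
	Subnet   *subnet // nil when no lookup result is available
}

// subnetID dereferences Subnet without a nil check, so it panics
// with a nil pointer dereference when Subnet is nil.
func subnetID(spec machineSpec) string {
	return spec.Subnet.ID
}

// reconcile converts a panic into an ordinary error through a deferred
// recover, similar in spirit to what the log above shows.
func reconcile(spec machineSpec) (id string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v [recovered]", r)
		}
	}()
	id = subnetID(spec)
	return id, nil
}

func main() {
	spec := machineSpec{PublicIP: true} // Subnet left nil on purpose
	phase := "Pending"
	for attempt := 1; attempt <= 3; attempt++ {
		id, err := reconcile(spec)
		if err != nil {
			// The input has not changed, so every retry fails the same way.
			fmt.Printf("attempt %d: reconciler error: %v\n", attempt, err)
			continue
		}
		fmt.Println("created instance in", id)
		phase = "Running"
		break
	}
	fmt.Println("phase:", phase)
}
```

Because the error is deterministic for a given spec, retrying alone cannot move the machine forward. A fix would need either a nil check that returns a descriptive error or validation that rejects the configuration up front.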

      Expected results:

          Machine reaches the Running phase

      Additional info:

          We have an equivalent test case for MAPI: https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/machines.go#L636 I ran the same scenario against CAPI today and it failed.
      
      Must gather: https://drive.google.com/file/d/1sun15h4Sem7I936qJMHc0oY1AjrAluH_/view?usp=sharing 

              ddonati@redhat.com Damiano Donati
              huliu@redhat.com Huali Liu