OCPBUGS-54567 (OpenShift Bugs)

[AWS][CAPI] Machine stuck in Pending and CAPA log shows panic when marketType: Spot is set


    • Quality / Stability / Reliability
    • Moderate
    • In Progress
    • Release Note Not Required

      Description of problem:

          The CAPI machine stays stuck in Pending and the CAPA controller log shows a panic when marketType: Spot is set in the AWSMachineTemplate.

      Version-Release number of selected component (if applicable):

          4.19.0-0.nightly-2025-04-02-170034

      How reproducible:

      always    

      Steps to Reproduce:

          1. Create an AWSMachineTemplate with marketType: Spot
      
      liuhuali@Lius-MacBook-Pro huali-test % cat awsmachinetemplate.yaml 
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      metadata:
        name: aws-machinetemplate
        namespace: openshift-cluster-api
      spec:
        template:
          spec:
            additionalSecurityGroups:
            - filters:
              - name: tag:Name
                values:
                - huliu-aws43a-44ktn-node
            - filters:
              - name: tag:Name
                values:
                - huliu-aws43a-44ktn-lb
            ami:
              id: ami-0bd7465e9989694c9
            iamInstanceProfile: huliu-aws43a-44ktn-worker-profile
            ignition:
              storageType: UnencryptedUserData
              version: "3.2"
            instanceType: m6i.xlarge
            subnet:
              filters:
              - name: tag:Name
                values:
                - huliu-aws43a-44ktn-subnet-private-us-east-2c
            uncompressedUserData: true
            marketType: Spot
      liuhuali@Lius-MacBook-Pro huali-test % 
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
      NAME                  AGE
      aws-machinetemplate   7m16s
      
          2. Create a CAPI MachineSet that uses the template, then check the machine status and the CAPA controller log
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machineset.c
      NAME               CLUSTER              REPLICAS   READY   AVAILABLE   AGE     VERSION
      capi-machineset1   huliu-aws43a-44ktn   1                              7m34s   
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
      NAME                     CLUSTER              NODENAME   PROVIDERID   PHASE     AGE     VERSION
      capi-machineset1-kcs49   huliu-aws43a-44ktn                           Pending   7m37s   
      liuhuali@Lius-MacBook-Pro huali-test %   
      
      liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-6687b8bf7f-dfpbb
      ...
      I0403 06:37:27.230382       1 awsmachine_controller.go:732] "Creating EC2 instance"
      E0403 06:37:27.483331       1 signal_unix.go:917] "Observed a panic" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset1-kcs49" namespace="openshift-cluster-api" name="capi-machineset1-kcs49" reconcileID="01f59328-7c51-4585-8d1d-0a2de4498f63" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
      	goroutine 365 [running]:
      	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x5b94738, 0xc001e7c630}, {0x4a74660, 0x82ea2c0})
      		/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc
      	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile.func1()
      		/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:105 +0x112
      	panic({0x4a74660?, 0x82ea2c0?})
      		/usr/lib/golang/src/runtime/panic.go:785 +0x132
      	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.getInstanceMarketOptionsRequest(0xc00013c680)
      		/build/pkg/cloud/services/ec2/instances.go:1197 +0x227
      	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).runInstance(0xc002042500, {0x5315543, 0x4}, 0xc00013c680)
      		/build/pkg/cloud/services/ec2/instances.go:646 +0x9d4
      	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).CreateInstance(0xc002042500, 0xc0025460c0, {0xc001431500, 0x6ce, 0x6ce}, {0xc001ef0fb0, 0x8})
      		/build/pkg/cloud/services/ec2/instances.go:260 +0x1329
      	sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).createInstance(0xc000cb9b00, {0x5bbc4e0, 0xc002042500}, 0xc0025460c0, {0x5bbd5b0, 0xc003282400}, {0x5b94c68, 0xc0020dab10})
      		/build/controllers/awsmachine_controller.go:739 +0xae
      	sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileNormal(0xc000cb9b00, {0x5bcbb10?, 0xc003282400?}, 0xc0025460c0, {0x5bbd5b0, 0xc003282400}, {0x5bcbb10, 0xc003282400}, {0x5bcaf10, 0xc003282400}, ...)
      		/build/controllers/awsmachine_controller.go:533 +0x725
      	sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).Reconcile(0xc000cb9b00, {0x5b94738, 0xc001e7c630}, {{{0xc001ab5f38?, 0xc001e7c630?}, {0xc001ab5f20?, 0x0?}}})
      		/build/controllers/awsmachine_controller.go:235 +0x97e
      	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile(0xc002042200?, {0x5b94738?, 0xc001e7c630?}, {{{0xc001ab5f38?, 0x0?}, {0xc001ab5f20?, 0x0?}}})
      		/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0xbf
      	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x5bb2460, {0x5b94770, 0xc000e90d20}, {{{0xc001ab5f38, 0x15}, {0xc001ab5f20, 0x16}}})
      		/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303 +0x3a5
      	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x5bb2460, {0x5b94770, 0xc000e90d20})
      		/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x20e
      	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()
      		/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224 +0x85
      	created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2 in goroutine 205
      		/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:220 +0x490
       >
      E0403 06:37:27.483389       1 controller.go:316] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset1-kcs49" namespace="openshift-cluster-api" name="capi-machineset1-kcs49" reconcileID="01f59328-7c51-4585-8d1d-0a2de4498f63"
      liuhuali@Lius-MacBook-Pro huali-test %      

      Actual results:

          The CAPI machine is stuck in Pending and the CAPA controller log shows a nil pointer dereference panic.

      Expected results:

          The CAPI machine reaches Running and there is no panic in the logs.

      Additional info:

      Found during new feature testing for https://issues.redhat.com/browse/OCPCLOUD-2781
      
      A similar bug was reported in MAPI: https://issues.redhat.com/browse/OCPBUGS-52454
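
      The stack trace above places the panic in getInstanceMarketOptionsRequest (instances.go:1197), and the template sets marketType: Spot without any spotMarketOptions. One plausible cause, not confirmed by this report, is that the function reads a field through a nil spot options pointer. The following is a minimal, self-contained Go sketch of that failure pattern and of a guarded variant. All types and function names in it (Instance, SpotMarketOptions, MarketOptionsRequest, marketOptionsUnguarded, marketOptionsGuarded, didPanic) are hypothetical stand-ins, not the actual cluster-api-provider-aws code.

```go
package main

import (
	"errors"
	"fmt"
)

// MarketType, SpotMarketOptions, and Instance are simplified, hypothetical
// stand-ins for the provider's types; they are not the real definitions.
type MarketType string

const MarketTypeSpot MarketType = "Spot"

type SpotMarketOptions struct {
	MaxPrice *string
}

type Instance struct {
	MarketType        MarketType
	SpotMarketOptions *SpotMarketOptions // nil when the template omits the field
}

// MarketOptionsRequest is a hypothetical, reduced request type.
type MarketOptionsRequest struct {
	MarketType string
	MaxPrice   string
}

// marketOptionsUnguarded reads MaxPrice through SpotMarketOptions without a
// nil check, so it panics when only MarketType is set.
func marketOptionsUnguarded(i *Instance) *MarketOptionsRequest {
	req := &MarketOptionsRequest{MarketType: "spot"}
	if p := i.SpotMarketOptions.MaxPrice; p != nil { // nil pointer dereference
		req.MaxPrice = *p
	}
	return req
}

// marketOptionsGuarded checks every pointer before dereferencing it and
// treats missing spot options as "no maximum price requested".
func marketOptionsGuarded(i *Instance) (*MarketOptionsRequest, error) {
	if i == nil {
		return nil, errors.New("instance is nil")
	}
	if i.MarketType != MarketTypeSpot && i.SpotMarketOptions == nil {
		return nil, nil // on-demand instance: no market options needed
	}
	req := &MarketOptionsRequest{MarketType: "spot"}
	if i.SpotMarketOptions != nil && i.SpotMarketOptions.MaxPrice != nil {
		req.MaxPrice = *i.SpotMarketOptions.MaxPrice
	}
	return req, nil
}

// didPanic reports whether calling f panics.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// Mirrors the template in this report: marketType: Spot, no spotMarketOptions.
	inst := &Instance{MarketType: MarketTypeSpot}

	fmt.Println("unguarded panics:", didPanic(func() { marketOptionsUnguarded(inst) }))

	req, err := marketOptionsGuarded(inst)
	fmt.Println("guarded:", req.MarketType, err)
}
```

      If this assumption holds, the unguarded variant corresponds to the observed "invalid memory address or nil pointer dereference", while the guarded variant would let instance creation proceed with default spot settings.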

              athiruma@redhat.com Thirumalesh Aaraveti (Inactive)
              huliu@redhat.com Huali Liu