-
Bug
-
Resolution: Done-Errata
-
Undefined
-
4.19
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
-
In Progress
-
Release Note Not Required
Description of problem:
A CAPI machine stays stuck in Pending, and the CAPA controller log shows a panic, when marketType: Spot is set in the AWSMachineTemplate.
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-04-02-170034
How reproducible:
always
Steps to Reproduce:
1. Create an AWSMachineTemplate with marketType: Spot.

$ cat awsmachinetemplate.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: aws-machinetemplate
  namespace: openshift-cluster-api
spec:
  template:
    spec:
      additionalSecurityGroups:
      - filters:
        - name: tag:Name
          values:
          - huliu-aws43a-44ktn-node
      - filters:
        - name: tag:Name
          values:
          - huliu-aws43a-44ktn-lb
      ami:
        id: ami-0bd7465e9989694c9
      iamInstanceProfile: huliu-aws43a-44ktn-worker-profile
      ignition:
        storageType: UnencryptedUserData
        version: "3.2"
      instanceType: m6i.xlarge
      subnet:
        filters:
        - name: tag:Name
          values:
          - huliu-aws43a-44ktn-subnet-private-us-east-2c
      uncompressedUserData: true
      marketType: Spot

$ oc get awsmachinetemplate
NAME                  AGE
aws-machinetemplate   7m16s

2. Create a CAPI MachineSet, then check the machine and the CAPA controller log.

$ oc get machineset.c
NAME               CLUSTER              REPLICAS   READY   AVAILABLE   AGE     VERSION
capi-machineset1   huliu-aws43a-44ktn   1                              7m34s

$ oc get machine.c
NAME                     CLUSTER              NODENAME   PROVIDERID   PHASE     AGE     VERSION
capi-machineset1-kcs49   huliu-aws43a-44ktn                           Pending   7m37s

$ oc logs capa-controller-manager-6687b8bf7f-dfpbb
...
I0403 06:37:27.230382 1 awsmachine_controller.go:732] "Creating EC2 instance"
E0403 06:37:27.483331 1 signal_unix.go:917] "Observed a panic" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset1-kcs49" namespace="openshift-cluster-api" name="capi-machineset1-kcs49" reconcileID="01f59328-7c51-4585-8d1d-0a2de4498f63" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
    goroutine 365 [running]:
    k8s.io/apimachinery/pkg/util/runtime.logPanic({0x5b94738, 0xc001e7c630}, {0x4a74660, 0x82ea2c0})
        /build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile.func1()
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:105 +0x112
    panic({0x4a74660?, 0x82ea2c0?})
        /usr/lib/golang/src/runtime/panic.go:785 +0x132
    sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.getInstanceMarketOptionsRequest(0xc00013c680)
        /build/pkg/cloud/services/ec2/instances.go:1197 +0x227
    sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).runInstance(0xc002042500, {0x5315543, 0x4}, 0xc00013c680)
        /build/pkg/cloud/services/ec2/instances.go:646 +0x9d4
    sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).CreateInstance(0xc002042500, 0xc0025460c0, {0xc001431500, 0x6ce, 0x6ce}, {0xc001ef0fb0, 0x8})
        /build/pkg/cloud/services/ec2/instances.go:260 +0x1329
    sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).createInstance(0xc000cb9b00, {0x5bbc4e0, 0xc002042500}, 0xc0025460c0, {0x5bbd5b0, 0xc003282400}, {0x5b94c68, 0xc0020dab10})
        /build/controllers/awsmachine_controller.go:739 +0xae
    sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileNormal(0xc000cb9b00, {0x5bcbb10?, 0xc003282400?}, 0xc0025460c0, {0x5bbd5b0, 0xc003282400}, {0x5bcbb10, 0xc003282400}, {0x5bcaf10, 0xc003282400}, ...)
        /build/controllers/awsmachine_controller.go:533 +0x725
    sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).Reconcile(0xc000cb9b00, {0x5b94738, 0xc001e7c630}, {{{0xc001ab5f38?, 0xc001e7c630?}, {0xc001ab5f20?, 0x0?}}})
        /build/controllers/awsmachine_controller.go:235 +0x97e
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile(0xc002042200?, {0x5b94738?, 0xc001e7c630?}, {{{0xc001ab5f38?, 0x0?}, {0xc001ab5f20?, 0x0?}}})
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0xbf
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x5bb2460, {0x5b94770, 0xc000e90d20}, {{{0xc001ab5f38, 0x15}, {0xc001ab5f20, 0x16}}})
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303 +0x3a5
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x5bb2460, {0x5b94770, 0xc000e90d20})
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x20e
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224 +0x85
    created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2 in goroutine 205
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:220 +0x490
 >
E0403 06:37:27.483389 1 controller.go:316] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset1-kcs49" namespace="openshift-cluster-api" name="capi-machineset1-kcs49" reconcileID="01f59328-7c51-4585-8d1d-0a2de4498f63"
Actual results:
The CAPI machine stays stuck in Pending, and the CAPA controller log shows a nil pointer dereference panic in getInstanceMarketOptionsRequest.
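
The trace places the panic in getInstanceMarketOptionsRequest (instances.go:1197). One possible cause, which is an assumption and not confirmed by this report, is that the function dereferences an optional Spot options struct that is nil when only marketType: Spot is set. The sketch below illustrates that class of bug and a nil guard. All types and the buildMarketRequest function are hypothetical stand-ins, not the CAPA or AWS SDK definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// spotOptions is a hypothetical stand-in for optional Spot settings.
type spotOptions struct {
	MaxPrice *string
}

// instance is a hypothetical stand-in for the instance spec.
type instance struct {
	MarketType        string       // "", "OnDemand", or "Spot"
	SpotMarketOptions *spotOptions // may be nil even when MarketType is "Spot"
}

// marketRequest is a hypothetical stand-in for the request sent to the cloud API.
type marketRequest struct {
	MarketType string
	MaxPrice   string // empty means no price cap
}

// buildMarketRequest returns nil for on-demand instances and a request for
// Spot instances. It never dereferences SpotMarketOptions without a nil check.
func buildMarketRequest(i *instance) (*marketRequest, error) {
	if i == nil {
		return nil, errors.New("instance is nil")
	}
	switch i.MarketType {
	case "Spot":
		req := &marketRequest{MarketType: "spot"}
		// Guard: the options block is optional, so check both pointers.
		if i.SpotMarketOptions != nil && i.SpotMarketOptions.MaxPrice != nil {
			req.MaxPrice = *i.SpotMarketOptions.MaxPrice
		}
		return req, nil
	case "", "OnDemand":
		return nil, nil
	default:
		return nil, fmt.Errorf("unknown market type %q", i.MarketType)
	}
}

func main() {
	// Mirrors the template in this report: Spot requested, no options given.
	req, err := buildMarketRequest(&instance{MarketType: "Spot"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("market type: %s, price cap set: %t\n", req.MarketType, req.MaxPrice != "")
}
```

Without the two nil checks in the Spot branch, reading MaxPrice through a nil SpotMarketOptions would raise the same kind of runtime error shown in the log.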
Expected results:
The CAPI machine reaches Running and no panic appears in the logs.
Additional info:
Found during new feature testing for https://issues.redhat.com/browse/OCPCLOUD-2781. A similar bug exists in MAPI: https://issues.redhat.com/browse/OCPBUGS-52454
- links to
-
RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update