Bug
Resolution: Done-Errata
4.19
Quality / Stability / Reliability
Moderate
In Progress
Release Note Not Required
Description of problem:
The CAPI machine is stuck in Pending and the CAPA controller log shows a panic when marketType: Spot is set in the AWSMachineTemplate.
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-04-02-170034
How reproducible:
always
Steps to Reproduce:
1. Create an AWSMachineTemplate with marketType: Spot.
liuhuali@Lius-MacBook-Pro huali-test % cat awsmachinetemplate.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: aws-machinetemplate
  namespace: openshift-cluster-api
spec:
  template:
    spec:
      additionalSecurityGroups:
      - filters:
        - name: tag:Name
          values:
          - huliu-aws43a-44ktn-node
      - filters:
        - name: tag:Name
          values:
          - huliu-aws43a-44ktn-lb
      ami:
        id: ami-0bd7465e9989694c9
      iamInstanceProfile: huliu-aws43a-44ktn-worker-profile
      ignition:
        storageType: UnencryptedUserData
        version: "3.2"
      instanceType: m6i.xlarge
      subnet:
        filters:
        - name: tag:Name
          values:
          - huliu-aws43a-44ktn-subnet-private-us-east-2c
      uncompressedUserData: true
      marketType: Spot
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
NAME                  AGE
aws-machinetemplate   7m16s
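2. Create a CAPI MachineSet that references the template, then check the MachineSet, the Machine, and the CAPA controller log.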
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset.c
NAME               CLUSTER              REPLICAS   READY   AVAILABLE   AGE     VERSION
capi-machineset1   huliu-aws43a-44ktn   1                              7m34s
liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
NAME                     CLUSTER              NODENAME   PROVIDERID   PHASE     AGE     VERSION
capi-machineset1-kcs49   huliu-aws43a-44ktn                           Pending   7m37s
liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-6687b8bf7f-dfpbb
...
I0403 06:37:27.230382 1 awsmachine_controller.go:732] "Creating EC2 instance"
E0403 06:37:27.483331 1 signal_unix.go:917] "Observed a panic" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset1-kcs49" namespace="openshift-cluster-api" name="capi-machineset1-kcs49" reconcileID="01f59328-7c51-4585-8d1d-0a2de4498f63" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
goroutine 365 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x5b94738, 0xc001e7c630}, {0x4a74660, 0x82ea2c0})
/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile.func1()
/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:105 +0x112
panic({0x4a74660?, 0x82ea2c0?})
/usr/lib/golang/src/runtime/panic.go:785 +0x132
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.getInstanceMarketOptionsRequest(0xc00013c680)
/build/pkg/cloud/services/ec2/instances.go:1197 +0x227
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).runInstance(0xc002042500, {0x5315543, 0x4}, 0xc00013c680)
/build/pkg/cloud/services/ec2/instances.go:646 +0x9d4
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).CreateInstance(0xc002042500, 0xc0025460c0, {0xc001431500, 0x6ce, 0x6ce}, {0xc001ef0fb0, 0x8})
/build/pkg/cloud/services/ec2/instances.go:260 +0x1329
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).createInstance(0xc000cb9b00, {0x5bbc4e0, 0xc002042500}, 0xc0025460c0, {0x5bbd5b0, 0xc003282400}, {0x5b94c68, 0xc0020dab10})
/build/controllers/awsmachine_controller.go:739 +0xae
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileNormal(0xc000cb9b00, {0x5bcbb10?, 0xc003282400?}, 0xc0025460c0, {0x5bbd5b0, 0xc003282400}, {0x5bcbb10, 0xc003282400}, {0x5bcaf10, 0xc003282400}, ...)
/build/controllers/awsmachine_controller.go:533 +0x725
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).Reconcile(0xc000cb9b00, {0x5b94738, 0xc001e7c630}, {{{0xc001ab5f38?, 0xc001e7c630?}, {0xc001ab5f20?, 0x0?}}})
/build/controllers/awsmachine_controller.go:235 +0x97e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile(0xc002042200?, {0x5b94738?, 0xc001e7c630?}, {{{0xc001ab5f38?, 0x0?}, {0xc001ab5f20?, 0x0?}}})
/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0xbf
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x5bb2460, {0x5b94770, 0xc000e90d20}, {{{0xc001ab5f38, 0x15}, {0xc001ab5f20, 0x16}}})
/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303 +0x3a5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x5bb2460, {0x5b94770, 0xc000e90d20})
/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x20e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()
/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2 in goroutine 205
/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:220 +0x490
>
E0403 06:37:27.483389 1 controller.go:316] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/capi-machineset1-kcs49" namespace="openshift-cluster-api" name="capi-machineset1-kcs49" reconcileID="01f59328-7c51-4585-8d1d-0a2de4498f63"
Actual results:
The CAPI machine stays in Pending and the CAPA controller log shows a nil pointer dereference panic in getInstanceMarketOptionsRequest.
Expected results:
The CAPI machine reaches Running and no panic appears in the logs.
Additional info:
Found during new feature testing for https://issues.redhat.com/browse/OCPCLOUD-2781. A similar bug in MAPI is tracked at https://issues.redhat.com/browse/OCPBUGS-52454.
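The stack trace ends in getInstanceMarketOptionsRequest (instances.go:1197), and the template above sets marketType: Spot without any spot options. One plausible cause, which is an assumption and not something this report confirms, is that the function dereferences an optional spot-options pointer that is nil in this configuration. The Go sketch below illustrates that failure pattern and a nil guard for it. All types and the function buildMarketOptions are hypothetical stand-ins written for illustration; they are not the actual cluster-api-provider-aws definitions or the actual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// MarketType is a simplified, hypothetical stand-in for the provider's market type.
type MarketType string

const (
	MarketTypeOnDemand MarketType = "OnDemand"
	MarketTypeSpot     MarketType = "Spot"
)

// SpotMarketOptions is a hypothetical shape for the optional spot settings.
type SpotMarketOptions struct {
	MaxPrice *string
}

// Instance is a trimmed-down, hypothetical instance spec.
type Instance struct {
	MarketType        MarketType
	SpotMarketOptions *SpotMarketOptions // nil when the template omits spot options
}

// MarketOptionsRequest is a hypothetical request type for this sketch.
type MarketOptionsRequest struct {
	MarketType MarketType
	MaxPrice   *string
}

// buildMarketOptions builds the market options without ever dereferencing
// SpotMarketOptions unless it has been checked for nil first.
func buildMarketOptions(i *Instance) (*MarketOptionsRequest, error) {
	if i == nil {
		return nil, errors.New("instance must not be nil")
	}
	switch i.MarketType {
	case "", MarketTypeOnDemand:
		// On-demand instances need no market options.
		return nil, nil
	case MarketTypeSpot:
		req := &MarketOptionsRequest{MarketType: MarketTypeSpot}
		// Guard: marketType: Spot may be set while spot options are absent.
		if i.SpotMarketOptions != nil {
			req.MaxPrice = i.SpotMarketOptions.MaxPrice
		}
		return req, nil
	default:
		return nil, fmt.Errorf("unsupported market type %q", i.MarketType)
	}
}

func main() {
	// Same shape as the reproducer: Spot requested, no spot options given.
	req, err := buildMarketOptions(&Instance{MarketType: MarketTypeSpot})
	fmt.Println(req != nil, err)
}
```

With a guard of this kind, an instance that requests Spot without spot options yields a request with no maximum price instead of a panic, and an unknown market type surfaces as a returned error that the reconciler can report.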
Links to:
RHEA-2024:11038
OpenShift Container Platform 4.19.z bug fix update