Bug
Resolution: Done-Errata
Undefined
4.19
Quality / Stability / Reliability
False
Moderate
In Progress
Release Note Not Required
Description of problem:
When marketType: Spot is set on a MachineSet, the resulting machine gets stuck in Provisioning and the machine-controller log shows a nil pointer dereference panic.
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-03-05-160850
How reproducible:
Always
Steps to Reproduce:
1. Create an AWS cluster
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.19.0-0.nightly-2025-03-05-160850   True        False         28m     Cluster version is 4.19.0-0.nightly-2025-03-05-160850
2. Copy a default MachineSet, set marketType: Spot in its providerSpec, and create it
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-aws36a-6bslb-worker-us-east-2aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws36a-6bslb-worker-us-east-2aa -oyaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
    machine.openshift.io/GPU: "0"
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2025-03-06T03:46:29Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb
  name: huliu-aws36a-6bslb-worker-us-east-2aa
  namespace: openshift-machine-api
  resourceVersion: "55489"
  uid: fe33ee1d-384f-413e-b2a6-046c9d94dfc3
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb
      machine.openshift.io/cluster-api-machineset: huliu-aws36a-6bslb-worker-us-east-2aa
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: huliu-aws36a-6bslb-worker-us-east-2aa
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          ami:
            id: ami-0e763ecd8ccccbc99
          apiVersion: machine.openshift.io/v1beta1
          blockDevices:
          - ebs:
              encrypted: true
              iops: 0
              kmsKey:
                arn: ""
              volumeSize: 120
              volumeType: gp3
          capacityReservationId: ""
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: huliu-aws36a-6bslb-worker-profile
          instanceType: m6i.xlarge
          kind: AWSMachineProviderConfig
          marketType: Spot
          metadata:
            creationTimestamp: null
          metadataServiceOptions: {}
          placement:
            availabilityZone: us-east-2a
            region: us-east-2
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - huliu-aws36a-6bslb-node
          - filters:
            - name: tag:Name
              values:
              - huliu-aws36a-6bslb-lb
          subnet:
            filters:
            - name: tag:Name
              values:
              - huliu-aws36a-6bslb-subnet-private-us-east-2a
          tags:
          - name: kubernetes.io/cluster/huliu-aws36a-6bslb
            value: owned
          userDataSecret:
            name: worker-user-data
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE          TYPE         REGION      ZONE         AGE
huliu-aws36a-6bslb-master-0                   Running        m6i.xlarge   us-east-2   us-east-2a   123m
huliu-aws36a-6bslb-master-1                   Running        m6i.xlarge   us-east-2   us-east-2b   123m
huliu-aws36a-6bslb-master-2                   Running        m6i.xlarge   us-east-2   us-east-2c   123m
huliu-aws36a-6bslb-worker-us-east-2a-p7jdn    Running        m6i.xlarge   us-east-2   us-east-2a   119m
huliu-aws36a-6bslb-worker-us-east-2aa-kxvfx   Provisioning                                         5m10s
huliu-aws36a-6bslb-worker-us-east-2b-wktd2    Running        m6i.xlarge   us-east-2   us-east-2b   119m
huliu-aws36a-6bslb-worker-us-east-2c-5b5zs    Running        m6i.xlarge   us-east-2   us-east-2c   119m
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-6b567f49c8-kpkfg -c machine-controller
...
E0306 03:46:30.434664 1 signal_unix.go:917] "msg"="Observed a panic" "error"=null "controller"="machine-controller" "name"="huliu-aws36a-6bslb-worker-us-east-2aa-kxvfx" "namespace"="openshift-machine-api" "object"={"name":"huliu-aws36a-6bslb-worker-us-east-2aa-kxvfx","namespace":"openshift-machine-api"} "panic"="runtime error: invalid memory address or nil pointer dereference" "panicGoValue"="\"invalid memory address or nil pointer dereference\"" "reconcileID"="ef6541ce-d66a-457c-93fb-6c8a32070c04" "stacktrace"="goroutine 181 [running]:\nk8s.io/apimachinery/pkg/util/runtime.logPanic({0x4357d00, 0xc00235f2f0}, {0x3910ae0, 0x5be1da0})\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile.func1()\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:107 +0x112\npanic({0x3910ae0?, 0x5be1da0?})\n\t/usr/lib/golang/src/runtime/panic.go:785 +0x132\ngithub.com/openshift/machine-api-provider-aws/pkg/actuators/machine.getInstanceMarketOptionsRequest(0xc0000af348)\n\t/go/src/github.com/openshift/machine-api-provider-aws/pkg/actuators/machine/instances.go:610 +0x1db\ngithub.com/openshift/machine-api-provider-aws/pkg/actuators/machine.launchInstance(0xc000c9a288, 0xc0000af348, {0xc000c88000, 0x6ce, 0x6ce}, {0x437e800, 0xc001494ff0}, {0x436bae0, 0xc00061b290}, 0xc0007f3d48)\n\t/go/src/github.com/openshift/machine-api-provider-aws/pkg/actuators/machine/instances.go:450 +0xe0a\ngithub.com/openshift/machine-api-provider-aws/pkg/actuators/machine.(*Reconciler).create(0xc000efb558)\n\t/go/src/github.com/openshift/machine-api-provider-aws/pkg/actuators/machine/reconciler.go:99 +0x81e\ngithub.com/openshift/machine-api-provider-aws/pkg/actuators/machine.(*Actuator).Create(0xc0008d7a90, {0x4357d00, 0xc00235f2f0}, 
0xc000c9a288)\n\t/go/src/github.com/openshift/machine-api-provider-aws/pkg/actuators/machine/actuator.go:94 +0x2b6\ngithub.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc0007697a0, {0x4357d00, 0xc00235f2f0}, {{{0xc001d47488?, 0x34f7f7a?}, {0xc000d27a40?, 0x0?}}})\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:408 +0x1459\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile(0xc0000e8e80?, {0x4357d00?, 0xc00235f2f0?}, {{{0xc001d47488?, 0x0?}, {0xc000d27a40?, 0x0?}}})\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0xbf\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x437e740, {0x4357d38, 0xc00076f810}, {{{0xc001d47488, 0x15}, {0xc000d27a40, 0x2b}}})\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:328 +0x3a5\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x437e740, {0x4357d38, 0xc00076f810})\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:288 +0x20e\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:249 +0x85\ncreated by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2 in goroutine 112\n\t/go/src/github.com/openshift/machine-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:245 +0x6b8\n"
E0306 03:46:30.434717 1 controller.go:341] "msg"="Reconciler error" "error"="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" "controller"="machine-controller" "name"="huliu-aws36a-6bslb-worker-us-east-2aa-kxvfx" "namespace"="openshift-machine-api" "object"={"name":"huliu-aws36a-6bslb-worker-us-east-2aa-kxvfx","namespace":"openshift-machine-api"} "reconcileID"="ef6541ce-d66a-457c-93fb-6c8a32070c04"
Actual results:
The machine stays stuck in Provisioning, and the machine-controller log shows a nil pointer dereference panic in getInstanceMarketOptionsRequest.
Expected results:
The machine reaches the Running phase and no panic appears in the machine-controller log.
Additional info:
Found during new-feature testing for https://issues.redhat.com/browse/OCPCLOUD-2780
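The stack trace places the panic in getInstanceMarketOptionsRequest (instances.go:610), and the MachineSet above sets marketType: Spot without a spotMarketOptions block. One plausible cause, which is an assumption and not confirmed by this report, is that the Spot branch dereferences an optional options pointer that is nil. The sketch below illustrates that pattern and a nil guard for it. Every type and function name in it (ProviderConfig, SpotMarketOptions, MarketRequest, buildMarketRequest) is hypothetical and is not the real machine-api-provider-aws code.

```go
package main

import (
	"errors"
	"fmt"
)

// MarketType is a hypothetical stand-in for the marketType field in the provider spec.
type MarketType string

const (
	MarketTypeOnDemand MarketType = "OnDemand"
	MarketTypeSpot     MarketType = "Spot"
)

// SpotMarketOptions is a hypothetical stand-in for the optional spotMarketOptions block.
type SpotMarketOptions struct {
	MaxPrice *string
}

// ProviderConfig is a hypothetical, trimmed-down provider spec.
type ProviderConfig struct {
	MarketType        MarketType
	SpotMarketOptions *SpotMarketOptions // nil when the block is omitted
}

// MarketRequest is a hypothetical stand-in for the market options sent to the cloud API.
type MarketRequest struct {
	MarketType string
	MaxPrice   *string
}

// buildMarketRequest returns nil for on-demand instances and a spot request
// otherwise. It never dereferences SpotMarketOptions without a nil check,
// so marketType: Spot works even when spotMarketOptions is omitted.
// Treating a bare spotMarketOptions block as a spot request is an assumption.
func buildMarketRequest(cfg *ProviderConfig) (*MarketRequest, error) {
	if cfg == nil {
		return nil, errors.New("provider config is nil")
	}
	wantsSpot := cfg.MarketType == MarketTypeSpot || cfg.SpotMarketOptions != nil
	if !wantsSpot {
		return nil, nil
	}
	req := &MarketRequest{MarketType: "spot"}
	// Guard the optional block: an unguarded cfg.SpotMarketOptions.MaxPrice
	// here would panic with a nil pointer dereference.
	if cfg.SpotMarketOptions != nil {
		req.MaxPrice = cfg.SpotMarketOptions.MaxPrice
	}
	return req, nil
}

func main() {
	// Same shape as the MachineSet in this report: Spot with no options block.
	req, err := buildMarketRequest(&ProviderConfig{MarketType: MarketTypeSpot})
	fmt.Println(req.MarketType, req.MaxPrice == nil, err)
}
```

With a guard of this kind, a spec that only sets marketType: Spot would produce a spot request with no maximum price instead of crashing the reconcile loop.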
Links to: RHEA-2024:11038, OpenShift Container Platform 4.19.z bug fix update