Description of problem:
When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined.
Version-Release number of selected component (if applicable):
4.16/master
How reproducible:
always
Steps to Reproduce:
1. Create a machineset with a taint that has no value field and 0 replicas.
2. Enable the cluster autoscaler.
3. Force a workload to scale the tainted machineset.
Actual results:
A panic like this is observed:

I0325 15:36:38.314276 1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0)
panic: interface conversion: interface {} is nil, not string

goroutine 79 [running]:
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?})
    /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea
k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?})
    /go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d
k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?})
    /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa
main.run(0x0?, {0x2761b48, 0xc0004c04e0})
    /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd
main.main.func2({0x0?, 0x0?})
    /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
    /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105
Expected results:
The machineset scales up.
Additional info:
I think the e2e test that exercises this runs only on periodic jobs, which is why we missed this error in OCPBUGS-27509.
- blocks: OCPBUGS-31464 Autoscaler should scale from zero when taints do not have a "value" field (Closed)
- is blocked by: MCO-1113 ImpactStatementRequested:: OCPBUGS-31421 Autoscaler should scale from zero when taints do not have a "value" field (Closed)
- is cloned by: OCPBUGS-31464 Autoscaler should scale from zero when taints do not have a "value" field (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update