Bug
Resolution: Done-Errata
Critical
4.12
Description of problem:
The cluster-autoscaler-default pod is stuck in CrashLoopBackOff state and reports the following panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1650a64]

goroutine 91 [running]:
k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).checkAttachableInlineVolume(_, {{0xc00065be20, 0x10}, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}}, ...)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:235 +0x6c4
k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).filterAttachableVolumes(0xc00035a6c0, 0xc000672328, 0x4?, 0x1, 0xc000736570?)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:175 +0x625
k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).Filter(0xc00035a6c0, {0x2fbf7e0?, 0x2f8e1a0?}, 0x7f62e3fe90c8?, 0xc000672328, 0xc0260c1200)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:103 +0x2f9
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runFilterPlugin(0x0?, {0x1f401c8?, 0xc00012a008?}, {0x7f6338340178?, 0xc00035a6c0?}, 0x0?, 0x0?, 0xc0004986e0?)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:736 +0x2bd
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunFilterPlugins(0xc00032e380, {0x1f401c8, 0xc00012a008}, 0x49?, 0x0?, 0x0?)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:718 +0xfa
k8s.io/autoscaler/cluster-autoscaler/simulator.(*SchedulerBasedPredicateChecker).CheckPredicates(0xc00056f880, {0x1f4b300, 0xc000014628}, 0x49?, {0xc01e0c19d0, 0x6f})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduler_based_predicates_checker.go:168 +0x20d
k8s.io/autoscaler/cluster-autoscaler/core.computeExpansionOption(0xc005c50380, {0xc019e27f80, 0xb, 0x1d?}, {0x1f4bea8?, 0xc025754e00?}, 0xc01680e240, {0xc018fd2c88, 0x0, 0x0})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:293 +0x616
k8s.io/autoscaler/cluster-autoscaler/core.ScaleUp(0xc005c50380, 0xc000324d20, 0x2f8e1a0?, {0xc01e293900, 0x15, 0x20}, {0xc015f3c400, 0x52, 0x80}, {0xc015a39e00, ...}, ...)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:446 +0x4676
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000001900, {0x4?, 0x3235343233363836?, 0x2f8e1a0?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:461 +0x1ff5
main.run(0x34363a2273657479?, {0x1f46838, 0xc000443a40})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:421 +0x2cd
main.main.func2({0x65722d7466696873?, 0x65642d657361656c?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:508 +0x25
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x11b

Further investigation indicates that this is a known issue that was fixed in https://github.com/kubernetes/kubernetes/pull/115179. Because a cluster-autoscaler-default pod stuck in CrashLoopBackOff state impacts Machine scaling capabilities, this is a request to bring the fix from https://github.com/kubernetes/kubernetes/pull/115179 to OpenShift Container Platform 4.13 and 4.12 to prevent critical incidents.

The respective commit for OpenShift Container Platform 4.12 has been made available in https://github.com/kubernetes/kubernetes/pull/115348.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12.34
How reproducible:
Random
Steps to Reproduce:
1. It appears to be a race condition, and it is not clear how best to trigger it.
Actual results:
The cluster-autoscaler-default pod is stuck in CrashLoopBackOff state and fails to trigger Machine scaling. It panics with "runtime error: invalid memory address or nil pointer dereference" in nodevolumelimits.(*CSILimits).checkAttachableInlineVolume (csi.go:235); the full stack trace is given in the description of the problem above.
Expected results:
The cluster-autoscaler-default pod should not panic or enter CrashLoopBackOff state when the kubelet has not yet posted the CSINode object for the respective OpenShift Container Platform 4 node.
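The expected behaviour amounts to tolerating a CSINode object that does not exist yet. As a minimal sketch of that idea, assuming a simplified stand-in CSINode type and hypothetical helpers (unsafeNodeName, safeNodeName, panics), none of which are taken from the Kubernetes source or from the linked pull requests, a nil guard before the dereference might look like this:

```go
package main

import "fmt"

// CSINode is a hypothetical stand-in for the real CSINode API object;
// only the single field needed for this illustration is modeled.
type CSINode struct {
	Name string
}

// unsafeNodeName illustrates the failure mode: it dereferences csiNode
// without a nil check, so it panics when the object is missing.
func unsafeNodeName(csiNode *CSINode) string {
	return csiNode.Name
}

// safeNodeName guards the dereference and returns a placeholder when the
// CSINode object has not been posted yet.
func safeNodeName(csiNode *CSINode) string {
	if csiNode == nil {
		return "<CSINode not yet posted>"
	}
	return csiNode.Name
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var missing *CSINode // simulates a node whose CSINode is not available yet

	fmt.Println("unguarded access panics:", panics(func() { _ = unsafeNodeName(missing) }))
	fmt.Println("guarded access returns:", safeNodeName(missing))
	fmt.Println("guarded access returns:", safeNodeName(&CSINode{Name: "worker-0"}))
}
```

The sketch only demonstrates the general pattern (check the pointer before reading fields from it); the actual change is the one in the pull requests referenced in the description.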
Additional info:
- blocks: OCPBUGS-23274 nil pointer error in nodevolumelimits csi logging [4.12] (Closed)
- clones: OCPBUGS-23270 nil pointer error in nodevolumelimits csi logging [4.14] (Closed)
- is blocked by: OCPBUGS-23270 nil pointer error in nodevolumelimits csi logging [4.14] (Closed)
- is cloned by: OCPBUGS-23274 nil pointer error in nodevolumelimits csi logging [4.12] (Closed)
- links to: RHBA-2023:7475 OpenShift Container Platform 4.13.z bug fix update