- Bug
- Resolution: Done-Errata
- Critical
- 4.12
- Quality / Stability / Reliability
- Important
- Rejected
- CLOUD Sprint 245
- Bug Fix
Description of problem:
The cluster-autoscaler-default pod is stuck in CrashLoopBackOff state, reporting the panic below.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1650a64]
goroutine 91 [running]:
k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).checkAttachableInlineVolume(_, {{0xc00065be20, 0x10}, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}}, ...)
/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:235 +0x6c4
k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).filterAttachableVolumes(0xc00035a6c0, 0xc000672328, 0x4?, 0x1, 0xc000736570?)
/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:175 +0x625
k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).Filter(0xc00035a6c0, {0x2fbf7e0?, 0x2f8e1a0?}, 0x7f62e3fe90c8?, 0xc000672328, 0xc0260c1200)
/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:103 +0x2f9
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runFilterPlugin(0x0?, {0x1f401c8?, 0xc00012a008?}, {0x7f6338340178?, 0xc00035a6c0?}, 0x0?, 0x0?, 0xc0004986e0?)
/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:736 +0x2bd
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunFilterPlugins(0xc00032e380, {0x1f401c8, 0xc00012a008}, 0x49?, 0x0?, 0x0?)
/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:718 +0xfa
k8s.io/autoscaler/cluster-autoscaler/simulator.(*SchedulerBasedPredicateChecker).CheckPredicates(0xc00056f880, {0x1f4b300, 0xc000014628}, 0x49?, {0xc01e0c19d0, 0x6f})
/go/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduler_based_predicates_checker.go:168 +0x20d
k8s.io/autoscaler/cluster-autoscaler/core.computeExpansionOption(0xc005c50380, {0xc019e27f80, 0xb, 0x1d?}, {0x1f4bea8?, 0xc025754e00?}, 0xc01680e240, {0xc018fd2c88, 0x0, 0x0})
/go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:293 +0x616
k8s.io/autoscaler/cluster-autoscaler/core.ScaleUp(0xc005c50380, 0xc000324d20, 0x2f8e1a0?, {0xc01e293900, 0x15, 0x20}, {0xc015f3c400, 0x52, 0x80}, {0xc015a39e00, ...}, ...)
/go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:446 +0x4676
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000001900, {0x4?, 0x3235343233363836?, 0x2f8e1a0?})
/go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:461 +0x1ff5
main.run(0x34363a2273657479?, {0x1f46838, 0xc000443a40})
/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:421 +0x2cd
main.main.func2({0x65722d7466696873?, 0x65642d657361656c?})
/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:508 +0x25
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x11b
Investigating further, this appears to be a known issue that was fixed upstream in https://github.com/kubernetes/kubernetes/pull/115179.
Because a cluster-autoscaler-default pod stuck in CrashLoopBackOff state impacts Machine scaling capabilities, the request is to bring the fix from https://github.com/kubernetes/kubernetes/pull/115179 into OpenShift Container Platform 4.13 and 4.12 to prevent critical incidents from happening.
The respective backport for OpenShift Container Platform 4.12 has been made available in https://github.com/kubernetes/kubernetes/pull/115348.
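For context, the panic occurs in checkAttachableInlineVolume, which is reached while the kubelet has not yet posted the CSINode object for the node, so the scheduler filter code dereferences a nil pointer. Below is a minimal, hypothetical Go sketch of the defensive nil-guard pattern this class of fix applies; the CSINode struct and nodeNameForLogging helper are illustrative stand-ins, not the actual upstream code.

```go
package main

import "fmt"

// CSINode is a simplified stand-in for the storage.k8s.io/v1 CSINode
// object. In the autoscaler's scheduler simulation, the lister returns
// nil when the kubelet has not yet posted this object for a node.
type CSINode struct {
	Name string
}

// nodeNameForLogging returns a printable identifier even when the
// CSINode object is missing. Dereferencing csiNode.Name without this
// guard is exactly the kind of nil pointer dereference that produces
// the SIGSEGV seen in the stack trace above.
func nodeNameForLogging(csiNode *CSINode) string {
	if csiNode == nil {
		return "<csinode not yet posted>"
	}
	return csiNode.Name
}

func main() {
	var missing *CSINode // kubelet has not posted the object yet
	fmt.Println(nodeNameForLogging(missing))
	fmt.Println(nodeNameForLogging(&CSINode{Name: "worker-1"}))
}
```

The point of the pattern is that any code path that can run before the CSINode object exists must treat the pointer as optional rather than assuming it is populated.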
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12.34
How reproducible:
Random
Steps to Reproduce:
1. This appears to be a race condition; it is not clear how best to trigger it.
Actual results:
cluster-autoscaler-default pod stuck in CrashLoopBackOff state with the same nil pointer dereference panic shown in the Description above, failing to trigger Machine scaling.
Expected results:
The cluster-autoscaler-default pod should not panic and enter CrashLoopBackOff state when the kubelet has not yet posted the CSINode object for the respective OpenShift Container Platform 4 node.
Additional info:
- Clones: OCPBUGS-23272 nil pointer error in nodevolumelimits csi logging [4.13] (Closed)
- Is blocked by: OCPBUGS-23272 nil pointer error in nodevolumelimits csi logging [4.13] (Closed)
- Links to: RHBA-2023:7608 OpenShift Container Platform 4.12.z bug fix update