Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-23096

nil pointer error in nodevolumelimits csi logging

XMLWordPrintable

    • Important
    • No
    • CLOUD Sprint 245
    • 1
    • False
    • Hide

      None

      Show
      None
    • Hide
      * Previously, the cluster autoscaler crashed when used with nodes that have Container Storage Interface (CSI) storage. The issue is resolved in this release. (link:https://issues.redhat.com/browse/OCPBUGS-23096[*OCPBUGS-23096*])
      Show
      * Previously, the cluster autoscaler crashed when used with nodes that have Container Storage Interface (CSI) storage. The issue is resolved in this release. (link: https://issues.redhat.com/browse/OCPBUGS-23096 [* OCPBUGS-23096 *])
    • Bug Fix
    • Done

      Description of problem:

      The cluster-autoscaler-default is stuck in CrashLoopBackOff state with the below panic string reported.
      
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1650a64]
      
      goroutine 91 [running]:
      k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).checkAttachableInlineVolume(_, {{0xc00065be20, 0x10}, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}}, ...)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:235 +0x6c4
      k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).filterAttachableVolumes(0xc00035a6c0, 0xc000672328, 0x4?, 0x1, 0xc000736570?)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:175 +0x625
      k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).Filter(0xc00035a6c0, {0x2fbf7e0?, 0x2f8e1a0?}, 0x7f62e3fe90c8?, 0xc000672328, 0xc0260c1200)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:103 +0x2f9
      k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runFilterPlugin(0x0?, {0x1f401c8?, 0xc00012a008?}, {0x7f6338340178?, 0xc00035a6c0?}, 0x0?, 0x0?, 0xc0004986e0?)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:736 +0x2bd
      k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunFilterPlugins(0xc00032e380, {0x1f401c8, 0xc00012a008}, 0x49?, 0x0?, 0x0?)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:718 +0xfa
      k8s.io/autoscaler/cluster-autoscaler/simulator.(*SchedulerBasedPredicateChecker).CheckPredicates(0xc00056f880, {0x1f4b300, 0xc000014628}, 0x49?, {0xc01e0c19d0, 0x6f})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduler_based_predicates_checker.go:168 +0x20d
      k8s.io/autoscaler/cluster-autoscaler/core.computeExpansionOption(0xc005c50380, {0xc019e27f80, 0xb, 0x1d?}, {0x1f4bea8?, 0xc025754e00?}, 0xc01680e240, {0xc018fd2c88, 0x0, 0x0})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:293 +0x616
      k8s.io/autoscaler/cluster-autoscaler/core.ScaleUp(0xc005c50380, 0xc000324d20, 0x2f8e1a0?, {0xc01e293900, 0x15, 0x20}, {0xc015f3c400, 0x52, 0x80}, {0xc015a39e00, ...}, ...)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:446 +0x4676
      k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000001900, {0x4?, 0x3235343233363836?, 0x2f8e1a0?})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:461 +0x1ff5
      main.run(0x34363a2273657479?, {0x1f46838, 0xc000443a40})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:421 +0x2cd
      main.main.func2({0x65722d7466696873?, 0x65642d657361656c?})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:508 +0x25
      created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x11b
      
      Investigating further, it appears that this is a known issue that was solved in https://github.com/kubernetes/kubernetes/pull/115179.
      
      Given that cluster-autoscaler-default stuck in CrashLoopBackOff state will impact Machine scaling capabilities its request to bring the fix from https://github.com/kubernetes/kubernetes/pull/115179 to OpenShift Container Platform 4.13 and 4.12 to prevent critical incidents from happening.
      
      https://github.com/kubernetes/kubernetes/pull/115348 the respective commit for OpenShift Container Platform 4.12 has been made available
      
      

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.12.34
      

      How reproducible:

      Random
      

      Steps to Reproduce:

      1. It appears to be a race condition and not clear how to best trigger it
      

      Actual results:

      cluster-autoscaler-default pod stuck in CrashLoopBackOff state with panic shown below, failing to trigger Machine scaling.
      
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1650a64]
      
      goroutine 91 [running]:
      k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).checkAttachableInlineVolume(_, {{0xc00065be20, 0x10}, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}}, ...)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:235 +0x6c4
      k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).filterAttachableVolumes(0xc00035a6c0, 0xc000672328, 0x4?, 0x1, 0xc000736570?)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:175 +0x625
      k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).Filter(0xc00035a6c0, {0x2fbf7e0?, 0x2f8e1a0?}, 0x7f62e3fe90c8?, 0xc000672328, 0xc0260c1200)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:103 +0x2f9
      k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runFilterPlugin(0x0?, {0x1f401c8?, 0xc00012a008?}, {0x7f6338340178?, 0xc00035a6c0?}, 0x0?, 0x0?, 0xc0004986e0?)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:736 +0x2bd
      k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunFilterPlugins(0xc00032e380, {0x1f401c8, 0xc00012a008}, 0x49?, 0x0?, 0x0?)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:718 +0xfa
      k8s.io/autoscaler/cluster-autoscaler/simulator.(*SchedulerBasedPredicateChecker).CheckPredicates(0xc00056f880, {0x1f4b300, 0xc000014628}, 0x49?, {0xc01e0c19d0, 0x6f})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduler_based_predicates_checker.go:168 +0x20d
      k8s.io/autoscaler/cluster-autoscaler/core.computeExpansionOption(0xc005c50380, {0xc019e27f80, 0xb, 0x1d?}, {0x1f4bea8?, 0xc025754e00?}, 0xc01680e240, {0xc018fd2c88, 0x0, 0x0})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:293 +0x616
      k8s.io/autoscaler/cluster-autoscaler/core.ScaleUp(0xc005c50380, 0xc000324d20, 0x2f8e1a0?, {0xc01e293900, 0x15, 0x20}, {0xc015f3c400, 0x52, 0x80}, {0xc015a39e00, ...}, ...)
          /go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:446 +0x4676
      k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000001900, {0x4?, 0x3235343233363836?, 0x2f8e1a0?})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:461 +0x1ff5
      main.run(0x34363a2273657479?, {0x1f46838, 0xc000443a40})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:421 +0x2cd
      main.main.func2({0x65722d7466696873?, 0x65642d657361656c?})
          /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:508 +0x25
      created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
          /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x11b
      

      Expected results:

      cluster-autoscaler-default pod not to panic and enter CrashLoopBackOff state, when the kubelet has not yet posted the csinode object for the respective OpenShift Container Platform 4 - Node.
      

      Additional info:

      
      

              mimccune@redhat.com Michael McCune
              rhn-support-sreber Simon Reber
              Zhaohua Sun Zhaohua Sun
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: