OpenShift Bugs / OCPBUGS-31421

Autoscaler should scale from zero when taints do not have a "value" field

      * Previously, when a user created a compute machine set with taints, they could choose to not specify the `Value` field.
      Failure to specify this field caused the cluster autoscaler to crash.
      With this release, the cluster autoscaler is updated to handle an empty `Value` field.
      (link:https://issues.redhat.com/browse/OCPBUGS-31421[*OCPBUGS-31421*])
    • Bug Fix
    • Done

      Description of problem:

      When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined.
      
          

      Version-Release number of selected component (if applicable):

      4.16/master
          

      How reproducible:

      always
          

      Steps to Reproduce:

          1. create a machineset with 0 replicas and a taint that has no value field
          2. enable the cluster autoscaler
          3. force a workload to scale the tainted machineset
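
The object shape from step 1 can be sketched in Go as follows. This is only an illustration, not the autoscaler's code: the JSON fragment, the `dedicated` key, the `firstTaint` helper, and the `spec.template.spec.taints` path are assumptions made for the example. It shows that a taint written without a value decodes into a map that simply has no `value` key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// machineSetJSON is a trimmed, hypothetical machineset fragment with zero
// replicas and one taint that omits the "value" field.
const machineSetJSON = `{
  "spec": {
    "replicas": 0,
    "template": {
      "spec": {
        "taints": [{"key": "dedicated", "effect": "NoSchedule"}]
      }
    }
  }
}`

// firstTaint decodes doc into untyped maps and returns the first entry found
// under spec.template.spec.taints (a field path assumed for this sketch).
func firstTaint(doc string) (map[string]interface{}, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &obj); err != nil {
		return nil, err
	}
	// Indexing a nil map is safe in Go, so missing levels fall through.
	spec, _ := obj["spec"].(map[string]interface{})
	tmpl, _ := spec["template"].(map[string]interface{})
	tmplSpec, _ := tmpl["spec"].(map[string]interface{})
	taints, _ := tmplSpec["taints"].([]interface{})
	if len(taints) == 0 {
		return nil, fmt.Errorf("no taints found")
	}
	taint, ok := taints[0].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("taint is not an object")
	}
	return taint, nil
}

func main() {
	taint, err := firstTaint(machineSetJSON)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_, hasValue := taint["value"]
	fmt.Println("taint has a value key:", hasValue)
}
```

Looking up `taint["value"]` on such a map yields a nil `interface{}`, which is the input the autoscaler has to tolerate when it builds a node template for a group with 0 replicas.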
          

      Actual results:

      a panic like this is observed
      
      I0325 15:36:38.314276       1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0)
      panic: interface conversion: interface {} is nil, not string
      
      goroutine 79 [running]:
      k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...)
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246
      k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?})
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5
      k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930)
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea
      k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?})
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d
      k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...)
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599
      k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?})
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa
      main.run(0x0?, {0x2761b48, 0xc0004c04e0})
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd
      main.main.func2({0x0?, 0x0?})
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25
      created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
      	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105
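
The panic message points at an unchecked type assertion on a missing map entry. The following is a minimal sketch of that failure mode and of one defensive alternative. It is not the actual `unstructuredToTaint` implementation or the shipped fix: the `taint` struct and the `uncheckedTaint`, `checkedTaint`, `stringField`, and `panics` helpers are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// taint is a local stand-in for a Kubernetes taint (hypothetical, kept
// dependency-free for this sketch).
type taint struct {
	Key    string
	Value  string
	Effect string
}

// uncheckedTaint uses bare type assertions. When "value" is absent, the map
// lookup returns a nil interface{} and the assertion panics with
// "interface conversion: interface {} is nil, not string".
func uncheckedTaint(raw map[string]interface{}) taint {
	return taint{
		Key:    raw["key"].(string),
		Value:  raw["value"].(string),
		Effect: raw["effect"].(string),
	}
}

// stringField returns the string stored under name, or "" when the entry is
// missing or is not a string. The comma-ok form never panics.
func stringField(raw map[string]interface{}, name string) string {
	s, _ := raw[name].(string)
	return s
}

// checkedTaint treats every field as optional and falls back to "".
func checkedTaint(raw map[string]interface{}) taint {
	return taint{
		Key:    stringField(raw, "key"),
		Value:  stringField(raw, "value"),
		Effect: stringField(raw, "effect"),
	}
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	// A taint with no "value" field, as in the reproduction steps.
	raw := map[string]interface{}{"key": "dedicated", "effect": "NoSchedule"}

	fmt.Println("unchecked conversion panics:", panics(func() { uncheckedTaint(raw) }))
	fmt.Printf("checked conversion: %+v\n", checkedTaint(raw))
}
```

Treating a missing value as an empty string matches how Kubernetes models taints, where the value is optional.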
          

      Expected results:

      expect the machineset to scale up
          

      Additional info:
      I think the e2e test that exercises this runs only on periodic jobs, and as such we missed this error in OCPBUGS-27509.

      this search shows some failed results

              Michael McCune (mimccune@redhat.com)
              Zhaohua Sun
              Jeana Routh