  1. Machine Config Operator
  2. MCO-1113

ImpactStatementRequested:: OCPBUGS-31421 Autoscaler should scale from zero when taints do not have a "value" field


    • Type: Spike
    • Resolution: Done

      We're asking the following questions to evaluate whether or not OCPBUGS-31421 warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.

      Sample answers are provided to give more context and the ImpactStatementRequested label has been added to OCPBUGS-31421. When responding, please move this ticket to Code Review. The expectation is that the assignee answers these questions.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      This bug can affect any user who is currently using, or upgrading into, any of the following versions:

      • 4.12.53
      • 4.13.38
      • 4.14.19
      • 4.15.6

      The fix is expected in the next z-stream release of each affected version.

      Which types of clusters?

      This bug affects clusters that have the Machine API active and can use the cluster autoscaler's scale-from-zero feature. This includes clusters on AWS, Azure, GCP, OpenStack, and vSphere.

      What is the impact? Is it serious enough to warrant removing update recommendations?

      The impact of this bug is that the cluster autoscaler will panic and end execution under certain conditions.

      The cluster autoscaler fails only when all of the following are true: a MachineSet is configured with a minimum scaling size of zero, that MachineSet is currently at zero replicas, and it has taints specified in the ".spec.template.spec.taints" field that do not contain the "value" field.
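
      The conditions above can be sketched as a small check over a MachineSet manifest loaded as a plain dictionary. This is only an illustrative sketch, not the autoscaler's own logic: the function names are hypothetical, the minimum scaling size is passed in by the caller (it comes from the MachineAutoscaler, not from the fields shown here), and a taint is treated as affected when the "value" key is absent.

```python
from typing import Any


def _taints(machineset: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the list at .spec.template.spec.taints, or [] if it is unset."""
    template_spec = machineset.get("spec", {}).get("template", {}).get("spec", {})
    return template_spec.get("taints") or []


def taints_missing_value(machineset: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: taints that have no "value" key at all."""
    return [taint for taint in _taints(machineset) if "value" not in taint]


def is_at_risk(machineset: dict[str, Any], autoscaler_min_replicas: int) -> bool:
    """Hypothetical helper: True when all three conditions from the ticket hold.

    autoscaler_min_replicas is an assumption of this sketch: the caller looks up
    the minimum scaling size configured for this MachineSet and passes it in.
    """
    replicas = machineset.get("spec", {}).get("replicas")
    return (
        autoscaler_min_replicas == 0          # may scale to zero
        and replicas == 0                     # currently at zero replicas
        and bool(taints_missing_value(machineset))  # a taint lacks "value"
    )
```

      A MachineSet for which is_at_risk returns True is a candidate for one of the remediations described under "How involved is remediation?".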

      How involved is remediation?

      A simple remediation is to remove the MachineAutoscaler for the affected MachineSet(s). Although this disables autoscaling for those MachineSets, it allows the autoscaler to continue functioning for all other MachineSets.

      A more thorough remediation is to add the "value" field to any taint in "MachineSet.spec.template.spec.taints" that does not currently have one. The value itself is arbitrary; its presence allows the autoscaler to process the record properly.
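
      As a rough illustration of this remediation, the sketch below fills in a "value" on every taint that lacks one, working on a MachineSet manifest loaded as a plain dictionary. The function name and the default placeholder string "true" are assumptions made for this example; per the note above, any value should do. Applying the patched manifest back to the cluster is outside the scope of the sketch.

```python
import copy
from typing import Any


def add_placeholder_taint_values(
    machineset: dict[str, Any], placeholder: str = "true"
) -> dict[str, Any]:
    """Hypothetical helper: return a copy of the MachineSet in which every taint
    under .spec.template.spec.taints has a "value" key.

    Taints that already carry a "value" are left unchanged. The input
    dictionary is not modified.
    """
    patched = copy.deepcopy(machineset)
    template_spec = patched.get("spec", {}).get("template", {}).get("spec", {})
    for taint in template_spec.get("taints") or []:
        # setdefault only writes the placeholder when "value" is absent.
        taint.setdefault("value", placeholder)
    return patched
```

      Returning a copy rather than editing in place makes it easy to compare the original and patched manifests before anything is applied.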

      An alternative remediation is to remove the taints from the MachineSet records. Depending on why the taints were added, this may require another method of applying the taints, or proper affinity rules for the pods that need to target this MachineSet's nodes.

      Is this a regression?

      No, this is behavior that had been broken and was fixed in a recent update. The fix contained a bug which would only be exercised in the specific case where the optional "value" field is not provided.

            mimccune@redhat.com Michael McCune
            pratikam Pratik Mahajan