Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-33932

Automatic scaling not always working because NodeGroup.GetOptions() not being implemented

XMLWordPrintable

    • Important
    • No
    • CLOUD Sprint 253, CLOUD Sprint 254, CLOUD Sprint 255
    • 3
    • Proposed
    • False
    • Hide

      None

      Show
      None
    • Hide
      * Previously, an optional internal function of the cluster autoscaler caused repeated log entries when it was not implemented.
      The issue is resolved in this release.
      (link:https://issues.redhat.com/browse/OCPBUGS-33932[*OCPBUGS-33932*])
      Show
      * Previously, an optional internal function of the cluster autoscaler caused repeated log entries when it was not implemented. The issue is resolved in this release. (link: https://issues.redhat.com/browse/OCPBUGS-33932 [* OCPBUGS-33932 *])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-33592. The following is the description of the original issue:

      Description of problem:

      While investigating a problem with OpenShift Container Platform 4 - Node scaling, I found the below messages reported in my OpenShift Container Platform 4 - Cluster.
      
      E0513 11:15:09.331353       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
      E0513 11:15:09.331365       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.331529       1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
      I0513 11:15:09.331684       1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
      E0513 11:15:09.332076       1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented
      I0513 11:15:09.332100       1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.332110       1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.332135       1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
      
      The same events are reported in must-gather reviewed from customers. Given that we have https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 that appear to be solved via https://github.com/kubernetes/autoscaler/pull/6677 and https://github.com/kubernetes/autoscaler/pull/6038 I'm wondering whether we should pull in those changes as they seem to eventually impact automated scaling of OpenShift Container Platform 4 - Node(s).
      

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.15
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Setup OpenShift Container Platform 4 with ClusterAutoscaler configured
      2. Trigger scaling activity and verify the cluster-autoscaler-default logs
      

      Actual results:

      Logs like the below are being reported.
      
      E0513 11:15:09.331353       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
      E0513 11:15:09.331365       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.331529       1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
      I0513 11:15:09.331684       1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
      E0513 11:15:09.332076       1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented
      I0513 11:15:09.332100       1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.332110       1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.332135       1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
      

      Expected results:

      Scale-up of OpenShift Container Platform 4 - Node to happen without error being reported
      
      I0513 11:15:09.331529       1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
      I0513 11:15:09.331684       1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
      I0513 11:15:09.332100       1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.332110       1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
      I0513 11:15:09.332135       1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
      

      Additional info:

      Please review https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 as they seem to document the problem and also have a solution linked/merged
      

            mimccune@redhat.com Michael McCune
            openshift-crt-jira-prow OpenShift Prow Bot
            Zhaohua Sun Zhaohua Sun
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: