- Bug
- Resolution: Done-Errata
- Major
- 4.15
This is a clone of issue OCPBUGS-33592. The following is the description of the original issue:
—
Description of problem:
While investigating a node scaling problem on OpenShift Container Platform 4, I found the following messages reported in my cluster:
E0513 11:15:09.331353 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.331365 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.331529 1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0513 11:15:09.331684 1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.332076 1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented
I0513 11:15:09.332100 1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332110 1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332135 1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
The same events appear in must-gather data reviewed from customers.
Upstream issues https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 appear to be solved via https://github.com/kubernetes/autoscaler/pull/6677 and https://github.com/kubernetes/autoscaler/pull/6038. Should we pull in those changes? They seem to affect automated node scaling on OpenShift Container Platform 4.
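For illustration only: the "Not implemented" error in the log above is what the node group reports when asked for its autoscaling options. The Go sketch below assumes the desired behavior is to fall back to the cluster-wide defaults in that case instead of surfacing an error. All type and function names here (errNotImplemented, autoscalingOptions, nodeGroup, machineSet, optionsFor) are hypothetical stand-ins and not the upstream autoscaler code or the content of the linked pull requests.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented is a hypothetical stand-in for the sentinel error a
// provider returns when it has no per-node-group options.
var errNotImplemented = errors.New("Not implemented")

// autoscalingOptions is a hypothetical, trimmed-down options struct.
type autoscalingOptions struct {
	ScaleDownUtilizationThreshold float64
}

// nodeGroup is a hypothetical, minimal node group interface.
type nodeGroup interface {
	ID() string
	GetOptions(defaults autoscalingOptions) (*autoscalingOptions, error)
}

// machineSet mimics a provider that does not implement GetOptions.
type machineSet struct{ id string }

func (m machineSet) ID() string { return m.id }

func (m machineSet) GetOptions(autoscalingOptions) (*autoscalingOptions, error) {
	return nil, errNotImplemented
}

// optionsFor returns the node group's own options when it provides them.
// A "not implemented" error or a nil result is treated as "use the defaults";
// any other error is passed on to the caller.
func optionsFor(ng nodeGroup, defaults autoscalingOptions) (autoscalingOptions, error) {
	opts, err := ng.GetOptions(defaults)
	if errors.Is(err, errNotImplemented) || (err == nil && opts == nil) {
		return defaults, nil
	}
	if err != nil {
		return autoscalingOptions{}, fmt.Errorf("get options for %s: %w", ng.ID(), err)
	}
	return *opts, nil
}

func main() {
	defaults := autoscalingOptions{ScaleDownUtilizationThreshold: 0.5}
	ng := machineSet{id: "MachineSet/openshift-machine-api/example"}

	opts, err := optionsFor(ng, defaults)
	fmt.Println(opts.ScaleDownUtilizationThreshold, err)
}
```

With this shape, a node group that does not implement the call is handled like any other node group, and only genuine failures would be logged as errors.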
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.15
How reproducible:
Always
Steps to Reproduce:
1. Set up OpenShift Container Platform 4 with ClusterAutoscaler configured.
2. Trigger scaling activity and verify the cluster-autoscaler-default logs.
Actual results:
Logs like the below are being reported.
E0513 11:15:09.331353 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.331365 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.331529 1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0513 11:15:09.331684 1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.332076 1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented
I0513 11:15:09.332100 1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332110 1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332135 1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
Expected results:
Node scale-up on OpenShift Container Platform 4 should happen without errors being reported:
I0513 11:15:09.331529 1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0513 11:15:09.331684 1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
I0513 11:15:09.332100 1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332110 1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332135 1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
Additional info:
Please review https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676, as they seem to document the problem and have a solution linked and merged.
- blocks: OCPBUGS-33885 Automatic scaling not always working because NodeGroup.GetOptions() not being implemented (Closed)
- clones: OCPBUGS-33592 Automatic scaling not always working because NodeGroup.GetOptions() not being implemented (Closed)
- is blocked by: OCPBUGS-33592 Automatic scaling not always working because NodeGroup.GetOptions() not being implemented (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update