- Bug
- Resolution: Unresolved
- Normal
- None
- 4.14.z
- Moderate
- No
- False
Description of problem:
The cluster autoscaler logs claimed that there were no expansion options, even though hundreds of pods were unschedulable:
❯ k logs -n openshift-machine-api cluster-autoscaler-default-79bbdb47c8-f2mhw --tail 10 -f
I0110 15:22:34.974683 1 klogx.go:87] Pod ci-op-7y3wpmbd/dpdk-amd64-build is unschedulable
...
I0110 15:22:34.974714 1 klogx.go:87] Pod ci-op-vh6l7s82/tests-private-amd64-build is unschedulable
I0110 15:22:34.974795 1 klogx.go:87] 338 other pods are also unschedulable
I0110 15:22:38.028472 1 orchestrator.go:168] No expansion options
I0110 15:22:38.949588 1 eligibility.go:102] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
I0110 15:22:38.949691 1 legacy.go:193] 1 nodes found to be unremovable in simulation, will re-check them at 2024-01-10 15:27:22.940617125 +0000 UTC m=+1529958.772155039
I0110 15:22:39.944889 1 legacy.go:296] No candidates for scale down
However, after performing a pod restart, the cluster-autoscaler noticed that there were in fact expansion options and scaled up.
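The symptom described above (pods reported as unschedulable while the orchestrator logs "No expansion options") can be checked for mechanically. The following is only a minimal sketch: the function name autoscaler_looks_stuck is hypothetical, and the deployment name cluster-autoscaler-default is an assumption inferred from the pod name in the logs. "No expansion options" can also be legitimate (for example, when every MachineSet is already at maxSize), so a match is a prompt for investigation, not proof of this bug.

```shell
# Hypothetical helper (not part of any OpenShift tooling): reads
# cluster-autoscaler log lines on stdin and exits 0 only if the log contains
# both an "unschedulable" pod message and "No expansion options".
autoscaler_looks_stuck() {
  awk '
    /is unschedulable|are also unschedulable/ { unsched = 1 }
    /No expansion options/                    { noexp = 1 }
    END { exit !(unsched && noexp) }
  '
}

# Example usage against a live cluster (deployment name is an assumption):
#   kubectl logs -n openshift-machine-api deployment/cluster-autoscaler-default --tail 200 \
#     | autoscaler_looks_stuck && echo "possible stuck autoscaler, inspect MachineSet limits"
#
# Workaround used in this report (pod restart), expressed as a rollout restart:
#   kubectl rollout restart -n openshift-machine-api deployment/cluster-autoscaler-default
```

The check deliberately does not restart anything on its own, because the same log lines appear when scale-up is genuinely impossible.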
Version-Release number of selected component (if applicable):
4.14.7
How reproducible:
Unsure
Steps to Reproduce:
1. 2. 3.
Actual results:
Cluster autoscaler status and machinesets at the time it believed there were no expansion options available:
❯ k get cm -n openshift-machine-api cluster-autoscaler-status -oyaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2024-01-10 15:53:01.081693035 +0000 UTC:
    Cluster-wide:
      Health: Healthy (ready=31 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=31 longUnregistered=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2023-12-23 22:28:17.53635074 +0000 UTC m=+13.367888721
      ScaleUp: NoActivity (ready=31 registered=31)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 14:34:06.135121145 +0000 UTC m=+1526761.966659046
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 15:09:42.49780205 +0000 UTC m=+1528898.329339964
    NodeGroups:
      Name: MachineSet/openshift-machine-api/build05-kwk66-ci-builds-worker-us-east-1a
      Health: Healthy (ready=0 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=120))
        LastProbeTime: 0001-01-01 00:00:00 +0000 UTC
        LastTransitionTime: 2023-12-23 22:28:17.53635074 +0000 UTC m=+13.367888721
      ScaleUp: NoActivity (ready=0 cloudProviderTarget=0)
        LastProbeTime: 0001-01-01 00:00:00 +0000 UTC
        LastTransitionTime: 2024-01-02 05:13:04.845376598 +0000 UTC m=+801900.676914511
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-02 05:42:03.12588419 +0000 UTC m=+803638.957422102
      Name: MachineSet/openshift-machine-api/build05-kwk66-ci-longtests-worker-us-east-1a
      Health: Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=0, maxSize=120))
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2023-12-27 06:25:40.633849606 +0000 UTC m=+287856.465387596
      ScaleUp: NoActivity (ready=1 cloudProviderTarget=1)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 10:20:10.643142528 +0000 UTC m=+1511526.474680441
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 15:09:42.49780205 +0000 UTC m=+1528898.329339964
      Name: MachineSet/openshift-machine-api/build05-kwk66-ci-prowjobs-worker-us-east-1a
      Health: Healthy (ready=4 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=4 longUnregistered=0 cloudProviderTarget=4 (minSize=0, maxSize=120))
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2023-12-23 22:28:17.53635074 +0000 UTC m=+13.367888721
      ScaleUp: NoActivity (ready=4 cloudProviderTarget=4)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 09:54:01.373951307 +0000 UTC m=+1509957.205489207
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 09:54:27.396549022 +0000 UTC m=+1509983.228086936
      Name: MachineSet/openshift-machine-api/build05-kwk66-ci-tests-worker-us-east-1a
      Health: Healthy (ready=12 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=12 longUnregistered=0 cloudProviderTarget=12 (minSize=0, maxSize=120))
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2023-12-23 22:28:17.53635074 +0000 UTC m=+13.367888721
      ScaleUp: NoActivity (ready=12 cloudProviderTarget=12)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 11:39:04.279432634 +0000 UTC m=+1516260.110970547
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 15:07:00.394322553 +0000 UTC m=+1528736.225860467
      Name: MachineSet/openshift-machine-api/build05-kwk66-worker-us-east-1a
      Health: Healthy (ready=3 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=3 longUnregistered=0 cloudProviderTarget=3 (minSize=2, maxSize=50))
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2023-12-23 22:28:17.53635074 +0000 UTC m=+13.367888721
      ScaleUp: NoActivity (ready=3 cloudProviderTarget=3)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 14:34:06.135121145 +0000 UTC m=+1526761.966659046
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-01-10 15:52:43.077610751 +0000 UTC m=+1531478.909148680
        LastTransitionTime: 2024-01-10 14:46:09.552677149 +0000 UTC m=+1527485.384215063
kind: ConfigMap
❯ k get machineset -A
NAMESPACE               NAME                                           DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   build05-kwk66-ci-builds-worker-us-east-1a      0         0                             624d
openshift-machine-api   build05-kwk66-ci-longtests-worker-us-east-1a   1         1         1       1           605d
openshift-machine-api   build05-kwk66-ci-prowjobs-worker-us-east-1a    4         4         4       4           605d
openshift-machine-api   build05-kwk66-ci-tests-worker-us-east-1a       12        12        12      12          624d
openshift-machine-api   build05-kwk66-infra-us-east-1a                 2         2         2       2           112d
openshift-machine-api   build05-kwk66-worker-us-east-1a                3         3         3       3           629d
Expected results:
After restarting the cluster-autoscaler pod, it performed as expected:
I0110 16:08:56.001512 1 orchestrator.go:189] Best option to resize: MachineSet/openshift-machine-api/build05-kwk66-ci-builds-worker-us-east-1a
I0110 16:08:56.001531 1 orchestrator.go:193] Estimated 138 nodes needed in MachineSet/openshift-machine-api/build05-kwk66-ci-builds-worker-us-east-1a
I0110 16:08:56.844729 1 orchestrator.go:302] Final scale-up plan: [{MachineSet/openshift-machine-api/build05-kwk66-ci-builds-worker-us-east-1a 0->120 (max: 120)}]
I0110 16:08:56.844772 1 orchestrator.go:584] Scale-up: setting group MachineSet/openshift-machine-api/build05-kwk66-ci-builds-worker-us-east-1a size to 120
❯ k get machineset -A
NAMESPACE               NAME                                           DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   build05-kwk66-ci-builds-worker-us-east-1a      120       120                           624d
openshift-machine-api   build05-kwk66-ci-longtests-worker-us-east-1a   1         1         1       1           605d
openshift-machine-api   build05-kwk66-ci-prowjobs-worker-us-east-1a    4         4         4       4           605d
openshift-machine-api   build05-kwk66-ci-tests-worker-us-east-1a       12        12        12      12          624d
openshift-machine-api   build05-kwk66-infra-us-east-1a                 2         2         2       2           112d
openshift-machine-api   build05-kwk66-worker-us-east-1a                3         3         3       3           629d
Additional info: