- Bug
- Resolution: Won't Do
- Undefined
- None
- 4.17
- None
- None
- CLOUD Sprint 257, CLOUD Sprint 258, CLOUD Sprint 259
- 3
- False
- TODO: (Placeholder for now)
Description of problem:
The CAO can get into a failed state:
2023-03-22T12:46:49.148733289Z E0322 12:46:49.148726 1 static_autoscaler.go:364] Failed to fix node group sizes: failed to decrease MachineSet/openshift-machine-api/eu-3-compute-kgzn2-aro-machineset-compute-xl-germanywestcentral-1: attempt to delete existing nodes targetSize:4 delta:-1 existingNodes: 6
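The numbers in the log line suggest why the resize is rejected: lowering a target size of 4 by 1 would give 3, which is below the 6 nodes that already exist. The following is a minimal sketch of that arithmetic only. It assumes the check is "targetSize + delta < existingNodes", which is inferred from the message rather than copied from the autoscaler source, and `check_decrease` is a hypothetical name used for illustration.

```shell
# Hypothetical helper: mirrors the arithmetic implied by the log message.
# Assumption: a decrease is refused when targetSize + delta < existingNodes.
check_decrease() {
    target_size=$1
    delta=$2
    existing_nodes=$3
    new_size=$((target_size + delta))
    if [ "$new_size" -lt "$existing_nodes" ]; then
        # Same wording as the error in the log, filled in with the inputs.
        echo "attempt to delete existing nodes targetSize:${target_size} delta:${delta} existingNodes: ${existing_nodes}"
        return 1
    fi
    echo "ok: new target size ${new_size}"
}
```

Calling `check_decrease 4 -1 6` with the values from the log takes the error branch, which is consistent with the autoscaler being unable to fix the node group size while the MachineSet reports fewer replicas than there are registered nodes.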
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Yes
Steps to Reproduce:
1. Scale down the operators and patch a worker machine's status to the Deleting phase
oc scale deployment cluster-version-operator -n openshift-cluster-version --replicas=0
oc scale deployment machine-api-operator --replicas=0
oc scale deployment machine-api-controllers --replicas=0
kubectl config view --raw -o json | jq '.clusters[0].cluster."certificate-authority-data"' -r | base64 --decode > ca.crt
kubectl config view --raw -o json | jq '.users[0].user."client-certificate-data"' -r | base64 --decode > client.crt
kubectl config view --raw -o json | jq '.users[0].user."client-key-data"' -r | base64 --decode > client.key
export SERVER=$(kubectl config view --raw -o json | jq '.clusters[0].cluster.server' -r)
export WORKER_MACHINE=zhsun-cas-r28fw-worker-us-east-2c-t576t
curl -H "Content-Type: application/merge-patch+json" --cacert ./ca.crt --cert ./client.crt --key ./client.key $SERVER/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/$WORKER_MACHINE/status -XPATCH -d '{"status":{"phase":"Deleting"}}'
2. Add workload
$ oc create -f ~/data/scaleup-32.yaml
deployment.apps/scale-up created
$ oc get machineset
NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsun-cas-r28fw-worker-us-east-2a   3         1         1       1           10h
zhsun-cas-r28fw-worker-us-east-2b   3         1         1       1           10h
zhsun-cas-r28fw-worker-us-east-2c   3         1         1       1           10h
$ oc get machine
NAME                                      PHASE      TYPE         REGION      ZONE         AGE
zhsun-cas-r28fw-master-0                  Running    m6i.xlarge   us-east-2   us-east-2a   10h
zhsun-cas-r28fw-master-1                  Running    m6i.xlarge   us-east-2   us-east-2b   10h
zhsun-cas-r28fw-master-2                  Running    m6i.xlarge   us-east-2   us-east-2c   10h
zhsun-cas-r28fw-worker-us-east-2a-5rvgv   Running    m6i.xlarge   us-east-2   us-east-2a   134m
zhsun-cas-r28fw-worker-us-east-2b-zn7gf   Running    m6i.xlarge   us-east-2   us-east-2b   148m
zhsun-cas-r28fw-worker-us-east-2c-t576t   Deleting   m6i.xlarge   us-east-2   us-east-2c   72m
$ oc get machineautoscaler
NAME                 REF KIND     REF NAME                            MIN   MAX   AGE
machineautoscaler    MachineSet   zhsun-cas-r28fw-worker-us-east-2a   1     3     7h13m
machineautoscalerb   MachineSet   zhsun-cas-r28fw-worker-us-east-2b   1     3     7h12m
machineautoscalerc   MachineSet   zhsun-cas-r28fw-worker-us-east-2c   1     3     7h12m
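When repeating these steps, it can help to confirm which machine actually ended up in the Deleting phase before adding the workload. The sketch below filters the tabular output of `oc get machine` with awk. `machines_in_deleting` is a hypothetical helper name, and it assumes the default column layout shown in this report (NAME first, PHASE second).

```shell
# Hypothetical helper: reads `oc get machine` output on stdin and prints
# the names of machines whose PHASE column is "Deleting".
# Assumption: default columns, NAME in field 1 and PHASE in field 2.
machines_in_deleting() {
    # NR > 1 skips the header row.
    awk 'NR > 1 && $2 == "Deleting" { print $1 }'
}
```

A possible invocation is `oc get machine | machines_in_deleting`, which for the listing in this report should name only the worker that was patched.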
Actual results:
Expected results:
Additional info:
- duplicates
- OCPBUGS-11115 Autoscaler does not work after entering in failed status for a single machineautoscaler
- ASSIGNED
- is related to
- OCPBUGS-11115 Autoscaler does not work after entering in failed status for a single machineautoscaler
- ASSIGNED