Bug
Resolution: Won't Do
Undefined
4.17
Quality / Stability / Reliability
False
CLOUD Sprint 257, CLOUD Sprint 258, CLOUD Sprint 259
-
3
TODO: (Placeholder for now)
Description of problem:
The CAO can get into a failed state in which it fails to fix node group sizes:
2023-03-22T12:46:49.148733289Z E0322 12:46:49.148726 1 static_autoscaler.go:364] Failed to fix node group sizes: failed to decrease MachineSet/openshift-machine-api/eu-3-compute-kgzn2-aro-machineset-compute-xl-germanywestcentral-1: attempt to delete existing nodes targetSize:4 delta:-1 existingNodes: 6
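The numbers in the error message point at a simple size guard: the requested new target (targetSize + delta) would fall below the number of nodes that already exist. A minimal sketch of that arithmetic follows. The function name `check_decrease` is hypothetical, and the logic is inferred from the log line above, not copied from the autoscaler source.

```shell
# Hypothetical helper mirroring the arithmetic implied by the error message.
# Arguments: <target_size> <delta> <existing_nodes>
check_decrease() {
  target_size=$1
  delta=$2
  existing_nodes=$3

  # A decrease must be expressed as a negative delta.
  if [ "$delta" -ge 0 ]; then
    echo "size decrease must be negative" >&2
    return 2
  fi

  new_size=$(( target_size + delta ))

  # Refuse to shrink the target below the number of nodes that already exist.
  if [ "$new_size" -lt "$existing_nodes" ]; then
    echo "attempt to delete existing nodes targetSize:${target_size} delta:${delta} existingNodes: ${existing_nodes}" >&2
    return 1
  fi

  echo "ok: new target size ${new_size}"
}
```

With the values from the log (`check_decrease 4 -1 6`), the new target would be 3 while 6 nodes exist, so the guard rejects the decrease.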
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Yes
Steps to Reproduce:
1. Scale down the cluster-version-operator and machine-api deployments, then patch a worker machine's status phase to Deleting
$ oc scale deployment cluster-version-operator -n openshift-cluster-version --replicas=0
$ oc scale deployment machine-api-operator -n openshift-machine-api --replicas=0
$ oc scale deployment machine-api-controllers -n openshift-machine-api --replicas=0
$ kubectl config view --raw -o json | jq '.clusters[0].cluster."certificate-authority-data"' -r | base64 --decode > ca.crt
$ kubectl config view --raw -o json | jq '.users[0].user."client-certificate-data"' -r | base64 --decode > client.crt
$ kubectl config view --raw -o json | jq '.users[0].user."client-key-data"' -r | base64 --decode > client.key
$ export SERVER=$(kubectl config view --raw -o json | jq '.clusters[0].cluster.server' -r)
$ export WORKER_MACHINE=zhsun-cas-r28fw-worker-us-east-2c-t576t
$ curl -H "Content-Type: application/merge-patch+json" --cacert ./ca.crt --cert ./client.crt --key ./client.key "$SERVER/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/$WORKER_MACHINE/status" -XPATCH -d '{"status":{"phase":"Deleting"}}'
2. Add workload
$ oc create -f ~/data/scaleup-32.yaml
deployment.apps/scale-up created
$ oc get machineset
NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsun-cas-r28fw-worker-us-east-2a   3         1         1       1           10h
zhsun-cas-r28fw-worker-us-east-2b   3         1         1       1           10h
zhsun-cas-r28fw-worker-us-east-2c   3         1         1       1           10h
$ oc get machine
NAME                                      PHASE      TYPE         REGION      ZONE         AGE
zhsun-cas-r28fw-master-0                  Running    m6i.xlarge   us-east-2   us-east-2a   10h
zhsun-cas-r28fw-master-1                  Running    m6i.xlarge   us-east-2   us-east-2b   10h
zhsun-cas-r28fw-master-2                  Running    m6i.xlarge   us-east-2   us-east-2c   10h
zhsun-cas-r28fw-worker-us-east-2a-5rvgv   Running    m6i.xlarge   us-east-2   us-east-2a   134m
zhsun-cas-r28fw-worker-us-east-2b-zn7gf   Running    m6i.xlarge   us-east-2   us-east-2b   148m
zhsun-cas-r28fw-worker-us-east-2c-t576t   Deleting   m6i.xlarge   us-east-2   us-east-2c   72m
$ oc get machineautoscaler
NAME                 REF KIND     REF NAME                            MIN   MAX   AGE
machineautoscaler    MachineSet   zhsun-cas-r28fw-worker-us-east-2a   1     3     7h13m
machineautoscalerb   MachineSet   zhsun-cas-r28fw-worker-us-east-2b   1     3     7h12m
machineautoscalerc   MachineSet   zhsun-cas-r28fw-worker-us-east-2c   1     3     7h12m
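The symptom in the `oc get machineset` listing from step 2 is that DESIRED stays at 3 while CURRENT stays at 1. A small filter such as the one below can flag that gap. The function name `mismatched_machinesets` is hypothetical, and it assumes the column order NAME, DESIRED, CURRENT shown in that listing.

```shell
# Hypothetical helper: print machinesets whose DESIRED and CURRENT counts differ.
# Reads `oc get machineset` table output on stdin and skips the header row.
# Assumes column 1 is NAME, column 2 is DESIRED, column 3 is CURRENT.
mismatched_machinesets() {
  awk 'NR > 1 && $2 != $3 { print $1 ": desired=" $2 " current=" $3 }'
}
```

It could be used as `oc get machineset -n openshift-machine-api | mismatched_machinesets`; an empty result would mean every listed machineset has reached its desired replica count.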
Actual results:
Expected results:
Additional info:
duplicates:
OCPBUGS-11115 Autoscaler does not work after entering in failed status for a single machineautoscaler (Closed)
is related to:
OCPBUGS-11115 Autoscaler does not work after entering in failed status for a single machineautoscaler (Closed)