- Bug
- Resolution: Done-Errata
- Minor
- 4.14.0
- Low
- No
- False
-
Description of problem:
When a workload has a required node selector term on the kubernetes.io/arch label whose allowed values do not include amd64, the cluster autoscaler does not scale out a valid non-amd64 machine set if that machine set currently has 0 replicas and (for 4.14+) no architecture capacity annotation is set (ref. MIXEDARCH-129).
The issue is caused by the code at https://github.com/openshift/kubernetes-autoscaler/blob/f0ceeacfca57014d07f53211a034641d52d85cfd/cluster-autoscaler/cloudprovider/utils.go#L33
This bug should be addressed first for clusters whose control plane and data plane have the same architecture.
For multi-arch compute clusters, there is probably no alternative to having the capacity annotation set correctly on the machine set, either manually or by the cloud provider actuator, as already discussed in the MIXEDARCH-129 work; otherwise, the autoscaler has to fall back to the control plane architecture.
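The resolution order described above (use the capacity annotation when it is set, otherwise fall back to the control plane architecture) can be sketched in Go. This is a hypothetical illustration, not the autoscaler's actual code: the annotation key `capacity.cluster-autoscaler.kubernetes.io/labels` and its `key=value,...` format are assumptions, and `templateArch`, `parseLabelHints` and the `fallbackArch` parameter are invented names.

```go
package main

import (
	"fmt"
	"strings"
)

// ASSUMPTION: annotation key carrying "key=value,..." label hints on a MachineSet.
const labelsAnnotation = "capacity.cluster-autoscaler.kubernetes.io/labels"

const archLabel = "kubernetes.io/arch"

// parseLabelHints turns "k1=v1,k2=v2" into a map, skipping malformed entries.
func parseLabelHints(s string) map[string]string {
	out := map[string]string{}
	for _, kv := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(kv), "=")
		if !ok || k == "" {
			continue
		}
		out[k] = v
	}
	return out
}

// templateArch (hypothetical) chooses the kubernetes.io/arch value for a
// scale-from-zero template node: the annotation hint when present, otherwise
// fallbackArch (e.g. the control plane architecture on a single-arch cluster).
func templateArch(annotations map[string]string, fallbackArch string) string {
	if hints, ok := annotations[labelsAnnotation]; ok {
		if arch := parseLabelHints(hints)[archLabel]; arch != "" {
			return arch
		}
	}
	return fallbackArch
}

func main() {
	noHint := map[string]string{}
	withHint := map[string]string{labelsAnnotation: "kubernetes.io/arch=arm64"}

	fmt.Println(templateArch(noHint, "amd64"))   // amd64: the fallback behavior reported in this bug
	fmt.Println(templateArch(noHint, "arm64"))   // arm64: fall back to the control plane architecture
	fmt.Println(templateArch(withHint, "amd64")) // arm64: the annotation takes precedence
}
```

Keeping the fallback as a parameter separates the single-arch case (where the control plane architecture is a safe default) from the multi-arch case, where only a correctly set annotation can identify the machine set's architecture.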
Version-Release number of selected component (if applicable):
- ARM64 IPI on GCP 4.14
- ARM64 IPI on AWS and Azure <=4.13
- In general, non-amd64 single-arch clusters supporting autoscaling from 0
How reproducible:
Always
Steps to Reproduce:
1. Create an arm64 IPI cluster on GCP.
2. Scale one of the machinesets to 0 replicas: oc scale -n openshift-machine-api machineset/adistefa-a1-zn8pg-worker-f --replicas=0
3. Deploy the default ClusterAutoscaler.
4. Deploy a MachineAutoscaler for the given machineset.
5. Deploy a workload with node affinity to arm64-only nodes, large resource requests, and enough replicas to require new nodes.
Actual results:
From the pod events: pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
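The event follows from how required node affinity is evaluated against the template node that the autoscaler simulates for a machineset at 0 replicas. The Go sketch is a simplified model, not the scheduler's or autoscaler's code: it supports only the `In` operator, the type and function names are hypothetical, and it assumes the template node carries `kubernetes.io/arch=amd64` because no architecture hint is available.

```go
package main

import "fmt"

// Requirement models one matchExpressions entry; only the In operator is modeled.
type Requirement struct {
	Key    string
	Values []string
}

// Term models one nodeSelectorTerm: all of its requirements must hold (AND).
type Term []Requirement

// contains reports whether s is one of values.
func contains(values []string, s string) bool {
	for _, v := range values {
		if v == s {
			return true
		}
	}
	return false
}

// Matches reports whether the node labels satisfy at least one term (terms are ORed).
// As in the Kubernetes API, an empty term matches nothing.
func Matches(labels map[string]string, terms []Term) bool {
	for _, t := range terms {
		if len(t) == 0 {
			continue
		}
		ok := true
		for _, r := range t {
			v, present := labels[r.Key]
			if !present || !contains(r.Values, v) {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// Required affinity from the reproducer Deployment: kubernetes.io/arch In [arm64].
	terms := []Term{{{Key: "kubernetes.io/arch", Values: []string{"arm64"}}}}

	// Assumed template node for the 0-replica machineset when no arch hint exists.
	template := map[string]string{"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"}
	fmt.Println(Matches(template, terms)) // false: "didn't match Pod's node affinity/selector"

	// The same template node with the correct architecture label.
	template["kubernetes.io/arch"] = "arm64"
	fmt.Println(Matches(template, terms)) // true
}
```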
Expected results:
The cluster autoscaler scales the machineset with 0 replicas in order to provide resources for the pending pods.
Additional info:
---
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec: {}
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: adistefa-a1-zn8pg-worker-f
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: openshift-machine-api
  name: 'my-deployment'
  annotations: {}
spec:
  selector:
    matchLabels:
      app: name
  replicas: 3
  template:
    metadata:
      labels:
        app: name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - "arm64"
      containers:
        - name: container
          image: >-
            image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
          ports:
            - containerPort: 8080
              protocol: TCP
          env: []
          resources:
            requests:
              cpu: "2"
      imagePullSecrets: []
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  paused: false
- blocks: OCPBUGS-19697 [GCP 4.14] [Azure/AWS <=4.13] Pod didn't trigger arm64 machineset scale out from 0 when a required node selector term on non-amd64 nodes is set (Closed)
- is cloned by: OCPBUGS-19697 [GCP 4.14] [Azure/AWS <=4.13] Pod didn't trigger arm64 machineset scale out from 0 when a required node selector term on non-amd64 nodes is set (Closed)
- links to: RHEA-2023:7198 rpm