Loading...

XML

Word

Printable

Type: Bug
Resolution: Done-Errata
Priority: Minor
Fix Version/s: 4.15.0
Affects Version/s: 4.14.0
Component/s: Cloud Compute / Cluster Autoscaler
Labels:
- multiarch-qe
- pre-merge-tested

Severity:
Low
Regression:
No
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Target Version:

4.15.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When a workload includes a node selector term on the label kubernetes.io/arch and the allowed values do not include amd64, the auto scaler does not trigger the scale out of a valid, non-amd64, machine set if its current replicas are 0 and (for 4.14+) no architecture capacity annotation is set (ref MIXEDARCH-129).

The issue is due to https://github.com/openshift/kubernetes-autoscaler/blob/f0ceeacfca57014d07f53211a034641d52d85cfd/cluster-autoscaler/cloudprovider/utils.go#L33

This bug should be considered at first on clusters having the same architecture for the control plane and the data plane.

In the case of multi-arch compute clusters, there is probably no alternative than letting the capacity annotation to be properly set in the machine set either manually or by the cloud provider actuator, as already discussed in the MIXEDARCH-129 works, otherwise relying to the control plane architecture.

Version-Release number of selected component (if applicable):

- ARM64 IPI on GCP 4.14
- ARM64 IPI on Aws and Azure <=4.13
- In general, non-amd64 single-arch clusters supporting autoscale from 0

How reproducible:

Always

Steps to Reproduce:

1. Create an arm64 IPI cluster on GCP
2. Set one of the machinesets to have 0 replicas: 
    oc scale -n openshift-machine-api machineset/adistefa-a1-zn8pg-worker-f
3. Deploy the default autoscaler
4. Deploy the machine autoscaler for the given machineset
5. Deploy a workload with node affinity to arm64 only nodes, large resource requests and enough number of replicas.

Actual results:

From the pod events: 

pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector

Expected results:

The cluster autoscaler scales the machineset with 0 replicas in order to provide resources for the pending pods.

Additional info:

---
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec: {}
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: adistefa-a1-zn8pg-worker-f
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: openshift-machine-api
  name: 'my-deployment'
  annotations: {}
spec:
  selector:
    matchLabels:
      app: name
  replicas: 3
  template:
    metadata:
      labels:
        app: name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/arch
                  operator: In
                  values:
                    - "arm64"
      containers:
        - name: container
          image: >-
            image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
          ports:
            - containerPort: 8080
              protocol: TCP
          env: []
          resources:
              requests:
                cpu: "2"
      imagePullSecrets: []
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  paused: false

blocks

OCPBUGS-19697 [GCP 4.14] [Azure/AWS <=4.13] Pod didn't trigger arm64 machineset scale out from 0 when a required node selector term on non-amd64 nodes is set

Closed

is cloned by

OCPBUGS-19697 [GCP 4.14] [Azure/AWS <=4.13] Pod didn't trigger arm64 machineset scale out from 0 when a required node selector term on non-amd64 nodes is set

Closed

links to

openshift/cluster-autoscaler-operator#284: OCPBUGS-18137: Provide the architecture of the control plane as argument to --scale-up-from-zero-default-arch

openshift/kubernetes-autoscaler#261: OCPBUGS-18137: UPSTREAM: 6066: Allow overriding the kubernetes.io/arch label set by the scale from zero methods via a new cmdline arg

RHEA-2023:7198 rpm

Assignee:: Alessandro Di Stefano

Reporter:: Alessandro Di Stefano

QA Contact:: Zhaohua Sun

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Created:: 2023/08/25 5:10 PM

Updated:: 2024/02/27 8:50 PM

Resolved:: 2024/02/27 8:50 PM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates