Bug
Resolution: Unresolved
Normal
None
4.16, 4.17, 4.18
Description of problem:
In the customer environment, some pods from a DaemonSet get stuck in Pending after the MTO patches nodeAffinity onto them. Although the underlying condition is that some nodes in the customer's cluster carry taints that make them unschedulable, the MTO should not cause the pods to get stuck. If the cluster autoscaler is enabled, the stuck Pending pods can also trigger the autoscaler to scale out, making the situation worse.
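For context, below is a minimal sketch of how the affinity on one of the stuck DaemonSet pods can end up, assuming the DaemonSet controller's standard per-node matchFields term is combined with the architecture requirement injected by the operator's pod placement webhook; the node name and the exact merge behavior are illustrative assumptions:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name           # set by the DaemonSet controller; pins the pod to one node
          operator: In
          values:
          - worker-amd64-1             # illustrative node name
        matchExpressions:
        - key: kubernetes.io/arch      # injected by the operator based on the image's supported architectures
          operator: In
          values:
          - arm64

A pod pinned to an amd64 node (or to a tainted node it cannot tolerate) can never satisfy such a term, so it stays Pending indefinitely.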
Version-Release number of selected component (if applicable):
multiarch-tuning-operator.v1.0.0 (OCP 4.16, 4.17)
How reproducible:
100%
Steps to Reproduce:
1. Create a multiarch cluster with 2 amd64 and 1 arm64 worker nodes.
2. Install MTO and its operand on the cluster (see the ClusterPodPlacementConfig sketch after these steps).
3. Create a DaemonSet that uses an arm64 single-arch image:

oc new-project test-mmo
oc create -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-daemonset
spec:
  selector:
    matchLabels:
      app: hello-openshift
  template:
    metadata:
      labels:
        app: hello-openshift
    spec:
      containers:
      - name: hello-openshift
        image: quay.io/openshifttest/hello-openshift:arm-1.2.0
EOF

4. Check the status of all pods:

oc get pods -o wide
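Step 2 refers to creating the operator's operand after installing MTO. A minimal sketch of the ClusterPodPlacementConfig that enables the pod placement operand is shown below; the API version and spec fields follow the MTO documentation and should be treated as assumptions for the installed operator version:

oc create -f - <<EOF
apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
  name: cluster
spec:
  logVerbosity: Normal
EOF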
Actual results:
Two pods get stuck in Pending after the pod placement controller (PPC) patches the arm64 nodeAffinity onto the pods.
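For triage, a few standard inspection commands; <pending-pod-name> is a placeholder and test-mmo is the project created in the reproduction steps:

# Show the scheduling events and the patched affinity of one of the Pending pods
oc -n test-mmo describe pod <pending-pod-name>
oc -n test-mmo get pod <pending-pod-name> -o yaml

# List node architectures and taints to see which nodes can actually host the pod
oc get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture,TAINTS:.spec.taints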
Expected results:
The operator should not cause any pod from the DaemonSet to get stuck. If the cluster autoscaler is enabled, the stuck Pending pods will also trigger unwanted scale-out.
Additional info:
Links to: RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update