Red Hat Advanced Cluster Management / ACM-9458

Klusterlet add-ons (cluster-proxy, work-manager) do not tolerate infra NoExecute taints


    • Type: Feature
    • Resolution: Done
    • Priority: Normal
    • ACM 2.12.0
    • ACM 2.10.0, ACM 2.9.0, ACM 2.11.0, ACM 2.12.0
    • Server Foundation

      Feature Overview

      Klusterlet add-ons (cluster-proxy, work-manager) do not tolerate the NoExecute taint on infra nodes.

       

      Version

      • RHACM 2.9
      • OCP 4.14.z

       

      Steps to Reproduce:

      $ oc describe po cluster-proxy-service-proxy-12345678abc | grep -i toleration -A8
      Tolerations:                 node-role.kubernetes.io/infra=reserved:NoExecute
                                   node-role.kubernetes.io/infra=reserved:NoSchedule
                                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:                      <none>

      The infra node has the following taints:
      $ oc describe node examplenode-infra1 | grep -i taint -A5
      Taints:             node-role.kubernetes.io/infra=reserved:NoExecute
                          node.kubernetes.io/unreachable:NoExecute
                          node-role.kubernetes.io/infra=reserved:NoSchedule
                          node.kubernetes.io/unreachable:NoSchedule
      Unschedulable:      false
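To check which node taints a pod's tolerations leave uncovered, the Kubernetes matching rule can be modeled in a few lines of Python. This is a minimal sketch for illustration, not the scheduler's code: `Taint`, `Toleration`, `tolerates` and `untolerated` are hypothetical names, and the sample data is transcribed by hand from the `oc describe` output above (`key=value:effect` denotes the Equal operator, `op=Exists` the Exists operator).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator Exists matches any key
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect
    toleration_seconds: Optional[int] = None  # only relevant to NoExecute eviction


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Model of the Kubernetes rule for a single toleration/taint pair."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False  # an unknown operator never matches


def untolerated(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Return the taints that block scheduling because no toleration covers them.

    PreferNoSchedule is only a soft preference, so it is skipped; tolerationSeconds
    does not affect scheduling, only later eviction.
    """
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Taints of examplenode-infra1, as listed by `oc describe node`.
node_taints = [
    Taint("node-role.kubernetes.io/infra", "reserved", "NoExecute"),
    Taint("node.kubernetes.io/unreachable", "", "NoExecute"),
    Taint("node-role.kubernetes.io/infra", "reserved", "NoSchedule"),
    Taint("node.kubernetes.io/unreachable", "", "NoSchedule"),
]

# Tolerations of the cluster-proxy pod, as listed by `oc describe po`.
pod_tolerations = [
    Toleration("node-role.kubernetes.io/infra", "Equal", "reserved", "NoExecute"),
    Toleration("node-role.kubernetes.io/infra", "Equal", "reserved", "NoSchedule"),
    Toleration("node.kubernetes.io/memory-pressure", "Exists", "", "NoSchedule"),
    Toleration("node.kubernetes.io/not-ready", "Exists", "", "NoExecute", 300),
    Toleration("node.kubernetes.io/unreachable", "Exists", "", "NoExecute", 300),
]

if __name__ == "__main__":
    for taint in untolerated(node_taints, pod_tolerations):
        print(f"untolerated: {taint.key}={taint.value}:{taint.effect}")
```

The model covers only the taint part of the scheduling decision; node affinity, node selectors and the other scheduler filters are deliberately left out.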

       

      The pod is stuck in Pending, and the pod event reports: "0/11 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) didn't match Pod's node affinity/selector, 4 node(s) had untolerated taint {node-role.kubernetes.io/infra: reserved}. preemption: 0/11 nodes are available: 11 Preemption is not helpful for scheduling.."
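The scheduler message is a per-reason tally over the nodes it tried. In this message the counts add up to the node total (3 + 4 + 4 = 11), so no node was left as a candidate. A small parser makes that tally explicit; `parse_filter_reasons` is a hypothetical helper, not part of any Kubernetes client library, and it assumes the `0/N nodes are available: ...` layout shown above and that no reason text itself contains `", "`.

```python
from typing import Dict, Tuple


def parse_filter_reasons(message: str) -> Tuple[int, Dict[str, int]]:
    """Split '0/N nodes are available: <count> <reason>, ...' into (N, {reason: count}).

    Only the part before ' preemption:' is parsed; reasons are assumed not to
    contain ', ' themselves.
    """
    head = message.split(" preemption:", 1)[0].rstrip(".")
    prefix, sep, body = head.partition(" nodes are available: ")
    if not sep:
        raise ValueError("unexpected scheduler message format")
    total = int(prefix.split("/", 1)[1])
    reasons: Dict[str, int] = {}
    for part in body.split(", "):
        count, reason = part.split(" ", 1)
        reasons[reason] = reasons.get(reason, 0) + int(count)
    return total, reasons


# Event message copied from the report.
event = (
    "0/11 nodes are available: 3 node(s) had untolerated taint "
    "{node-role.kubernetes.io/master: }, 4 node(s) didn't match Pod's node "
    "affinity/selector, 4 node(s) had untolerated taint "
    "{node-role.kubernetes.io/infra: reserved}. preemption: 0/11 nodes are "
    "available: 11 Preemption is not helpful for scheduling.."
)

if __name__ == "__main__":
    total, reasons = parse_filter_reasons(event)
    for reason, count in reasons.items():
        print(f"{count:>3}  {reason}")
    print(f"{sum(reasons.values())} of {total} nodes rejected")
```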

       

      After removing the node taint "node.kubernetes.io/unreachable:NoExecute" (and removing the toleration "node-role.kubernetes.io/infra=reserved:NoExecute"), the pod was scheduled and ran successfully.

      Goals

       

      Enable the klusterlet add-ons to tolerate infra NoExecute taints.
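Tolerating the infra taints amounts to giving the add-on pods one Equal toleration per infra taint effect, NoExecute as well as NoSchedule. The sketch derives such a list from a node's taints, in the shape of the Kubernetes `tolerations` field (`key`, `operator`, `value`, `effect`). `tolerations_for_key` is a hypothetical helper; how the fix actually injects tolerations into the cluster-proxy and work-manager deployments is not described in this issue.

```python
import json
from typing import Dict, List


def tolerations_for_key(taints: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Build one Equal toleration for each distinct (value, effect) taint on `key`.

    Only NoSchedule and NoExecute taints are considered, since those are the
    effects that keep a pod off a node or evict it.
    """
    tolerations: List[Dict[str, str]] = []
    for taint in taints:
        if taint["key"] != key or taint["effect"] not in ("NoSchedule", "NoExecute"):
            continue
        tol = {
            "key": key,
            "operator": "Equal",
            "value": taint.get("value", ""),
            "effect": taint["effect"],
        }
        if tol not in tolerations:  # skip duplicates
            tolerations.append(tol)
    return tolerations


# spec.taints of the infra node from the report (value omitted when empty).
infra_node_taints = [
    {"key": "node-role.kubernetes.io/infra", "value": "reserved", "effect": "NoExecute"},
    {"key": "node.kubernetes.io/unreachable", "effect": "NoExecute"},
    {"key": "node-role.kubernetes.io/infra", "value": "reserved", "effect": "NoSchedule"},
    {"key": "node.kubernetes.io/unreachable", "effect": "NoSchedule"},
]

if __name__ == "__main__":
    wanted = tolerations_for_key(infra_node_taints, "node-role.kubernetes.io/infra")
    print(json.dumps({"tolerations": wanted}, indent=2))
```

The `node.kubernetes.io/unreachable` taints are left out on purpose: they are managed by the node lifecycle controller rather than by the infra node setup.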

       

       

      Documentation Considerations

      • Our OpenShift KCS suggests tainting infra nodes with the "NoExecute" effect to prevent other workloads from also being scheduled on those infrastructure nodes:

      https://access.redhat.com/solutions/5034771 

      • Our RHACM documentation only mentions the infra node taint "node-role.kubernetes.io/infra=reserved:NoSchedule", which works without "NoExecute".
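The two taint effects named above behave differently for pods that are already running: a NoSchedule taint only keeps new pods off the node, while a NoExecute taint also evicts running pods that do not tolerate it, either immediately or, when the matching tolerations set `tolerationSeconds`, after the shortest of those periods. A simplified model of that eviction decision follows; `matches` and `eviction_delay` are hypothetical names, and tolerations are written as plain dicts for brevity.

```python
from typing import Dict, List, Optional


def matches(tol: Dict, taint: Dict) -> bool:
    """Kubernetes toleration/taint matching rule (an empty operator means Equal)."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("key") and tol["key"] != taint["key"]:
        return False
    op = tol.get("operator") or "Equal"
    if op == "Exists":
        return True
    return op == "Equal" and tol.get("value", "") == taint.get("value", "")


def eviction_delay(taints: List[Dict], tolerations: List[Dict]) -> Optional[int]:
    """Seconds until a running pod is evicted by the node's NoExecute taints.

    0 means evict now; None means the pod may stay indefinitely.
    """
    used = []
    for taint in (t for t in taints if t["effect"] == "NoExecute"):
        tol = next((tol for tol in tolerations if matches(tol, taint)), None)
        if tol is None:
            return 0  # an untolerated NoExecute taint evicts immediately
        used.append(tol)
    bounded = [tol["tolerationSeconds"] for tol in used if tol.get("tolerationSeconds") is not None]
    if not bounded:
        return None  # every matching toleration is unbounded
    return max(0, min(bounded))


infra_taints = [
    {"key": "node-role.kubernetes.io/infra", "value": "reserved", "effect": "NoExecute"},
    {"key": "node-role.kubernetes.io/infra", "value": "reserved", "effect": "NoSchedule"},
]
# Assumed tolerations of an ordinary workload pod (not-ready/unreachable only).
default_tolerations = [
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300},
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300},
]
# Same pod plus an infra NoExecute toleration, like the add-on pod in the report.
infra_tolerant = default_tolerations + [
    {"key": "node-role.kubernetes.io/infra", "operator": "Equal", "value": "reserved", "effect": "NoExecute"},
]

if __name__ == "__main__":
    print(eviction_delay(infra_taints, default_tolerations))  # 0
    print(eviction_delay(infra_taints, infra_tolerant))  # None
```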

              leyan@redhat.com Le Yang
              rhn-support-rpu Rui Pu
              Hui Chen
              Votes: 0
              Watchers: 4
