OpenShift Bugs / OCPBUGS-64864

NFD Operator Node Affinity Rules Incorrect

      Description of problem:

      The node affinity rules currently generated are not sufficient when nodes carry role labels other than worker or master.

      If you look at the code:
      https://github.com/openshift/cluster-nfd-operator/blob/master/internal/daemonset/worker.go#L30-L54

      It produces a config like:

      
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                    - matchExpressions:
                      - key: node-role.kubernetes.io/master
                        operator: DoesNotExist
                    - matchExpressions:
                      - key: node-role.kubernetes.io/worker
                        operator: Exists
          

      There are 2 distinct 'matchExpressions' entries, one per nodeSelectorTerm. These matchExpressions are not additive when defined as separate terms; only one of them needs to be true. An infrastructure node has neither the worker nor the master label, so the first term (master DoesNotExist) is satisfied and the second term, which requires the worker label, is no longer relevant.

      https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

      If you specify multiple terms in nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified terms can be satisfied (terms are ORed).
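To make the OR-vs-AND semantics concrete, here is a minimal Python sketch (not the operator's or the scheduler's actual code) that models how nodeSelectorTerms are evaluated: terms are ORed, while matchExpressions inside a single term are ANDed. It shows that with the current two-term config an infra node matches, while a single combined term excludes it:

```python
def expr_matches(labels, key, operator):
    # Evaluate one matchExpression against a node's label set.
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")

def node_matches(labels, node_selector_terms):
    # Terms are ORed; expressions within one term are ANDed.
    return any(
        all(expr_matches(labels, e["key"], e["operator"]) for e in term)
        for term in node_selector_terms
    )

# Current operator output: two separate terms (ORed).
current = [
    [{"key": "node-role.kubernetes.io/master", "operator": "DoesNotExist"}],
    [{"key": "node-role.kubernetes.io/worker", "operator": "Exists"}],
]

# Intended behavior: one term with both expressions (ANDed).
fixed = [
    [{"key": "node-role.kubernetes.io/master", "operator": "DoesNotExist"},
     {"key": "node-role.kubernetes.io/worker", "operator": "Exists"}],
]

infra = {"node-role.kubernetes.io/infra": ""}
worker = {"node-role.kubernetes.io/worker": ""}

print(node_matches(infra, current))   # True  -- infra node wrongly matches
print(node_matches(infra, fixed))     # False -- infra node correctly excluded
print(node_matches(worker, fixed))    # True  -- worker node still matches
```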

      We would want both requirements in the same matchExpressions entry, so they get ANDed.
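A corrected config would put both requirements into a single nodeSelectorTerm, along these lines (a sketch of the intended shape, not a tested patch):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: DoesNotExist
          - key: node-role.kubernetes.io/worker
            operator: Exists
```

With one term, a node must satisfy both expressions, so an infra node without the worker label is excluded.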

      The pods then end up starting with:

        nodeSelector:
          node-role.kubernetes.io/worker: ""
      

      And the pods cannot start because there is in fact no worker label on those infra nodes.

      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2025-11-06T21:46:02Z"
          message: '0/10 nodes are available: 1 node(s) didn''t match Pod''s node affinity/selector.
            preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.'
          reason: Unschedulable
          status: "False"
          type: PodScheduled
        phase: Pending
      

      Version-Release number of selected component (if applicable):

      4.16.0-202507211806
          

      How reproducible:

      Always
          

      Steps to Reproduce:

    1. Create a cluster that has master-, worker-, and infra-labeled nodes
          

      Actual results:

      The DaemonSet tries to put pods on the infra nodes, but they sit Pending due to the lack of a worker label.
          

      Expected results:

      The DaemonSet pods should only run on worker nodes.
          

      Additional info:

      
          
