OpenShift Bugs / OCPBUGS-51274

RTEs are started for nodes that don't belong to the specified pool in the node groups section


Workaround: Any time the CR nodeGroups include the worker pool, all nodes with the worker role will have associated RTEs and NRTs; this also includes schedulable masters. So far the way to handle this is to define node anti-affinity on the worker DaemonSet to exclude the labels of all other node roles. The workaround is simple, and after reassessment this can be considered P3, but it should not be dropped completely. We will probably handle this at the RTE level.
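A minimal sketch of such an exclusion on the worker RTE DaemonSet pod spec, assuming the nodes to keep out are the schedulable control-plane nodes (keys and operators illustrative, not the operator's actual rendering):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # keep the pool's own role label ...
              - key: node-role.kubernetes.io/worker
                operator: Exists
              # ... and exclude any node that also carries another role label
              - key: node-role.kubernetes.io/master
                operator: DoesNotExist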

      Description of problem:

          When the NROP CR is configured with an MCP whose node selector is shared by nodes from multiple MCPs, RTE pods also start on the non-targeted nodes. The NROP controller creates a DaemonSet whose node selector mirrors the node selector of the MCP referenced in the node group; that DaemonSet then schedules RTE pods on every node matching the selector, regardless of whether the node actually belongs to the specified MCP.
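
      For illustration, the selector copied from the pool into the RTE DaemonSet looks roughly like this (a sketch, not the controller's literal output):

      # MachineConfigPool "worker" (relevant part)
      spec:
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker: ""

      # RTE DaemonSet rendered by the NROP controller (relevant part)
      spec:
        template:
          spec:
            # matches every node carrying this label, including nodes that
            # belong to other pools (e.g. worker-cnf) and schedulable masters
            nodeSelector:
              node-role.kubernetes.io/worker: ""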

      Version-Release number of selected component (if applicable):

          all

      How reproducible:

         always 

      Steps to Reproduce:

      setup:
      - more than one worker node
      - one of the worker nodes has the label node-role.kubernetes.io/worker-cnf: ""
      - two MCPs for worker nodes: the default "worker" MCP and a second MCP called "worker-cnf" that targets nodes with the label node-role.kubernetes.io/worker-cnf: ""

      1. Configure the NROP CR with a node group that selects the "worker" MCP (see the sketch below).
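
      A minimal sketch of the two objects involved, assuming the usual custom-pool layout (names and labels illustrative):

      # MachineConfigPool targeting the CNF workers; its nodes also keep the
      # default worker role label, which is what triggers the bug
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: worker-cnf
      spec:
        machineConfigSelector:
          matchExpressions:
          - key: machineconfiguration.openshift.io/role
            operator: In
            values: [worker, worker-cnf]
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker-cnf: ""
      ---
      # NROP CR with a node group pointing at the default "worker" pool
      apiVersion: nodetopology.openshift.io/v1
      kind: NUMAResourcesOperator
      metadata:
        name: numaresourcesoperator
      spec:
        nodeGroups:
        - poolName: worker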
          

      Actual results:

          RTE pods will be created on all nodes that carry the label matching the "worker" MCP node selector
      
      shajmakh@shajmakh-thinkpadp16vgen1 ~ $ oc get node,po
      NAME                                                 STATUS   ROLES                          AGE   VERSION
      node/cnfdr11.telco5g.eng.rdu2.redhat.com             Ready    worker,worker-cnf              23h   v1.31.5
      node/cnfdr9.telco5g.eng.rdu2.redhat.com              Ready    worker                         23h   v1.31.5
      node/dhcp-10-1-105-178.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual   24h   v1.31.5
      node/dhcp-10-1-105-221.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual   24h   v1.31.5
      node/dhcp-10-1-105-44.telco5g.eng.rdu2.redhat.com    Ready    control-plane,master,virtual   24h   v1.31.5
      NAME                                                   READY   STATUS    RESTARTS   AGE
      pod/numaresources-controller-manager-cdf99bff5-cc4js   1/1     Running   0          22h
      pod/numaresourcesoperator-worker-5wfr6                 2/2     Running   0          4h29m
      pod/numaresourcesoperator-worker-x8vv9                 2/2     Running   0          4h29m
      

      Expected results:

          NROP is expected to target only the machines that belong to the MCP specified under the node group. We need a better way to identify the target nodes.

      Additional info:

          If the NRO CR is configured with the node pool "worker" and also another pool "XXX" (the other pool must have the worker role, otherwise it violates MCP rules), RTE pods will be doubled for nodes from pool "XXX". Below is an example of how this looks on a 5-node cluster with 3 schedulable control-plane nodes and 2 workers:
      
      
      shajmakh@shajmakh-thinkpadp16vgen1 ~/temp-el8 $ oc get pod -o wide
      NAME                                                READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
      numaresources-controller-manager-554c9dffcd-nx8sk   1/1     Running   0          20d     10.134.0.103   dhcp-1-105-44.telco5g.eng.rdu2.redhat.com    <none>           <none>
      numaresourcesoperator-master-76qft                  2/2     Running   0          7s      10.133.0.48    dhcp-1-105-178.telco5g.eng.rdu2.redhat.com   <none>           2/2
      numaresourcesoperator-master-7wgv6                  2/2     Running   0          7s      10.134.1.125   dhcp-1-105-44.telco5g.eng.rdu2.redhat.com    <none>           2/2
      numaresourcesoperator-master-x6f97                  2/2     Running   0          7s      10.132.0.120   dhcp-1-105-134.telco5g.eng.rdu2.redhat.com   <none>           2/2
      numaresourcesoperator-worker-9bdsh                  2/2     Running   4          20d     10.132.3.167   cnfdr11.telco5g.eng.rdu2.redhat.com          <none>           2/2
      numaresourcesoperator-worker-9fr9x                  2/2     Running   0          2m12s   10.133.0.46    dhcp-1-105-178.telco5g.eng.rdu2.redhat.com   <none>           2/2
      numaresourcesoperator-worker-bschw                  2/2     Running   0          2m12s   10.134.1.123   dhcp-1-105-44.telco5g.eng.rdu2.redhat.com    <none>           2/2
      numaresourcesoperator-worker-flkm6                  2/2     Running   0          2m12s   10.132.0.118   dhcp-1-105-134.telco5g.eng.rdu2.redhat.com   <none>           2/2
      numaresourcesoperator-worker-jfw69                  2/2     Running   4          20d     10.135.1.90    cnfdr9.telco5g.eng.rdu2.redhat.com           <none>           2/2
      secondary-scheduler-7759995447-f6rsw                1/1     Running   0          3d2h    10.134.1.121   dhcp-1-105-44.telco5g.eng.rdu2.redhat.com    <none>           <none>
      shajmakh@shajmakh-thinkpadp16vgen1 ~/temp-el8 $ oc get node
      NAME                                         STATUS   ROLES                                 AGE   VERSION
      cnfdr11.telco5g.eng.rdu2.redhat.com          Ready    worker                                28d   v1.31.9
      cnfdr9.telco5g.eng.rdu2.redhat.com           Ready    worker                                28d   v1.31.9
      dhcp-1-105-134.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual,worker   28d   v1.31.8
      dhcp-1-105-178.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual,worker   28d   v1.31.8
      dhcp-1-105-44.telco5g.eng.rdu2.redhat.com    Ready    control-plane,master,virtual,worker   28d   v1.31.8
      shajmakh@shajmakh-thinkpadp16vgen1 ~/temp-el8 $ oc get numaresourcesoperator -o yaml 
      apiVersion: v1
      items:
      - apiVersion: nodetopology.openshift.io/v1
        kind: NUMAResourcesOperator
        ...
        spec:
          logLevel: Trace
          nodeGroups:
          - poolName: worker
          - poolName: master
      
      

       
