Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.18.z, 4.19.0
Component/s: Node / Numa aware Scheduling
Labels:
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Moderate
Regression:
None
Latest Status Summary:
2025-10-15: in discussions to change this to a documentation bug across all relevant versions

Target Backport Versions:

4.18.z
Target Version:

4.21
Release Blocker:
None
Sprint:
CNF Compute Sprint 268, CNF Compute Sprint 269, CNF Compute Sprint 270, CNF Compute Sprint 271, CNF Compute Sprint 272
sprint_count:
5

RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

  See the steps to reproduce below.

Version-Release number of selected component (if applicable):

    so far >=4.18

How reproducible:

    always

Steps to Reproduce:

    1.  Add a new MCP on a cluster that has NROP installed and deployed with the custom RTE SELinux policy enabled. The new MCP should stay empty (no nodes). Add it as a new node group under the NROP CR.
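
A minimal sketch of the objects involved in the step above, written as Python dictionaries that can be dumped to JSON. The pool name and selector labels are taken from the output in this report. The machineConfigSelector follows the usual custom-pool pattern and is an assumption, as are the helper names. The custom policy annotation is not shown in this report, so it is left out rather than guessed.

```python
import json

# Pool name as it appears in the output of this report.
POOL = "worker-cnf"


def empty_mcp(name):
    """Build a MachineConfigPool that selects nodes by role label.

    The pool stays empty as long as no node carries the
    node-role.kubernetes.io/<name> label.
    """
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfigPool",
        "metadata": {
            "name": name,
            "labels": {"machineconfiguration.openshift.io/role": name},
        },
        "spec": {
            # Assumption: inherit the worker MachineConfigs plus pool-specific ones.
            "machineConfigSelector": {
                "matchExpressions": [{
                    "key": "machineconfiguration.openshift.io/role",
                    "operator": "In",
                    "values": ["worker", name],
                }]
            },
            "nodeSelector": {
                "matchLabels": {f"node-role.kubernetes.io/{name}": ""}
            },
        },
    }


def node_group(label_key, label_value):
    """One NROP node group entry that selects an MCP by label."""
    return {"machineConfigPoolSelector": {"matchLabels": {label_key: label_value}}}


def nrop_node_groups_patch(name):
    """Merge-patch body listing the existing worker group and the new one."""
    return {"spec": {"nodeGroups": [
        node_group("pools.operator.machineconfiguration.openshift.io/worker", ""),
        node_group("machineconfiguration.openshift.io/role", name),
    ]}}


if __name__ == "__main__":
    print(json.dumps(empty_mcp(POOL), indent=2))
    print(json.dumps(nrop_node_groups_patch(POOL), indent=2))
```

The first object could be saved to a file and created with oc apply, and the second used as the body of a merge patch against the NROP CR.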

Actual results:
Although the new MCP is "updated", the NROP CR stays in the Progressing state and never becomes Available:

NAME                                                             CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master       rendered-master-e7416b38d7bdae0e580eb1578bc2400b       True      False      False      3              3                   3                     0                      14d
machineconfigpool.machineconfiguration.openshift.io/worker       rendered-worker-6ef79ab49669b7064068cc58ecad9c01       False     True       False      2              0                   0                     0                      14d
machineconfigpool.machineconfiguration.openshift.io/worker-cnf   rendered-worker-cnf-da770c3a1535f54e6ef8e6e8aac9a254   True      False      False      0              0                   0                     0                      31m

NAME                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
daemonset.apps/numaresourcesoperator-worker   2         2         2       2            2           node-role.kubernetes.io/worker=   73m
shajmakh@shajmakh-thinkpadp16vgen1 ~ $ oc describe  numaresourcesoperator 
    
Spec:
  Log Level:  Normal
  Node Groups:
    Machine Config Pool Selector:
      Match Labels:
        pools.operator.machineconfiguration.openshift.io/worker:  
    Machine Config Pool Selector:
      Match Labels:
        machineconfiguration.openshift.io/role:  worker-cnf
Status:
  Conditions:
    Last Transition Time:  2025-03-10T15:45:07Z
    Message:               
    Reason:                Available
    Status:                False
    Type:                  Available
    Last Transition Time:  2025-03-10T15:45:07Z
    Message:               
    Reason:                Upgradeable
    Status:                False
    Type:                  Upgradeable
    Last Transition Time:  2025-03-10T15:45:07Z
    Message:               worker is updating
    Reason:                MachineConfigPoolIsUpdating
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2025-03-10T15:45:07Z
    Message:               
    Reason:                Degraded
    Status:                False
    Type:                  Degraded
  Daemonsets:
    Name:       numaresourcesoperator-worker
    Namespace:  numaresources
  Machineconfigpools:
    Name:  worker
  Node Groups:
    Config:
      Info Refresh Mode:    Periodic
      Info Refresh Pause:   Disabled
      Info Refresh Period:  10s
      Pods Fingerprinting:  EnabledExclusiveResources
    Daemonsets:
      Name:       numaresourcesoperator-worker
      Namespace:  numaresources
    Selector:     worker
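
For triage, the MCP table above can be scanned for empty pools and pools that are still updating. The sketch below is a hypothetical helper, not part of oc or the operator. It assumes whitespace-separated columns with the headers shown in this report, and the sample rows are copied from the output above.

```python
def parse_mcp_table(text):
    """Parse `oc get mcp` style output into a list of dicts keyed by column header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def triage(rows):
    """Return (names of empty pools, names of pools that report UPDATING=True)."""
    def short(row):
        # Drop the "machineconfigpool.<group>/" prefix from the NAME column.
        return row["NAME"].split("/")[-1]

    empty = [short(r) for r in rows if int(r["MACHINECOUNT"]) == 0]
    updating = [short(r) for r in rows if r["UPDATING"] == "True"]
    return empty, updating


SAMPLE = """\
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
machineconfigpool.machineconfiguration.openshift.io/master rendered-master-e7416b38d7bdae0e580eb1578bc2400b True False False 3 3 3 0 14d
machineconfigpool.machineconfiguration.openshift.io/worker rendered-worker-6ef79ab49669b7064068cc58ecad9c01 False True False 2 0 0 0 14d
machineconfigpool.machineconfiguration.openshift.io/worker-cnf rendered-worker-cnf-da770c3a1535f54e6ef8e6e8aac9a254 True False False 0 0 0 0 31m
"""

if __name__ == "__main__":
    empty, updating = triage(parse_mcp_table(SAMPLE))
    print("empty pools:", empty)
    print("updating pools:", updating)
```

On the sample rows this flags worker-cnf as empty and worker as updating, which matches the "worker is updating" message in the Progressing condition.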

Expected results:

    NROP should become Available: it should create the RTE DaemonSet (no pods are expected for the empty MCP) and let the MCO controller handle updates on the MCPs, while the NROP controller keeps watching them for updates.
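
The expected end state can be modeled roughly as follows. This is only an illustration of the expectation stated above, not the operator's reconcile logic. The Pool type and the function name are hypothetical, and treating the DaemonSet's desired pod count as equal to the pool's machine count is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical, reduced view of an MCP that backs an NROP node group."""
    name: str
    machine_count: int
    updating: bool


def expected_state(pools):
    """Return (daemonsets, progressing_messages) for the expected behavior.

    Every node group gets an RTE DaemonSet, even when its pool is empty.
    An empty pool never contributes a Progressing message.
    """
    daemonsets = {}
    progressing = []
    for pool in pools:
        # Simplification: desired pods equal the number of machines in the pool.
        daemonsets[f"numaresourcesoperator-{pool.name}"] = pool.machine_count
        if pool.machine_count > 0 and pool.updating:
            progressing.append(f"{pool.name} is updating")
    return daemonsets, progressing


if __name__ == "__main__":
    pools = [Pool("worker", 2, False), Pool("worker-cnf", 0, False)]
    daemonsets, progressing = expected_state(pools)
    print(daemonsets)
    print("Available" if not progressing else progressing)
```

With a settled worker pool and an empty worker-cnf pool, the sketch yields a two-pod DaemonSet for worker, a zero-pod DaemonSet for worker-cnf, and no Progressing message.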

Additional info:

    Severity is set to Moderate because the workaround is simple: delete the empty MCP or remove the custom policy annotation. Without a workaround this would be considered a blocker.

When the custom policy annotation is removed, behavior returns to normal:

shajmakh@shajmakh-thinkpadp16vgen1 ~ $ oc get node,mcp,ds
NAME                                                 STATUS   ROLES                          AGE   VERSION
node/cnfdr11.telco5g.eng.rdu2.redhat.com             Ready    worker                         14d   v1.31.5
node/cnfdr9.telco5g.eng.rdu2.redhat.com              Ready    worker                         14d   v1.31.5
node/dhcp-10-1-105-178.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual   14d   v1.31.5
node/dhcp-10-1-105-221.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual   14d   v1.31.5
node/dhcp-10-1-105-44.telco5g.eng.rdu2.redhat.com    Ready    control-plane,master,virtual   14d   v1.31.5

NAME                                                             CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master       rendered-master-e7416b38d7bdae0e580eb1578bc2400b       True      False      False      3              3                   3                     0                      14d
machineconfigpool.machineconfiguration.openshift.io/worker       rendered-worker-da770c3a1535f54e6ef8e6e8aac9a254       True      False      False      2              2                   2                     0                      14d
machineconfigpool.machineconfiguration.openshift.io/worker-cnf   rendered-worker-cnf-da770c3a1535f54e6ef8e6e8aac9a254   True      False      False      0              0                   0                     0                      49m

NAME                                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                         AGE
daemonset.apps/numaresourcesoperator-worker       2         2         2       2            2           node-role.kubernetes.io/worker=       92m
daemonset.apps/numaresourcesoperator-worker-cnf   0         0         0       0            0           node-role.kubernetes.io/worker-cnf=   2m9s
shajmakh@shajmakh-thinkpadp16vgen1 ~ $ 

The behavior also occurs when the default SELinux policy is in effect; the only difference in the output is the reason "DaemonSetIsUpdating" instead of "MachineConfigPoolIsUpdating".

links to

https://github.com/openshift-kni/numaresources-operator/pull/2261

Assignee: Shereen Haj

Reporter: Shereen Haj

Need Info From: None

Contributors: None

QA Contact: Roy Shemtov

Doc Contact: None

Votes: 0

Watchers: 4

Created: 2025/03/10 3:58 PM

Updated: 2026/01/06 10:50 AM
