Type: Bug
Resolution: Not a Bug
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17.z
Component/s: Node / Numa aware Scheduling
Labels:
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    Appending a new node group to a settled NROP CR triggers a reboot of the nodes that belong to the existing node group.

Version-Release number of selected component (if applicable):

    <=4.17

How reproducible:

    always

Steps to Reproduce:

    1. Create a new MCP without any machines attached to it.
    2. Append the new MCP selector to the NROP CR as a new node group.
    3. Watch the nodes of the original node group being rebooted.
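
    The steps above can be sketched with manifests along the following lines. This is a minimal illustration, not the exact objects used in the report: the pool name `mcp-test` and the node selector `node-role.kubernetes.io/mcp-test` match the output under "Additional info", while the `pools.operator.machineconfiguration.openshift.io/mcp-test` label on the new pool and the `machineConfigSelector` values are assumptions chosen to mirror the existing worker selector.

```yaml
# Step 1: an empty MachineConfigPool (no node carries the mcp-test role label).
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-test
  labels:
    # Assumed label, added by hand so the NROP selector below can match this pool.
    pools.operator.machineconfiguration.openshift.io/mcp-test: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, mcp-test]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-test: ""
---
# Step 2: the settled NROP CR with the new node group appended.
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
    - machineConfigPoolSelector:        # existing node group
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker: ""
    - machineConfigPoolSelector:        # newly appended node group
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/mcp-test: ""
```

    After applying both objects, running `oc get node,mcp` repeatedly should show whether the worker pool starts updating and its nodes go `NotReady,SchedulingDisabled`.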

Actual results:

    Nodes of the original node group are rebooted although the update does not affect them.

Expected results:

    Adding a node group should not affect the nodes of other node groups.

Additional info:

shajmakh@shajmakh-thinkpadp16vgen1 ~/ghrepo/numaresources-operator (replace-47674)$ oc get pod,ds,node,mcp
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-c8d4b77bf-fvwtl   1/1     Running   0          2d
pod/numaresourcesoperator-worker-25z47                 2/2     Running   2          12m
pod/numaresourcesoperator-worker-7s7xr                 2/2     Running   0          11m
pod/secondary-scheduler-755b8f4979-jqtc4               1/1     Running   0          45h

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                       AGE
daemonset.apps/numaresourcesoperator-mcp-test   0         0         0       0            0           node-role.kubernetes.io/mcp-test=   12m
daemonset.apps/numaresourcesoperator-worker     1         1         1       1            1           node-role.kubernetes.io/worker=     2d

NAME            STATUS                        ROLES                  AGE    VERSION
node/master-0   Ready                         control-plane,master   2d1h   v1.30.10
node/master-1   Ready                         control-plane,master   2d1h   v1.30.10
node/master-2   Ready                         control-plane,master   2d1h   v1.30.10
node/worker-0   Ready                         worker                 2d     v1.30.10
node/worker-1   NotReady,SchedulingDisabled   worker                 2d     v1.30.10

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-7c77ad8b357395c1ea39c4787caa15e8   True      False      False      3              3                   3                     0                      2d1h
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-f3d740589822ab9eb433abadf74da6a8   False     True       False      2              1                   1                     0                      2d1h
shajmakh@shajmakh-thinkpadp16vgen1 ~/ghrepo/numaresources-operator (replace-47674)$


$ oc describe   numaresourcesoperator 
Name:         numaresourcesoperator
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  nodetopology.openshift.io/v1
Kind:         NUMAResourcesOperator
Metadata:
  Creation Timestamp:  2025-03-09T10:54:30Z
  Generation:          17
  Resource Version:    1081504
  UID:                 e086aabb-2645-4a54-910c-3ae59e762c7c
Spec:
  Log Level:  Trace
  Node Groups:
    Config:
      Info Refresh Pause:  Disabled
    Machine Config Pool Selector:
      Match Labels:
        pools.operator.machineconfiguration.openshift.io/worker:  
Status:
  Conditions:
    Last Transition Time:  2025-03-11T11:03:54Z
    Message:               
    Reason:                Available
    Status:                False
    Type:                  Available
    Last Transition Time:  2025-03-11T11:03:54Z
    Message:               
    Reason:                Upgradeable
    Status:                False
    Type:                  Upgradeable
    Last Transition Time:  2025-03-11T11:03:54Z
    Message:               
    Reason:                Progressing
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2025-03-11T11:03:54Z
    Message:               
    Reason:                Degraded
    Status:                False
    Type:                  Degraded
  Daemonsets:
    Name:       numaresourcesoperator-worker
    Namespace:  openshift-numaresources
  Machineconfigpools:
    Name:  worker
    Name:  mcp-test
  Related Objects:
    Group:      
    Name:       openshift-numaresources
    Resource:   namespaces
    Group:      machineconfiguration.openshift.io
    Name:       
    Resource:   kubeletconfigs
    Group:      machineconfiguration.openshift.io
    Name:       
    Resource:   machineconfigs
    Group:      topology.node.k8s.io
    Name:       
    Resource:   noderesourcetopologies
    Group:      apps
    Name:       numaresourcesoperator-worker
    Namespace:  openshift-numaresources
    Resource:   daemonsets
Events:
  Type    Reason                Age                  From                      Message
  ----    ------                ----                 ----                      -------
  Normal  SuccessfulRTECreate   11m (x77 over 2d)    numaresources-controller  Created Resource-Topology-Exporter DaemonSets
  Normal  SuccessfulMCSync      7m1s (x180 over 2d)  numaresources-controller  Enabled machine configuration for worker nodes
  Normal  SuccessfulCRDInstall  60s (x186 over 2d)   numaresources-controller  Node Resource Topology CRD installed
shajmakh@shajmakh-thinkpadp16vgen1 ~/ghrepo/numaresources-operator (replace-47674)$

is caused by

OCPBUGS-53153 MCO requires reboot on worker nodes when creating new MC (even for MCP with zero machine count)

mentioned on

Merge request - Updated US source to: 1c9d202 Merge pull request #1252 from SargunNarula/test_fix

Merge request - Updated US source to: 193c449 Merge pull request #1249 from SargunNarula/typo_fix

Merge request - Updated US source to: 9696c5a Merge pull request #1196 from shajmakh/replace-47674

Assignee: Talor Itzhak

Reporter: Shereen Haj

Need Info From: None

Contributors: None

QA Contact: Mallapadi Niranjan

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2025/03/11 11:23 AM

Updated: 2025/07/16 9:47 AM

Resolved: 2025/07/16 9:47 AM
