OCPBUGS-52958

Multiple NROP node groups trigger unneeded node reboots on unaffected node groups

    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

          Appending a new node group to a settled NUMAResourcesOperator (NROP) CR triggers a reboot of the nodes that belong to the existing node group.

      Version-Release number of selected component (if applicable):

          <=4.17

      How reproducible:

          Always

      Steps to Reproduce:

          1. Create a new MCP without any machines attached to it.
          2. Append the new MCP selector as a new node group in the NROP CR.
          3. Watch the nodes of the original node group being rebooted.
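The steps above can be sketched as a small shell script. This is only an illustration, not the exact commands used by the reporter: the pool name `mcp-test` and the CR name `numaresourcesoperator` are taken from the output under "Additional info", while the MCP label `pools.operator.machineconfiguration.openshift.io/mcp-test` is an assumption that mirrors the worker label and is set by hand here so the new node group can select the pool.

```shell
#!/usr/bin/env bash
# Reproduction sketch. POOL and the MCP label below are illustrative assumptions.
POOL="${POOL:-mcp-test}"

# Step 1: manifest for an MCP whose node selector matches no existing node.
mcp_manifest() {
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${POOL}
  labels:
    # assumed label, added manually so the NROP node group can select this pool
    pools.operator.machineconfiguration.openshift.io/${POOL}: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, ${POOL}]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/${POOL}: ""
EOF
}

# Step 2: JSON patch that appends a node group selecting the new MCP.
nodegroup_patch() {
  printf '[{"op":"add","path":"/spec/nodeGroups/-","value":{"machineConfigPoolSelector":{"matchLabels":{"pools.operator.machineconfiguration.openshift.io/%s":""}}}}]\n' "$POOL"
}

# Runs the three steps against the cluster the current kubeconfig points to.
reproduce() {
  mcp_manifest | oc apply -f -
  oc patch numaresourcesoperator numaresourcesoperator --type=json -p "$(nodegroup_patch)"
  # Step 3: watch the nodes; on affected versions a worker node goes
  # NotReady,SchedulingDisabled although it is not part of the new pool.
  oc get nodes -w
}
```

Calling `reproduce` after sourcing the script runs the steps; `mcp_manifest` and `nodegroup_patch` can be invoked on their own to inspect what would be applied.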
          

      Actual results:

          Nodes of the original node group get rebooted although the update does not affect them.

      Expected results:

          Node groups shouldn't affect each other: appending a node group must not reboot nodes that belong to other node groups.
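One way to check the expected behavior is to compare the rendered MachineConfig of the `worker` pool before and after the new node group is appended. The sketch below assumes that an unchanged rendered config name means no rollout was started for that pool; the helper names are hypothetical, and the comparison is only meaningful once the operator has finished reconciling.

```shell
#!/usr/bin/env bash
# Prints the rendered MachineConfig a pool currently targets.
rendered_config() {
  oc get mcp "$1" -o jsonpath='{.spec.configuration.name}'
}

# Succeeds only when both names are non-empty and identical,
# i.e. the pool was not given a new rendered config.
pool_untouched() {
  local before="$1" after="$2"
  [ -n "$before" ] && [ "$before" = "$after" ]
}

# Interactive check for the worker pool (hypothetical helper).
check_worker_pool() {
  local before after
  before="$(rendered_config worker)"
  echo "worker rendered config before: ${before}"
  read -r -p "Append the new node group, wait for reconciliation, then press Enter " _
  after="$(rendered_config worker)"
  if pool_untouched "$before" "$after"; then
    echo "OK: worker pool kept ${after}"
  else
    echo "BUG: worker pool moved from ${before} to ${after}"
  fi
}
```

On an affected cluster the second branch is expected, matching the `worker` pool shown as `UPDATING True` in the output under "Additional info".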

      Additional info:

      shajmakh@shajmakh-thinkpadp16vgen1 ~/ghrepo/numaresources-operator (replace-47674)$ oc get pod,ds,node,mcp
      NAME                                                   READY   STATUS    RESTARTS   AGE
      pod/numaresources-controller-manager-c8d4b77bf-fvwtl   1/1     Running   0          2d
      pod/numaresourcesoperator-worker-25z47                 2/2     Running   2          12m
      pod/numaresourcesoperator-worker-7s7xr                 2/2     Running   0          11m
      pod/secondary-scheduler-755b8f4979-jqtc4               1/1     Running   0          45h

      NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                       AGE
      daemonset.apps/numaresourcesoperator-mcp-test   0         0         0       0            0           node-role.kubernetes.io/mcp-test=   12m
      daemonset.apps/numaresourcesoperator-worker     1         1         1       1            1           node-role.kubernetes.io/worker=     2d

      NAME            STATUS                        ROLES                  AGE    VERSION
      node/master-0   Ready                         control-plane,master   2d1h   v1.30.10
      node/master-1   Ready                         control-plane,master   2d1h   v1.30.10
      node/master-2   Ready                         control-plane,master   2d1h   v1.30.10
      node/worker-0   Ready                         worker                 2d     v1.30.10
      node/worker-1   NotReady,SchedulingDisabled   worker                 2d     v1.30.10

      NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-7c77ad8b357395c1ea39c4787caa15e8   True      False      False      3              3                   3                     0                      2d1h
      machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-f3d740589822ab9eb433abadf74da6a8   False     True       False      2              1                   1                     0                      2d1h
      
      
      $ oc describe numaresourcesoperator
      Name:         numaresourcesoperator
      Namespace:    
      Labels:       <none>
      Annotations:  <none>
      API Version:  nodetopology.openshift.io/v1
      Kind:         NUMAResourcesOperator
      Metadata:
        Creation Timestamp:  2025-03-09T10:54:30Z
        Generation:          17
        Resource Version:    1081504
        UID:                 e086aabb-2645-4a54-910c-3ae59e762c7c
      Spec:
        Log Level:  Trace
        Node Groups:
          Config:
            Info Refresh Pause:  Disabled
          Machine Config Pool Selector:
            Match Labels:
              pools.operator.machineconfiguration.openshift.io/worker:  
      Status:
        Conditions:
          Last Transition Time:  2025-03-11T11:03:54Z
          Message:               
          Reason:                Available
          Status:                False
          Type:                  Available
          Last Transition Time:  2025-03-11T11:03:54Z
          Message:               
          Reason:                Upgradeable
          Status:                False
          Type:                  Upgradeable
          Last Transition Time:  2025-03-11T11:03:54Z
          Message:               
          Reason:                Progressing
          Status:                True
          Type:                  Progressing
          Last Transition Time:  2025-03-11T11:03:54Z
          Message:               
          Reason:                Degraded
          Status:                False
          Type:                  Degraded
        Daemonsets:
          Name:       numaresourcesoperator-worker
          Namespace:  openshift-numaresources
        Machineconfigpools:
          Name:  worker
          Name:  mcp-test
        Related Objects:
          Group:      
          Name:       openshift-numaresources
          Resource:   namespaces
          Group:      machineconfiguration.openshift.io
          Name:       
          Resource:   kubeletconfigs
          Group:      machineconfiguration.openshift.io
          Name:       
          Resource:   machineconfigs
          Group:      topology.node.k8s.io
          Name:       
          Resource:   noderesourcetopologies
          Group:      apps
          Name:       numaresourcesoperator-worker
          Namespace:  openshift-numaresources
          Resource:   daemonsets
      Events:
        Type    Reason                Age                  From                      Message
        ----    ------                ----                 ----                      -------
        Normal  SuccessfulRTECreate   11m (x77 over 2d)    numaresources-controller  Created Resource-Topology-Exporter DaemonSets
        Normal  SuccessfulMCSync      7m1s (x180 over 2d)  numaresources-controller  Enabled machine configuration for worker nodes
        Normal  SuccessfulCRDInstall  60s (x186 over 2d)   numaresources-controller  Node Resource Topology CRD installed

              titzhak Talor Itzhak
              rhn-support-shajmakh Shereen Haj
              Mallapadi Niranjan Mallapadi Niranjan