  1. OpenShift Bugs
  2. OCPBUGS-52859

NROP stays in the Progressing state if one of the node groups points to an empty MCP when the custom RTE SELinux policy is enabled


    • Quality / Stability / Reliability
    • False
    • None
    • Moderate
    • None
    • 2025-10-15: in discussions to change this to a documentation bug across all relevant versions
    • None
    • CNF Compute Sprint 268, CNF Compute Sprint 269, CNF Compute Sprint 270, CNF Compute Sprint 271, CNF Compute Sprint 272
    • 5

      Description of problem:

        See Steps to Reproduce below.
      
      

      Version-Release number of selected component (if applicable):

          4.18 and later, as observed so far

      How reproducible:

          always

      Steps to Reproduce:

          1.  On a cluster that has NROP installed and deployed with the custom RTE SELinux policy enabled, add a new MCP. Keep the new MCP empty (no nodes), then add it as a new node group in the NROP CR.
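
The reproduction step can be sketched as two manifests. This is a minimal sketch, not the reporter's exact setup: the pool name `worker-cnf` and both selectors are taken from the output later in this report, while the output directory, the file names, and the `machineConfigSelector` / `nodeSelector` values are assumptions that follow the usual pattern for a custom pool. The custom-policy annotation is not named in this report, so it is left out; the script only writes files and does not touch the cluster.

```shell
#!/bin/sh
# Sketch: write the manifests for the reproduction step into a scratch directory.
# Nothing is applied to the cluster; review the files before using them.
set -eu

# Assumption: output directory name is arbitrary, override with OUTDIR.
outdir="${OUTDIR:-./ocpbugs-52859-repro}"
mkdir -p "$outdir"

# Custom MCP that stays empty as long as no node carries the
# node-role.kubernetes.io/worker-cnf label (assumed node selector).
cat > "$outdir/worker-cnf-mcp.yaml" <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-cnf]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""
EOF

# NROP CR with the existing worker node group plus the new, empty pool.
# The two selectors mirror the Spec shown in this report.
cat > "$outdir/nrop.yaml" <<'EOF'
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
    - machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker: ""
    - machineConfigPoolSelector:
        matchLabels:
          machineconfiguration.openshift.io/role: worker-cnf
EOF

echo "Manifests written to $outdir"
echo "After review: oc apply -f $outdir/worker-cnf-mcp.yaml -f $outdir/nrop.yaml"
```

On a cluster that already has an NROP CR, it is probably safer to add only the second node group with `oc edit numaresourcesoperator` than to apply `nrop.yaml` as written.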

      Actual results:
      Although the new MCP is reported as updated, the NROP CR stays in the Progressing state and never becomes Available:

      NAME                                                             CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      machineconfigpool.machineconfiguration.openshift.io/master       rendered-master-e7416b38d7bdae0e580eb1578bc2400b       True      False      False      3              3                   3                     0                      14d
      machineconfigpool.machineconfiguration.openshift.io/worker       rendered-worker-6ef79ab49669b7064068cc58ecad9c01       False     True       False      2              0                   0                     0                      14d
      machineconfigpool.machineconfiguration.openshift.io/worker-cnf   rendered-worker-cnf-da770c3a1535f54e6ef8e6e8aac9a254   True      False      False      0              0                   0                     0                      31m

      NAME                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
      daemonset.apps/numaresourcesoperator-worker   2         2         2       2            2           node-role.kubernetes.io/worker=   73m
      shajmakh@shajmakh-thinkpadp16vgen1 ~ $ oc describe  numaresourcesoperator 
          
      Spec:
        Log Level:  Normal
        Node Groups:
          Machine Config Pool Selector:
            Match Labels:
              pools.operator.machineconfiguration.openshift.io/worker:  
          Machine Config Pool Selector:
            Match Labels:
              machineconfiguration.openshift.io/role:  worker-cnf
      Status:
        Conditions:
          Last Transition Time:  2025-03-10T15:45:07Z
          Message:               
          Reason:                Available
          Status:                False
          Type:                  Available
          Last Transition Time:  2025-03-10T15:45:07Z
          Message:               
          Reason:                Upgradeable
          Status:                False
          Type:                  Upgradeable
          Last Transition Time:  2025-03-10T15:45:07Z
          Message:               worker is updating
          Reason:                MachineConfigPoolIsUpdating
          Status:                True
          Type:                  Progressing
          Last Transition Time:  2025-03-10T15:45:07Z
          Message:               
          Reason:                Degraded
          Status:                False
          Type:                  Degraded
        Daemonsets:
          Name:       numaresourcesoperator-worker
          Namespace:  numaresources
        Machineconfigpools:
          Name:  worker
        Node Groups:
          Config:
            Info Refresh Mode:    Periodic
            Info Refresh Pause:   Disabled
            Info Refresh Period:  10s
            Pods Fingerprinting:  EnabledExclusiveResources
          Daemonsets:
            Name:       numaresourcesoperator-worker
            Namespace:  numaresources
          Selector:     worker
      
      
      
      

      Expected results:

          NROP should become Available: it should create the RTE DaemonSet for the empty pool (no pods are expected) and let the MCO controller handle updates on the MCPs, while the NROP controller continues to watch the MCPs for updates.

      Additional info:

          Severity is set to Moderate because the workaround is simple: delete the empty MCP or remove the custom policy annotation. Without the workaround, this would be considered a blocker.
      
      When the custom policy annotation is removed, behavior returns to normal:
      
      shajmakh@shajmakh-thinkpadp16vgen1 ~ $ oc get node,mcp,ds
      NAME                                                 STATUS   ROLES                          AGE   VERSION
      node/cnfdr11.telco5g.eng.rdu2.redhat.com             Ready    worker                         14d   v1.31.5
      node/cnfdr9.telco5g.eng.rdu2.redhat.com              Ready    worker                         14d   v1.31.5
      node/dhcp-10-1-105-178.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual   14d   v1.31.5
      node/dhcp-10-1-105-221.telco5g.eng.rdu2.redhat.com   Ready    control-plane,master,virtual   14d   v1.31.5
      node/dhcp-10-1-105-44.telco5g.eng.rdu2.redhat.com    Ready    control-plane,master,virtual   14d   v1.31.5

      NAME                                                             CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      machineconfigpool.machineconfiguration.openshift.io/master       rendered-master-e7416b38d7bdae0e580eb1578bc2400b       True      False      False      3              3                   3                     0                      14d
      machineconfigpool.machineconfiguration.openshift.io/worker       rendered-worker-da770c3a1535f54e6ef8e6e8aac9a254       True      False      False      2              2                   2                     0                      14d
      machineconfigpool.machineconfiguration.openshift.io/worker-cnf   rendered-worker-cnf-da770c3a1535f54e6ef8e6e8aac9a254   True      False      False      0              0                   0                     0                      49m

      NAME                                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                         AGE
      daemonset.apps/numaresourcesoperator-worker       2         2         2       2            2           node-role.kubernetes.io/worker=       92m
      daemonset.apps/numaresourcesoperator-worker-cnf   0         0         0       0            0           node-role.kubernetes.io/worker-cnf=   2m9s
      shajmakh@shajmakh-thinkpadp16vgen1 ~ $ 
      
      The behavior also occurs when the default SELinux policy is in effect; the only difference in the output is that the Progressing reason is "DaemonSetIsUpdating" instead of "MachineConfigPoolIsUpdating".
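
Since an empty MCP triggers the problem with either policy, it may help to list empty pools before applying the workaround from Additional info. The helper below is a small sketch: the function name `empty_pools` is hypothetical, and it assumes plain `oc get mcp` table output on stdin, where MACHINECOUNT is the sixth column as in the tables in this report. It is not meant for the combined `oc get node,mcp,ds` output, whose DaemonSet rows use different columns.

```shell
#!/bin/sh
# Print the name of every MachineConfigPool whose MACHINECOUNT column is 0.
# Input: `oc get mcp` table output on stdin (column 6 is MACHINECOUNT).
# The string comparison with "0" skips the header row and any short lines.
empty_pools() {
    awk '$1 != "NAME" && $6 == "0" { print $1 }'
}

# Usage on a live cluster (not executed here):
#   oc get mcp | empty_pools
```

Run against the first MCP table in this report, the helper would print only the `worker-cnf` pool, which is the node group that keeps the NROP CR from becoming Available.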

       

              rhn-support-shajmakh Shereen Haj
              rhn-support-shajmakh Shereen Haj
              Roy Shemtov Roy Shemtov