OpenShift Bugs / OCPBUGS-32036

Pool degraded when rendered configs are pruned after several MCs are applied without waiting for the MCP to be updated


    • Moderate

      Description of problem:

    When we apply a new MC while a pool has not yet finished applying a previous MC, and then we prune the rendered MCs, the pool becomes degraded with this error:
      
        - lastTransitionTime: "2024-04-10T11:29:17Z"
          message: 'Node sregidor-voc1-r79pg-worker-a-jtg6r is reporting: "missing MachineConfig
            rendered-worker-9a036927fe6e36f4285a87683fa9d66b\nmachineconfig.machineconfiguration.openshift.io
            \"rendered-worker-9a036927fe6e36f4285a87683fa9d66b\" not found"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
      
      
      
      

      Version-Release number of selected component (if applicable):

      pre-merge testing: https://github.com/openshift/oc/pull/1723
          

      How reproducible:

      Always
          

      Steps to Reproduce:

    1. Create a MC
    2. Once the MC has been applied to the first node (but before it is applied to the second node), create a new MC
    3. Execute the command: oc adm prune renderedmachineconfigs --confirm
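
      For step 1, any small MC targeting the worker pool is enough. A minimal sketch (the MC name, file path, and file contents are illustrative, not the ones used in the original reproduction):

      ```yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        name: 99-worker-test-file
        labels:
          machineconfiguration.openshift.io/role: worker
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
              - path: /etc/test-file
                mode: 0644
                contents:
                  source: data:,hello
      ```

      Applying this MC makes the MCO create a new rendered-worker config and start rolling it out node by node, which is the window step 2 relies on.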
      
      
          

      Actual results:

      
      The rendered MC for the MC created in step 2 is generated, but it is not yet used by any pool (because we are still waiting for the second node to be configured), so it is pruned in step 3. Once the second node finishes its configuration, the MCO tries to move the pool to the rendered MC created in step 2; since that config has been pruned, the pool becomes degraded.
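
      This can be observed by comparing the rendered config the pool is targeting with the rendered MCs that survive the prune. A sketch of the relevant read-only commands, assuming the affected pool is "worker" (these query a live cluster, so they are illustrative only):

      ```shell
      # Rendered config the worker pool wants to roll out next
      oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'

      # Rendered config each node is currently moving to
      oc get nodes -o custom-columns=NAME:.metadata.name,DESIRED:'.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig'

      # Rendered MCs still present after the prune
      oc get mc | grep rendered-worker
      ```

      In the degraded state, the pool's .spec.configuration.name (and the lagging node's desiredConfig annotation) point at a rendered-worker config that no longer appears in the last listing.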
      
          

      Expected results:

      No pool should be degraded when we execute the "oc adm prune renderedmachineconfigs" command.
          

      Additional info:

      The steps above are the ones I used to reproduce the issue, but it is probably easier to reproduce it with more than 2 nodes, because then we don't have to pay as much attention to the timing of the prune.
      
      In order to recover the pool we need to:
      1. Remove the last MC
      2. Edit the desiredConfig in the degraded node
      3. Edit the .spec.configuration.name value in the degraded pool
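
      A sketch of those three recovery steps with oc. The MC name, node name, and rendered-config name below are placeholders, not values from the original reproduction; substitute the ones from your cluster:

      ```shell
      # 1. Remove the last MC (the one whose rendered config was pruned)
      oc delete mc 99-worker-test-file

      # 2. Point the degraded node back at a rendered config that still exists
      oc patch node <degraded-node> --type merge -p \
        '{"metadata":{"annotations":{"machineconfiguration.openshift.io/desiredConfig":"<existing-rendered-worker>"}}}'

      # 3. Point the degraded pool's spec at the same existing rendered config
      oc patch mcp worker --type merge -p \
        '{"spec":{"configuration":{"name":"<existing-rendered-worker>"}}}'
      ```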
      
          

            rhn-support-cruhm Courtney Ruhm
            sregidor@redhat.com Sergio Regidor de la Rosa