OpenShift Bugs
OCPBUGS-77842

While upgrading the HostedCluster, the NodePool is recreated twice: once during the HC upgrade and again during the NodePool upgrade

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version/s: 4.17, 4.18
    • Component/s: HyperShift

      Description of problem:

       While upgrading the HostedCluster from 4.17.29 to 4.18.30/33, we observed that the NodePool is recreated during the HC upgrade itself, even before the actual NodePool upgrade begins. The NodePool reports a configuration change as the reason. As a result, customers see two NodePool recycles during every upgrade.

      Version-Release number of selected component (if applicable):

       MCE 2.9

      How reproducible:

       Unable to reproduce outside the customer's clusters

      Find the details below:

      1-  The customer is patching both the HostedCluster (HC) and the NodePool simultaneously, and the NodePool is waiting for the HostedCluster upgrade to complete. Please see the status below:

       - lastTransitionTime: "2026-03-04T08:28:04Z"
          message: 'Failed to get release image: the latest version supported is: "4.17.29".
            Attempting to use: "4.18.33"' 

      2- While the HostedCluster upgrade is in progress, specifically at the stage where all ClusterOperators have been upgraded to the desired version and the upgrade is waiting on the network ClusterOperator, the NodePool begins undergoing a configuration change.

      % oc get nodepool -n clusters xxx
      NAME              CLUSTER           DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
      xxx   xxx   2               2               False         True         4.17.29   False             True
      
      $ oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console                                    4.18.33   True        False         False      25d
      csi-snapshot-controller                    4.18.33   True        False         False      14m
      dns                                        4.18.33   True        False         False      12m
      image-registry                             4.18.33   True        False         False      14m
      ingress                                    4.18.33   True        False         False      13m
      insights                                   4.18.33   True        False         False      145d
      kube-apiserver                             4.18.33   True        False         False      160d
      kube-controller-manager                    4.18.33   True        False         False      160d
      kube-scheduler                             4.18.33   True        False         False      160d
      kube-storage-version-migrator              4.18.33   True        False         False      11m
      monitoring                                 4.18.33   True        False         False      160d
      network                                    4.17.29   True        True          False      160d    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" update is rolling out (2 out of 3 updated)...
      node-tuning                                4.18.33   True        True          False      5m22s   Waiting for 1/3 Profiles to be applied
      openshift-apiserver                        4.18.33   True        False         False      160d
      openshift-controller-manager               4.18.33   True        False         False      160d
      openshift-samples                          4.18.33   True        False         False      160d
      operator-lifecycle-manager                 4.18.33   True        False         False      160d
      operator-lifecycle-manager-catalog         4.18.33   True        False         False      160d
      operator-lifecycle-manager-packageserver   4.18.33   True        False         False      160d
      service-ca                                 4.18.33   True        False         False      160d
      storage                                    4.18.33   True        False         False      160d
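The state in the `oc get co` output above can be summarized with a small illustrative check (not CVO code): the HC upgrade is not complete while any ClusterOperator still reports the old version, and here only `network` is behind.

```python
# Illustrative check over the ClusterOperator table above (abbreviated).
# Not actual cluster-version-operator logic.
DESIRED = "4.18.33"

cluster_operators = {
    "console": "4.18.33",
    "dns": "4.18.33",
    "kube-apiserver": "4.18.33",
    "network": "4.17.29",      # still rolling out ovnkube-node
    "node-tuning": "4.18.33",  # on the new version, but still Progressing
}

pending = sorted(n for n, v in cluster_operators.items() if v != DESIRED)
print(pending)  # -> ['network']
```

So the NodePool's config change in step 2 starts while the control-plane upgrade is still in this not-yet-complete state.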
      
       

      3- At the same time, the NodePool currentConfig points to a secret that was created before the upgrade. The NodePool initially reports that the new secret cannot be found (which appears to be a transient error). Later, the new secret is created, and the NodePool then proceeds with the config change.

      # oc get nodepool <name> -o yaml
      
      hypershift.openshift.io/nodePoolCurrentConfig: 72e6e8e8
      hypershift.openshift.io/nodePoolCurrentConfigVersion: 1d346cfe
      
      Below are the secrets corresponding to the current config version:
      
      token-xxx-1d346cfe                        Opaque                           9      96d
      user-data-xxx-1d346cfe                    Opaque                           2      96d
      user-data-xxx-1d346cfe-userdata           cluster.x-k8s.io/secret          1      33d
      
      The NodePool starts reporting the conditions below once UpdatingConfig becomes "True" during the HC upgrade:
      
       - lastTransitionTime: "2026-03-04T08:28:04Z"
          message: Secret "token-xxx-69b84919" not found
          observedGeneration: 5
          reason: NotFound
          status: "False"
          type: ReachedIgnitionEndpoint
        - lastTransitionTime: "2026-03-04T08:28:04Z"
          message: 'Updating config in progress. Target config: 1c4e6609'
          observedGeneration: 5
          reason: AsExpected
          status: "True"
          type: UpdatingConfig
        - lastTransitionTime: "2025-09-24T11:32:35Z"
          observedGeneration: 5
          reason: AsExpected
          status: "False"
          type: UpdatingVersion
        - lastTransitionTime: "2026-03-04T08:28:04Z"
          message: Secret "token-xxx-69b84919" not found
          observedGeneration: 5
          reason: NotFound
          status: "False"
          type: ValidGeneratedPayload
      
      Later, this secret is recognized and the config update starts:
      
      token-xxx-69b84919                        Opaque                           8      18m
      user-data-xxx-69b84919                    Opaque                           2      18m
      user-data-xxx-69b84919-userdata           cluster.x-k8s.io/secret          1      18m
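The transient NotFound in step 3 is consistent with hash-suffixed secret naming: the controller derives the target `token-<nodepool>-<hash>` name from the new config and looks it up before the secret exists. A minimal sketch, assuming a content-hash naming scheme; the hash function, inputs, and helper names here are assumptions for illustration, not HyperShift's actual implementation:

```python
import hashlib

def config_hash(config: str) -> str:
    """Short content hash used as the secret-name suffix (assumed scheme)."""
    return hashlib.sha256(config.encode()).hexdigest()[:8]

def lookup_token_secret(secrets: set[str], nodepool: str, config: str):
    """Return (name, error): the target secret name, or why it was not found."""
    name = f"token-{nodepool}-{config_hash(config)}"
    if name not in secrets:
        # Surfaces on the NodePool as reason: NotFound until the
        # token-secret controller creates the new secret.
        return None, f'Secret "{name}" not found'
    return name, None

secrets: set[str] = set()
new_config = "release=4.18.33;ignition-config=v2"  # hypothetical config inputs

# 1) The controller computes the target name before the secret exists: NotFound.
_, err = lookup_token_secret(secrets, "np", new_config)
print(err)

# 2) The token controller catches up and creates the secret; lookup succeeds.
secrets.add(f"token-np-{config_hash(new_config)}")
name, _ = lookup_token_secret(secrets, "np", new_config)
print(name)
```

This matches the observed sequence: `Secret "token-xxx-69b84919" not found` first, then the secret appears (age 18m above) and the config update proceeds.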

      4- After the above configuration change is completed, the NodePool nodes are recreated again as part of the NodePool upgrade.

      Please help us understand why the NodePool nodes are being recreated during the HostedCluster upgrade, even before the NodePool upgrade begins.

      Actual results:

       NodePool nodes are recreated twice during a HostedCluster upgrade

      Expected results:

        NodePool nodes should be recreated only once during a HostedCluster upgrade

      Additional info:

       

              Assignee: Unassigned
              Reporter: MUHAMMED ASLAM V K (rhn-support-amuhamme)
              QA Contact: Yu Li