  1. OpenShift Bugs
  2. OCPBUGS-25599

Scaling 2 -> 6 nodes results in only 3 nodes, although 3 more free agents are available


    • Icon: Bug Bug
    • Resolution: Done
    • 4.16.0
    • 4.14
    • HyperShift / Agent
    • None
    • Important
    • No
    • False

      Description of problem:

          After successfully scaling down from 6 to 2 nodes, scaling back up from 2 to 6 nodes ends with only 3 nodes, although 3 agents are still free to be allocated.

      Version-Release number of selected component (if applicable):

          [kni@ocp-edge77 ~]$ oc version
      Client Version: 4.14.0-0.nightly-2023-12-16-101212
      Kustomize Version: v5.0.1
      
      
      [kni@ocp-edge77 ~]$ oc get hc -A
      NAMESPACE   NAME       VERSION   KUBECONFIG                  PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
      clusters    hosted-0   4.14.7    hosted-0-admin-kubeconfig   Completed   True        False         The hosted control plane is available
      
      
      

      How reproducible:

          Intermittent; the failure happens on some attempts only.

      Steps to Reproduce:

          1. Deploy a hub cluster and a hosted cluster with 6 agent-provider workers.
      I used this job for the deployment: https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/CI/job/job-runner/2111/console
          2. Scale down 6->2 nodes:
      (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc scale nodepool/hosted-0 --namespace clusters --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig --replicas=2
      nodepool.hypershift.openshift.io/hosted-0 scaled
      
          3. Wait until the nodepool reports 2 nodes:
      oc get nodepool -n clusters
      
          4. Scale up 2->6 nodes:
            (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc scale nodepool/hosted-0 --namespace clusters --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig --replicas=6
      nodepool.hypershift.openshift.io/hosted-0 scaled
         
          5. Recreate the ignition token by applying the workaround described here: https://github.com/stolostron/rhacm-docs/blob/2.9_stage/clusters/release_notes/known_issues.adoc#on-bare-metal-platforms-agent-resources-might-fail-to-pull-ignition
          5.1 Delete the agent user-data secret: oc delete secret agent-user-data-hosted-0-d3a75541 -n clusters-hosted-0
          5.2 Add a label to one of the agents by editing it:
      oc edit agent 030ae455-fe57-4029-9d73-c7fb8356b34f -n hosted-0 --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig
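
Steps 2 to 4 can be scripted so that the wait in step 3 does not rely on watching the output by hand. The following is a minimal sketch and not part of the original reproduction: the function name `wait_for_nodepool_replicas` and its timeout and interval defaults are hypothetical, and it assumes that the CURRENT NODES column is backed by `.status.replicas` on the NodePool resource and that `KUBECONFIG` points at the hub cluster.

```shell
# Poll a NodePool until it reports the wanted number of replicas, or time out.
# Usage: wait_for_nodepool_replicas <nodepool> <namespace> <wanted> [timeout_s] [interval_s]
# Assumption: `.status.replicas` is the field shown as CURRENT NODES.
wait_for_nodepool_replicas() {
  local nodepool=$1 namespace=$2 wanted=$3
  local timeout=${4:-1800} interval=${5:-30}
  local waited=0 current

  while :; do
    current=$(oc get nodepool "$nodepool" -n "$namespace" \
      -o jsonpath='{.status.replicas}')
    # An empty value means the field is not set yet; treat it as 0.
    if [ "${current:-0}" = "$wanted" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out: $nodepool has ${current:-0} of $wanted replicas" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example sequence (commented out so that sourcing this file has no side effects):
#   export KUBECONFIG=~/clusterconfigs/auth/hub-kubeconfig
#   oc scale nodepool/hosted-0 --namespace clusters --replicas=2
#   wait_for_nodepool_replicas hosted-0 clusters 2
#   oc scale nodepool/hosted-0 --namespace clusters --replicas=6
#   wait_for_nodepool_replicas hosted-0 clusters 6
```

With this bug present, the last call would be expected to time out while the nodepool stays at 3 replicas.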
      
         

      Actual results:

          [kni@ocp-edge77 ~]$ oc get nodepool -n clusters
      NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
      hosted-0   hosted-0   6               3               False                      4.14.7                                       Minimum availability requires 6 replicas, current 3 available
      
      
      [kni@ocp-edge77 ~]$ oc get agents -n hosted-0
      NAME                                   CLUSTER    APPROVED   ROLE     STAGE
      030ae455-fe57-4029-9d73-c7fb8356b34f   hosted-0   true       worker   Done
      0a398b9f-90ab-44e5-8175-54f53b201089   hosted-0   true       worker   Done
      34b60917-7df2-46d8-96c7-455a797eec3c   hosted-0   true       worker   Done
      5bc7ff28-70c8-4d22-8b3e-747e4c7e69a7              true       worker   
      91074b73-56ec-4df2-83a8-5eaea2983771              true       worker   
      a9116103-87eb-4205-aeb3-f38ad466efb4              true       worker   
      [kni@ocp-edge77 ~]$ 
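
The listing above shows the failure as rows with an empty CLUSTER column. As a rough way to check for this automatically, the sketch below counts such rows in the default table output. The helper name `count_unbound_agents` is hypothetical, and the approach assumes the table keeps its fixed-width layout with a CLUSTER column directly followed by an APPROVED column, and that only the table (no shell prompt lines) is piped in.

```shell
# Count agents whose CLUSTER column is empty in `oc get agents` table output.
# Reads the table from stdin and prints the number of unbound agents.
count_unbound_agents() {
  awk '
    NR == 1 {
      # Derive the CLUSTER column boundaries from the header line.
      start = index($0, "CLUSTER")
      end   = index($0, "APPROVED")
      next
    }
    NF > 0 && start > 0 && end > start {
      cell = substr($0, start, end - start)
      gsub(/[ \t]/, "", cell)
      if (cell == "") unbound++
    }
    END { print unbound + 0 }
  '
}

# Example (commented out so that sourcing this file has no side effects):
#   oc get agents -n hosted-0 | count_unbound_agents
```

A non-zero count while the nodepool still reports fewer CURRENT NODES than DESIRED NODES matches the state described in this report.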
       

      Expected results:

          [kni@ocp-edge77 ~]$ oc get nodepool -n clusters
      NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
      hosted-0   hosted-0   6               6               False                      4.14.7
      
      All 6 agents are allocated to hosted-0, not only 3 of them.
      [kni@ocp-edge77 ~]$ oc get agents -n hosted-0
      NAME                                   CLUSTER    APPROVED   ROLE     STAGE
      030ae455-fe57-4029-9d73-c7fb8356b34f   hosted-0   true       worker   Done
      0a398b9f-90ab-44e5-8175-54f53b201089   hosted-0   true       worker   Done
      34b60917-7df2-46d8-96c7-455a797eec3c   hosted-0   true       worker   Done
      5bc7ff28-70c8-4d22-8b3e-747e4c7e69a7              true       worker
      91074b73-56ec-4df2-83a8-5eaea2983771              true       worker
      a9116103-87eb-4205-aeb3-f38ad466efb4              true       worker

      Additional info:

          

       

            atraeger Avishay Traeger
            rhn-support-gamado Gal Amado
            Gal Amado Gal Amado