OpenShift Bugs / OCPBUGS-55421

A replacement master is created in a different zone after a master is deleted, leaving the master machines unbalanced across zones


    • Quality / Stability / Reliability
    • Important
    • Rejected

      Description of problem:

          After a master machine is deleted, the replacement master is created in a different zone, which leaves the master machines unbalanced across zones.

      Version-Release number of selected component (if applicable):

          4.19.0-0.ci-2025-04-27-051101

      How reproducible:

          Observed twice.

      Steps to Reproduce:

          1. Install a 4.19 AWS cluster.
      sh-5.1$ oc get clusterversion
      NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.19.0-0.ci-2025-04-27-051101   True        False         144m    Cluster version is 4.19.0-0.ci-2025-04-27-051101
      sh-5.1$ oc get machine
      NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
      ci-op-j5wr5nh8-bae1c-8qb4c-master-0                  Running   m6i.xlarge   us-east-1   us-east-1d   170m
      ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running   m6i.xlarge   us-east-1   us-east-1a   170m
      ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running   m6i.xlarge   us-east-1   us-east-1c   170m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running   m5.xlarge    us-east-1   us-east-1a   166m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running   m5.xlarge    us-east-1   us-east-1c   166m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running   m5.xlarge    us-east-1   us-east-1d   166m
      
          2. Edit the controlplanemachineset and add machineNamePrefix, for example, machineNamePrefix: jason-born
      sh-5.1$ oc edit controlplanemachineset
      controlplanemachineset.machine.openshift.io/cluster edited
      
          3. Delete master-0 so that the new name prefix takes effect. The replacement machine is created in a different zone (us-east-1c instead of us-east-1d).
      sh-5.1$ oc delete machine ci-op-j5wr5nh8-bae1c-8qb4c-master-0 
      machine.machine.openshift.io "ci-op-j5wr5nh8-bae1c-8qb4c-master-0" deleted
      ^Csh-5.1$ oc get machine
      NAME                                                 PHASE          TYPE         REGION      ZONE         AGE
      ci-op-j5wr5nh8-bae1c-8qb4c-master-0                  Deleting       m6i.xlarge   us-east-1   us-east-1d   173m
      ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running        m6i.xlarge   us-east-1   us-east-1a   173m
      ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running        m6i.xlarge   us-east-1   us-east-1c   173m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running        m5.xlarge    us-east-1   us-east-1a   170m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running        m5.xlarge    us-east-1   us-east-1c   170m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running        m5.xlarge    us-east-1   us-east-1d   170m
      jason-born-z8cx9-0                                   Provisioning   m6i.xlarge   us-east-1   us-east-1c   4s
      sh-5.1$ oc get machine
      NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
      ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running   m6i.xlarge   us-east-1   us-east-1a   3h26m
      ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running   m6i.xlarge   us-east-1   us-east-1c   3h26m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running   m5.xlarge    us-east-1   us-east-1a   3h22m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running   m5.xlarge    us-east-1   us-east-1c   3h22m
      ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running   m5.xlarge    us-east-1   us-east-1d   3h22m
      jason-born-z8cx9-0                                   Running   m6i.xlarge   us-east-1   us-east-1c   32m
      
      At this point, the master machines are no longer balanced across zones. The controlplanemachineset lists three failure domains (us-east-1c, us-east-1d, us-east-1a), but two masters are in us-east-1c, one is in us-east-1a, and none is in us-east-1d.
      
              failureDomains:
                aws:
                - placement:
                    availabilityZone: us-east-1c
                  subnet:
                    id: subnet-024e2eedaa754b0ff
                    type: ID
                - placement:
                    availabilityZone: us-east-1d
                  subnet:
                    id: subnet-0424a84fcc457a01a
                    type: ID
                - placement:
                    availabilityZone: us-east-1a
                  subnet:
                    id: subnet-02b02e5d0afe52f86
                    type: ID
                platform: AWS 
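
      The imbalance described above can be checked mechanically. The following is a minimal sketch in Python and is not part of the control plane machine set operator. The helper names `zone_distribution` and `is_balanced`, and the rule that per-zone counts may differ by at most one, are assumptions made for illustration. The zones are copied from the `oc get machine` output and the failureDomains list above.

```python
from collections import Counter


def zone_distribution(master_zones, failure_domains):
    """Count masters per configured failure domain (hypothetical helper)."""
    counts = Counter(master_zones)
    # Report every configured failure domain, including those with no master.
    return {zone: counts.get(zone, 0) for zone in failure_domains}


def is_balanced(distribution):
    """Assumed rule: per-zone counts may differ by at most one."""
    values = list(distribution.values())
    return max(values) - min(values) <= 1


# Failure domains in the order they appear in the controlplanemachineset.
failure_domains = ["us-east-1c", "us-east-1d", "us-east-1a"]

# Zones of the three masters before and after master-0 was replaced.
before = ["us-east-1d", "us-east-1a", "us-east-1c"]
after = ["us-east-1a", "us-east-1c", "us-east-1c"]

for label, zones in (("before", before), ("after", after)):
    dist = zone_distribution(zones, failure_domains)
    print(label, dist, "balanced" if is_balanced(dist) else "NOT balanced")
```

      With the data from this report, the layout before the deletion has one master per failure domain, while the layout after it has two masters in us-east-1c and none in us-east-1d.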
      
      
      
      The second time I hit this, the control plane machine set had only two failure domains. After adding machineNamePrefix, I deleted one master (master-1 in us-east-2b) and the replacement was created in us-east-2a, so all three masters ended up in one zone.
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                          PHASE     TYPE         REGION      ZONE         AGE
      huliu-aws427b-ftphz-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   100m
      huliu-aws427b-ftphz-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   100m
      huliu-aws427b-ftphz-master-2                  Running   m6i.xlarge   us-east-2   us-east-2a   100m
      huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running   m6i.xlarge   us-east-2   us-east-2a   96m
      huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running   m6i.xlarge   us-east-2   us-east-2a   96m
      huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running   m6i.xlarge   us-east-2   us-east-2b   96m
      liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-aws427b-ftphz-master-1
      machine.machine.openshift.io "huliu-aws427b-ftphz-master-1" deleted
      ^C
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine                                
      NAME                                          PHASE          TYPE         REGION      ZONE         AGE
      deep-seek.example.com-bf5z4-1                 Provisioning   m6i.xlarge   us-east-2   us-east-2a   10s
      huliu-aws427b-ftphz-master-0                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
      huliu-aws427b-ftphz-master-1                  Deleting       m6i.xlarge   us-east-2   us-east-2b   100m
      huliu-aws427b-ftphz-master-2                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
      huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running        m6i.xlarge   us-east-2   us-east-2a   96m
      huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running        m6i.xlarge   us-east-2   us-east-2a   96m
      huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running        m6i.xlarge   us-east-2   us-east-2b   96m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                          PHASE     TYPE         REGION      ZONE         AGE
      deep-seek.example.com-bf5z4-1                 Running   m6i.xlarge   us-east-2   us-east-2a   15m
      huliu-aws427b-ftphz-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   116m
      huliu-aws427b-ftphz-master-2                  Running   m6i.xlarge   us-east-2   us-east-2a   116m
      huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running   m6i.xlarge   us-east-2   us-east-2a   112m
      huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running   m6i.xlarge   us-east-2   us-east-2a   112m
      huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running   m6i.xlarge   us-east-2   us-east-2b   112m

      Actual results:

          The master machines are not balanced across multiple zones.

      Expected results:

          The master machines should be balanced across multiple zones.
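
      To make the expected result concrete, here is a minimal Python sketch of one placement rule that would keep the masters balanced: put the replacement in the failure domain that currently holds the fewest surviving masters. This is an illustrative assumption, not the operator's actual algorithm, and `expected_replacement_zone` is a hypothetical name. For the second cluster, the two failure domains are assumed to be us-east-2a and us-east-2b, based on the zones seen in the machine list.

```python
from collections import Counter


def expected_replacement_zone(surviving_zones, failure_domains):
    """Pick the failure domain with the fewest surviving masters.

    Hypothetical rule for illustration only. min() returns the first
    minimal element, so ties are resolved in failure-domain list order.
    """
    counts = Counter(surviving_zones)
    return min(failure_domains, key=lambda zone: counts.get(zone, 0))


# First cluster: master-0 (us-east-1d) was deleted.
case_1 = expected_replacement_zone(
    ["us-east-1a", "us-east-1c"],
    ["us-east-1c", "us-east-1d", "us-east-1a"],
)

# Second cluster: master-1 (us-east-2b) was deleted.
# The failure-domain list here is an assumption (see the note above).
case_2 = expected_replacement_zone(
    ["us-east-2a", "us-east-2a"],
    ["us-east-2a", "us-east-2b"],
)

print(case_1)
print(case_2)
```

      Under this assumed rule the replacements would land in us-east-1d and us-east-2b respectively, whereas the observed replacements were created in us-east-1c and us-east-2a.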

      Additional info:

          Slack discussion: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1745821760072169
          must-gather: https://drive.google.com/file/d/1GVHhCpuUiGNSMvLhLpf2CJffv5evjTjk/view?usp=sharing

              ddonati@redhat.com Damiano Donati
              huliu@redhat.com Huali Liu
              Huali Liu Huali Liu