-
Bug
-
Resolution: Won't Do
-
Major
-
4.19
-
Quality / Stability / Reliability
-
False
-
Important
-
Rejected
Description of problem:
After a master machine is deleted, the replacement master is created in a different availability zone, which leaves the master machines unbalanced across the configured zones.
Version-Release number of selected component (if applicable):
4.19.0-0.ci-2025-04-27-051101
How reproducible:
Observed twice.
Steps to Reproduce:
1. Install a 4.19 AWS cluster.

    sh-5.1$ oc get clusterversion
    NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.19.0-0.ci-2025-04-27-051101   True        False         144m    Cluster version is 4.19.0-0.ci-2025-04-27-051101

    sh-5.1$ oc get machine
    NAME                                                 PHASE          TYPE         REGION      ZONE         AGE
    ci-op-j5wr5nh8-bae1c-8qb4c-master-0                  Running        m6i.xlarge   us-east-1   us-east-1d   170m
    ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running        m6i.xlarge   us-east-1   us-east-1a   170m
    ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running        m6i.xlarge   us-east-1   us-east-1c   170m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running        m5.xlarge    us-east-1   us-east-1a   166m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running        m5.xlarge    us-east-1   us-east-1c   166m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running        m5.xlarge    us-east-1   us-east-1d   166m

2. Edit the controlplanemachineset and add machineNamePrefix, for example machineNamePrefix: jason-born.

    sh-5.1$ oc edit controlplanemachineset
    controlplanemachineset.machine.openshift.io/cluster edited

3. Delete master-0 so that the new name takes effect. The new machine is created in another zone (us-east-1c instead of us-east-1d).

    sh-5.1$ oc delete machine ci-op-j5wr5nh8-bae1c-8qb4c-master-0
    machine.machine.openshift.io "ci-op-j5wr5nh8-bae1c-8qb4c-master-0" deleted
    ^Csh-5.1$ oc get machine
    NAME                                                 PHASE          TYPE         REGION      ZONE         AGE
    ci-op-j5wr5nh8-bae1c-8qb4c-master-0                  Deleting       m6i.xlarge   us-east-1   us-east-1d   173m
    ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running        m6i.xlarge   us-east-1   us-east-1a   173m
    ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running        m6i.xlarge   us-east-1   us-east-1c   173m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running        m5.xlarge    us-east-1   us-east-1a   170m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running        m5.xlarge    us-east-1   us-east-1c   170m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running        m5.xlarge    us-east-1   us-east-1d   170m
    jason-born-z8cx9-0                                   Provisioning   m6i.xlarge   us-east-1   us-east-1c   4s

    sh-5.1$ oc get machine
    NAME                                                 PHASE          TYPE         REGION      ZONE         AGE
    ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running        m6i.xlarge   us-east-1   us-east-1a   3h26m
    ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running        m6i.xlarge   us-east-1   us-east-1c   3h26m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running        m5.xlarge    us-east-1   us-east-1a   3h22m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running        m5.xlarge    us-east-1   us-east-1c   3h22m
    ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running        m5.xlarge    us-east-1   us-east-1d   3h22m
    jason-born-z8cx9-0                                   Running        m6i.xlarge   us-east-1   us-east-1c   32m

At this point the master machines are not balanced across zones. The controlplanemachineset lists us-east-1c, us-east-1d, and us-east-1a as failure domains, but two masters are in us-east-1c, one master is in us-east-1a, and none is in us-east-1d.

    failureDomains:
      aws:
      - placement:
          availabilityZone: us-east-1c
        subnet:
          id: subnet-024e2eedaa754b0ff
          type: ID
      - placement:
          availabilityZone: us-east-1d
        subnet:
          id: subnet-0424a84fcc457a01a
          type: ID
      - placement:
          availabilityZone: us-east-1a
        subnet:
          id: subnet-02b02e5d0afe52f86
          type: ID
      platform: AWS

The second time I hit this, the control plane machine set had only two failure domains. After adding machineNamePrefix, I deleted one master and the new master was created in another zone. As a result, all three masters ended up in one zone.

    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                          PHASE          TYPE         REGION      ZONE         AGE
    huliu-aws427b-ftphz-master-0                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
    huliu-aws427b-ftphz-master-1                  Running        m6i.xlarge   us-east-2   us-east-2b   100m
    huliu-aws427b-ftphz-master-2                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
    huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running        m6i.xlarge   us-east-2   us-east-2a   96m
    huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running        m6i.xlarge   us-east-2   us-east-2a   96m
    huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running        m6i.xlarge   us-east-2   us-east-2b   96m

    liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-aws427b-ftphz-master-1
    machine.machine.openshift.io "huliu-aws427b-ftphz-master-1" deleted
    ^C
    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                          PHASE          TYPE         REGION      ZONE         AGE
    deep-seek.example.com-bf5z4-1                 Provisioning   m6i.xlarge   us-east-2   us-east-2a   10s
    huliu-aws427b-ftphz-master-0                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
    huliu-aws427b-ftphz-master-1                  Deleting       m6i.xlarge   us-east-2   us-east-2b   100m
    huliu-aws427b-ftphz-master-2                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
    huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running        m6i.xlarge   us-east-2   us-east-2a   96m
    huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running        m6i.xlarge   us-east-2   us-east-2a   96m
    huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running        m6i.xlarge   us-east-2   us-east-2b   96m

    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                          PHASE          TYPE         REGION      ZONE         AGE
    deep-seek.example.com-bf5z4-1                 Running        m6i.xlarge   us-east-2   us-east-2a   15m
    huliu-aws427b-ftphz-master-0                  Running        m6i.xlarge   us-east-2   us-east-2a   116m
    huliu-aws427b-ftphz-master-2                  Running        m6i.xlarge   us-east-2   us-east-2a   116m
    huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running        m6i.xlarge   us-east-2   us-east-2a   112m
    huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running        m6i.xlarge   us-east-2   us-east-2a   112m
    huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running        m6i.xlarge   us-east-2   us-east-2b   112m
Actual results:
After the replacement, the master machines are not balanced across the configured zones.
Expected results:
The master machines should stay balanced across the configured zones after a master is replaced.
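The expected balance can be checked mechanically. The following is a minimal sketch, not part of the original report, that counts masters per failure domain from `oc get machine` output and flags an imbalance. The helper names (`parse_machines`, `zone_counts`, `is_balanced`) are hypothetical, and treating every machine whose name lacks `-worker-` as a master is an assumption that only fits the naming seen in this report; on a live cluster, selecting by the machine role label would likely be more robust. "Balanced" is assumed here to mean that per-zone counts differ by at most one.

```python
from collections import Counter

# Final `oc get machine` output from the first reproduction in this report.
SAMPLE = """
NAME PHASE TYPE REGION ZONE AGE
ci-op-j5wr5nh8-bae1c-8qb4c-master-1 Running m6i.xlarge us-east-1 us-east-1a 3h26m
ci-op-j5wr5nh8-bae1c-8qb4c-master-2 Running m6i.xlarge us-east-1 us-east-1c 3h26m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8 Running m5.xlarge us-east-1 us-east-1a 3h22m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx Running m5.xlarge us-east-1 us-east-1c 3h22m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp Running m5.xlarge us-east-1 us-east-1d 3h22m
jason-born-z8cx9-0 Running m6i.xlarge us-east-1 us-east-1c 32m
"""

# Failure domains in the order listed in the controlplanemachineset.
FAILURE_DOMAINS = ["us-east-1c", "us-east-1d", "us-east-1a"]


def parse_machines(text):
    """Turn whitespace-separated table output into a list of row dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def is_master(machine):
    """Assumption: anything not named like a worker is a master."""
    return "-worker-" not in machine["NAME"]


def zone_counts(machines, failure_domains):
    """Count masters per zone, including configured zones with none."""
    counts = Counter({zone: 0 for zone in failure_domains})
    for machine in machines:
        if is_master(machine):
            counts[machine["ZONE"]] += 1
    return dict(counts)


def is_balanced(counts):
    """Assumed definition: zone counts differ by at most one."""
    return max(counts.values()) - min(counts.values()) <= 1


counts = zone_counts(parse_machines(SAMPLE), FAILURE_DOMAINS)
print(counts)               # {'us-east-1c': 2, 'us-east-1d': 0, 'us-east-1a': 1}
print(is_balanced(counts))  # False
```

For the state captured above, the check reports two masters in us-east-1c and none in us-east-1d, which matches the imbalance described in this report.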
Additional info:
Slack discussion: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1745821760072169
Must-gather: https://drive.google.com/file/d/1GVHhCpuUiGNSMvLhLpf2CJffv5evjTjk/view?usp=sharing