Bug
Resolution: Unresolved
Major
4.19
Quality / Stability / Reliability
False
Important
Rejected
Description of problem:
After a master machine is deleted, the replacement master is created in a different zone, so the master machines are no longer balanced across multiple zones.
Version-Release number of selected component (if applicable):
4.19.0-0.ci-2025-04-27-051101
How reproducible:
Reproduced twice
Steps to Reproduce:
1. Install a 4.19 AWS cluster
sh-5.1$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.19.0-0.ci-2025-04-27-051101   True        False         144m    Cluster version is 4.19.0-0.ci-2025-04-27-051101
sh-5.1$ oc get machine
NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
ci-op-j5wr5nh8-bae1c-8qb4c-master-0                  Running   m6i.xlarge   us-east-1   us-east-1d   170m
ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running   m6i.xlarge   us-east-1   us-east-1a   170m
ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running   m6i.xlarge   us-east-1   us-east-1c   170m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running   m5.xlarge    us-east-1   us-east-1a   166m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running   m5.xlarge    us-east-1   us-east-1c   166m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running   m5.xlarge    us-east-1   us-east-1d   166m
2. Edit the controlplanemachineset and add machineNamePrefix, for example, machineNamePrefix: jason-born
sh-5.1$ oc edit controlplanemachineset
controlplanemachineset.machine.openshift.io/cluster edited
3. Delete master-0 so that the new name takes effect. The new machine is created in another zone (us-east-1c instead of us-east-1d).
sh-5.1$ oc delete machine ci-op-j5wr5nh8-bae1c-8qb4c-master-0
machine.machine.openshift.io "ci-op-j5wr5nh8-bae1c-8qb4c-master-0" deleted
^C
sh-5.1$ oc get machine
NAME                                                 PHASE          TYPE         REGION      ZONE         AGE
ci-op-j5wr5nh8-bae1c-8qb4c-master-0                  Deleting       m6i.xlarge   us-east-1   us-east-1d   173m
ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running        m6i.xlarge   us-east-1   us-east-1a   173m
ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running        m6i.xlarge   us-east-1   us-east-1c   173m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running        m5.xlarge    us-east-1   us-east-1a   170m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running        m5.xlarge    us-east-1   us-east-1c   170m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running        m5.xlarge    us-east-1   us-east-1d   170m
jason-born-z8cx9-0                                   Provisioning   m6i.xlarge   us-east-1   us-east-1c   4s
sh-5.1$ oc get machine
NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
ci-op-j5wr5nh8-bae1c-8qb4c-master-1                  Running   m6i.xlarge   us-east-1   us-east-1a   3h26m
ci-op-j5wr5nh8-bae1c-8qb4c-master-2                  Running   m6i.xlarge   us-east-1   us-east-1c   3h26m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1a-qt6h8   Running   m5.xlarge    us-east-1   us-east-1a   3h22m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1c-c2vxx   Running   m5.xlarge    us-east-1   us-east-1c   3h22m
ci-op-j5wr5nh8-bae1c-8qb4c-worker-us-east-1d-zxzpp   Running   m5.xlarge    us-east-1   us-east-1d   3h22m
jason-born-z8cx9-0                                   Running   m6i.xlarge   us-east-1   us-east-1c   32m
At this point the master machines are no longer balanced across multiple zones. The controlplanemachineset lists the failure domains us-east-1c, us-east-1d, and us-east-1a, but two masters are in us-east-1c, one master is in us-east-1a, and none is in us-east-1d.
failureDomains:
  aws:
  - placement:
      availabilityZone: us-east-1c
    subnet:
      id: subnet-024e2eedaa754b0ff
      type: ID
  - placement:
      availabilityZone: us-east-1d
    subnet:
      id: subnet-0424a84fcc457a01a
      type: ID
  - placement:
      availabilityZone: us-east-1a
    subnet:
      id: subnet-02b02e5d0afe52f86
      type: ID
  platform: AWS
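The imbalance can be checked mechanically. The following is a minimal sketch, not part of the operator: it tallies the masters per zone from the listings above and applies a hypothetical definition of "balanced" (the busiest and the emptiest failure domain differ by at most one master). The names `masters_per_zone` and `is_balanced` are made up for illustration.

```python
from collections import Counter

# Zones listed in the controlplanemachineset failureDomains of this report.
FAILURE_DOMAIN_ZONES = ["us-east-1c", "us-east-1d", "us-east-1a"]

# Master machines and zones before the deletion (first `oc get machine` output).
MASTERS_BEFORE = {
    "ci-op-j5wr5nh8-bae1c-8qb4c-master-0": "us-east-1d",
    "ci-op-j5wr5nh8-bae1c-8qb4c-master-1": "us-east-1a",
    "ci-op-j5wr5nh8-bae1c-8qb4c-master-2": "us-east-1c",
}

# Master machines and zones after the replacement (last `oc get machine` output).
MASTERS_AFTER = {
    "ci-op-j5wr5nh8-bae1c-8qb4c-master-1": "us-east-1a",
    "ci-op-j5wr5nh8-bae1c-8qb4c-master-2": "us-east-1c",
    "jason-born-z8cx9-0": "us-east-1c",
}


def masters_per_zone(masters, zones):
    """Count masters in each zone, including zones that hold no master."""
    counts = Counter({zone: 0 for zone in zones})
    counts.update(masters.values())
    return dict(counts)


def is_balanced(masters, zones):
    """Assumed definition: per-zone counts differ by at most one."""
    counts = masters_per_zone(masters, zones)
    return max(counts.values()) - min(counts.values()) <= 1


if __name__ == "__main__":
    for label, masters in (("before", MASTERS_BEFORE), ("after", MASTERS_AFTER)):
        print(label, masters_per_zone(masters, FAILURE_DOMAIN_ZONES),
              "balanced:", is_balanced(masters, FAILURE_DOMAIN_ZONES))
```

With the data from this report, the layout before the deletion passes the check and the layout after the replacement fails it, because us-east-1c holds two masters and us-east-1d holds none.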
The second time I hit this, the control plane machine set had only two failure domains. After adding machineNamePrefix, I deleted one master (master-1 in us-east-2b), and the new master was created in another zone (us-east-2a). After that, all three masters were in one zone.
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
huliu-aws427b-ftphz-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   100m
huliu-aws427b-ftphz-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   100m
huliu-aws427b-ftphz-master-2                  Running   m6i.xlarge   us-east-2   us-east-2a   100m
huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running   m6i.xlarge   us-east-2   us-east-2a   96m
huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running   m6i.xlarge   us-east-2   us-east-2a   96m
huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running   m6i.xlarge   us-east-2   us-east-2b   96m
liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-aws427b-ftphz-master-1
machine.machine.openshift.io "huliu-aws427b-ftphz-master-1" deleted
^C
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE          TYPE         REGION      ZONE         AGE
deep-seek.example.com-bf5z4-1                 Provisioning   m6i.xlarge   us-east-2   us-east-2a   10s
huliu-aws427b-ftphz-master-0                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
huliu-aws427b-ftphz-master-1                  Deleting       m6i.xlarge   us-east-2   us-east-2b   100m
huliu-aws427b-ftphz-master-2                  Running        m6i.xlarge   us-east-2   us-east-2a   100m
huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running        m6i.xlarge   us-east-2   us-east-2a   96m
huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running        m6i.xlarge   us-east-2   us-east-2a   96m
huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running        m6i.xlarge   us-east-2   us-east-2b   96m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
deep-seek.example.com-bf5z4-1                 Running   m6i.xlarge   us-east-2   us-east-2a   15m
huliu-aws427b-ftphz-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   116m
huliu-aws427b-ftphz-master-2                  Running   m6i.xlarge   us-east-2   us-east-2a   116m
huliu-aws427b-ftphz-worker-us-east-2a-785cp   Running   m6i.xlarge   us-east-2   us-east-2a   112m
huliu-aws427b-ftphz-worker-us-east-2a-h6tml   Running   m6i.xlarge   us-east-2   us-east-2a   112m
huliu-aws427b-ftphz-worker-us-east-2b-zlxdr   Running   m6i.xlarge   us-east-2   us-east-2b   112m
Actual results:
After a master is deleted and replaced, the master machines are not balanced across multiple zones.
Expected results:
The master machines should remain balanced across multiple zones after a master is replaced.
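One way to state this expectation precisely is sketched below. It is only an illustration of the expected outcome, not the control plane machine set operator's actual placement logic: the hypothetical helper `expected_zone` picks the failure domain that currently holds the fewest remaining masters, breaking ties by the order of the failure domain list. For the second occurrence, the failure domain list is assumed to be us-east-2a and us-east-2b, since the report only says there were two.

```python
from collections import Counter


def expected_zone(remaining_master_zones, failure_domain_zones):
    """Return the failure domain with the fewest remaining masters.

    Ties are broken by the order of failure_domain_zones. This is an
    illustrative rule only, not the operator's real algorithm.
    """
    counts = Counter(remaining_master_zones)
    return min(failure_domain_zones, key=lambda zone: counts[zone])


if __name__ == "__main__":
    # First occurrence: master-0 (us-east-1d) deleted, masters remain in 1a and 1c.
    print(expected_zone(["us-east-1a", "us-east-1c"],
                        ["us-east-1c", "us-east-1d", "us-east-1a"]))

    # Second occurrence: master-1 (us-east-2b) deleted, both remaining masters in 2a.
    # The failure domain list here is an assumption, see the note above.
    print(expected_zone(["us-east-2a", "us-east-2a"],
                        ["us-east-2a", "us-east-2b"]))
```

Under this rule the replacements would land in us-east-1d and us-east-2b, whereas the observed replacements landed in us-east-1c and us-east-2a.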
Additional info:
Slack discussion: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1745821760072169
must-gather: https://drive.google.com/file/d/1GVHhCpuUiGNSMvLhLpf2CJffv5evjTjk/view?usp=sharing