  1. OpenShift Bugs
  2. OCPBUGS-73785

[AWS] Control plane machines are created in wrong zone

    • Moderate
    • Proposed
    • Bug Fix
      Previously, when installing a cluster on AWS with an installer-provisioned VPC, the subnet and Availability Zone information in the machine set custom resources for control plane nodes could differ from that of their corresponding EC2 instances. As a consequence, when the control plane nodes were spread across three Availability Zones and one node was recreated, the discrepancy could leave the control plane unbalanced, with two nodes in the same Availability Zone. With this release, the subnet and Availability Zone information in the machine set custom resources matches that of the EC2 instances, and the issue is resolved.
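The unbalanced outcome described in this note can be illustrated with a small check. The following is a minimal sketch, not part of the fix: the helper name `crowded_zones` and the "recreated" scenario are hypothetical, and the zone values are taken from the reproduction in this issue. It assumes a recreated machine lands in the zone named in its spec.

```python
from collections import Counter


def crowded_zones(machine_zones):
    """Return zones that host more than one control plane machine.

    machine_zones maps a machine name to the zone its instance runs in.
    """
    counts = Counter(machine_zones.values())
    return {zone: n for zone, n in counts.items() if n > 1}


# Zones the instances actually run in, as reported in this issue.
observed = {
    "master-0": "us-east-2c",
    "master-1": "us-east-2a",
    "master-2": "us-east-2b",
}

# Hypothetical: master-2 is recreated in the zone its spec names (us-east-2c).
after_recreate = dict(observed, **{"master-2": "us-east-2c"})

print(crowded_zones(observed))        # no zone holds two machines
print(crowded_zones(after_recreate))  # us-east-2c now holds two machines
```

An empty result means every control plane machine sits in its own Availability Zone; any entry names a zone that holds two or more.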

      This is a clone of issue OCPBUGS-73773. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-69923. The following is the description of the original issue:

      Description of problem:

          Control plane machines are created in wrong zone

      Version-Release number of selected component (if applicable):

          4.21.0-0.nightly-2025-12-15-125449

      How reproducible:

      Not always, but the probability seems high.
      I've observed several inconsistent clusters, but I've also seen consistent ones.

      Steps to Reproduce:

          1. Install an AWS cluster. We use the automated template ipi-on-aws/versioned-installer-ci.
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.21.0-0.nightly-2025-12-15-125449   True        False         4h21m   Cluster version is 4.21.0-0.nightly-2025-12-15-125449
      
          2. Check the control plane machines and find that they are created in the wrong zones. For example, master-0's availabilityZone and subnet are for us-east-2a, but the instance is created in us-east-2c.
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
      Now using project "openshift-machine-api" on server "https://api.hongli-aws421.qe.devcluster.openshift.com:6443".
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                          PHASE     TYPE         REGION      ZONE         AGE
      hongli-aws421-p4vc5-master-0                  Running   m6i.xlarge   us-east-2   us-east-2c   4h49m
      hongli-aws421-p4vc5-master-1                  Running   m6i.xlarge   us-east-2   us-east-2a   4h49m
      hongli-aws421-p4vc5-master-2                  Running   m6i.xlarge   us-east-2   us-east-2b   4h49m
      hongli-aws421-p4vc5-worker-us-east-2a-wgk22   Running   m6i.xlarge   us-east-2   us-east-2a   4h45m
      hongli-aws421-p4vc5-worker-us-east-2b-vxgzw   Running   m6i.xlarge   us-east-2   us-east-2b   4h45m
      hongli-aws421-p4vc5-worker-us-east-2c-9hg7x   Running   m6i.xlarge   us-east-2   us-east-2c   4h45m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine hongli-aws421-p4vc5-master-0  -oyaml
      apiVersion: machine.openshift.io/v1beta1
      kind: Machine
      metadata:
        annotations:
          machine.openshift.io/instance-state: running
        creationTimestamp: "2025-12-18T03:56:56Z"
        finalizers:
        - machine.machine.openshift.io
        generation: 3
        labels:
          machine.openshift.io/cluster-api-cluster: hongli-aws421-p4vc5
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
          machine.openshift.io/instance-type: m6i.xlarge
          machine.openshift.io/region: us-east-2
          machine.openshift.io/zone: us-east-2c
        name: hongli-aws421-p4vc5-master-0
        namespace: openshift-machine-api
        ownerReferences:
        - apiVersion: machine.openshift.io/v1
          blockOwnerDeletion: true
          controller: true
          kind: ControlPlaneMachineSet
          name: cluster
          uid: 537381af-16e6-4d8f-aa7e-006e307abaf3
        resourceVersion: "11832"
        uid: 0566c424-d379-4d4e-9166-65e4574bff82
      spec:
        lifecycleHooks:
          preDrain:
          - name: EtcdQuorumOperator
            owner: clusteroperator/etcd
        metadata: {}
        providerID: aws:///us-east-2c/i-08a0072714a14259f
        providerSpec:
          value:
            ami:
              id: ami-0bc8dda494f111572
            apiVersion: machine.openshift.io/v1beta1
            blockDevices:
            - ebs:
                encrypted: true
                iops: 0
                kmsKey:
                  arn: ""
                volumeSize: 120
                volumeType: gp3
            capacityReservationId: ""
            credentialsSecret:
              name: aws-cloud-credentials
            deviceIndex: 0
            iamInstanceProfile:
              id: hongli-aws421-p4vc5-master-profile
            instanceType: m6i.xlarge
            kind: AWSMachineProviderConfig
            loadBalancers:
            - name: hongli-aws421-p4vc5-int
              type: network
            - name: hongli-aws421-p4vc5-ext
              type: network
            metadata: {}
            metadataServiceOptions: {}
            placement:
              availabilityZone: us-east-2a
              region: us-east-2
            securityGroups:
            - filters:
              - name: tag:Name
                values:
                - hongli-aws421-p4vc5-node
            - filters:
              - name: tag:Name
                values:
                - hongli-aws421-p4vc5-lb
            - filters:
              - name: tag:Name
                values:
                - hongli-aws421-p4vc5-controlplane
            subnet:
              filters:
              - name: tag:Name
                values:
                - hongli-aws421-p4vc5-subnet-private-us-east-2a
            tags:
            - name: kubernetes.io/cluster/hongli-aws421-p4vc5
              value: owned
            userDataSecret:
              name: master-user-data
      status:
        addresses:
        - address: 10.0.75.197
          type: InternalIP
        - address: ip-10-0-75-197.us-east-2.compute.internal
          type: InternalDNS
        - address: ip-10-0-75-197.us-east-2.compute.internal
          type: Hostname
        conditions:
        - lastTransitionTime: "2025-12-18T04:01:06Z"
          message: 'Drain operation currently blocked by: [{Name:EtcdQuorumOperator Owner:clusteroperator/etcd}]'
          reason: HookPresent
          severity: Warning
          status: "False"
          type: Drainable
        - lastTransitionTime: "2025-12-18T04:01:06Z"
          status: "True"
          type: InstanceExists
        - lastTransitionTime: "2025-12-18T04:00:59Z"
          status: "True"
          type: Terminable
        lastUpdated: "2025-12-18T04:01:06Z"
        nodeRef:
          kind: Node
          name: ip-10-0-75-197.us-east-2.compute.internal
          uid: 3a8d34ee-298e-4ecc-a848-556e643fabab
        phase: Running
        providerStatus:
          conditions:
          - lastTransitionTime: "2025-12-18T04:01:06Z"
            message: Machine successfully created
            reason: MachineCreationSucceeded
            status: "True"
            type: MachineCreation
          instanceId: i-08a0072714a14259f
          instanceState: running
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine -n openshift-machine-api -oyaml |grep us-east-
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2c
          providerID: aws:///us-east-2c/i-08a0072714a14259f
                availabilityZone: us-east-2a
                region: us-east-2
                  - hongli-aws421-p4vc5-subnet-private-us-east-2a
          - address: ip-10-0-75-197.us-east-2.compute.internal
          - address: ip-10-0-75-197.us-east-2.compute.internal
            name: ip-10-0-75-197.us-east-2.compute.internal
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2a
          providerID: aws:///us-east-2a/i-0e3030c3fff419730
                availabilityZone: us-east-2b
                region: us-east-2
                  - hongli-aws421-p4vc5-subnet-private-us-east-2b
          - address: ip-10-0-3-44.us-east-2.compute.internal
          - address: ip-10-0-3-44.us-east-2.compute.internal
            name: ip-10-0-3-44.us-east-2.compute.internal
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2b
          providerID: aws:///us-east-2b/i-0ddc50a52466d6f07
                availabilityZone: us-east-2c
                region: us-east-2
                  - hongli-aws421-p4vc5-subnet-private-us-east-2c
          - address: ip-10-0-35-213.us-east-2.compute.internal
          - address: ip-10-0-35-213.us-east-2.compute.internal
            name: ip-10-0-35-213.us-east-2.compute.internal
          generateName: hongli-aws421-p4vc5-worker-us-east-2a-
            machine.openshift.io/cluster-api-machineset: hongli-aws421-p4vc5-worker-us-east-2a
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2a
          name: hongli-aws421-p4vc5-worker-us-east-2a-wgk22
            name: hongli-aws421-p4vc5-worker-us-east-2a
          providerID: aws:///us-east-2a/i-0814e9fe0136f5ec6
                availabilityZone: us-east-2a
                region: us-east-2
                  - hongli-aws421-p4vc5-subnet-private-us-east-2a
          - address: ip-10-0-9-163.us-east-2.compute.internal
          - address: ip-10-0-9-163.us-east-2.compute.internal
            name: ip-10-0-9-163.us-east-2.compute.internal
          generateName: hongli-aws421-p4vc5-worker-us-east-2b-
            machine.openshift.io/cluster-api-machineset: hongli-aws421-p4vc5-worker-us-east-2b
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2b
          name: hongli-aws421-p4vc5-worker-us-east-2b-vxgzw
            name: hongli-aws421-p4vc5-worker-us-east-2b
          providerID: aws:///us-east-2b/i-062ae0c8a6aa8d45a
                availabilityZone: us-east-2b
                region: us-east-2
                  - hongli-aws421-p4vc5-subnet-private-us-east-2b
          - address: ip-10-0-35-47.us-east-2.compute.internal
          - address: ip-10-0-35-47.us-east-2.compute.internal
            name: ip-10-0-35-47.us-east-2.compute.internal
          generateName: hongli-aws421-p4vc5-worker-us-east-2c-
            machine.openshift.io/cluster-api-machineset: hongli-aws421-p4vc5-worker-us-east-2c
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2c
          name: hongli-aws421-p4vc5-worker-us-east-2c-9hg7x
            name: hongli-aws421-p4vc5-worker-us-east-2c
          providerID: aws:///us-east-2c/i-0d8423d7d0820cd24
                availabilityZone: us-east-2c
                region: us-east-2
                  - hongli-aws421-p4vc5-subnet-private-us-east-2c
          - address: ip-10-0-77-46.us-east-2.compute.internal
          - address: ip-10-0-77-46.us-east-2.compute.internal
            name: ip-10-0-77-46.us-east-2.compute.internal 
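The comparison done by eye in the grep output above can be scripted. The following is a minimal sketch, assuming the Machine list is obtained as JSON with `oc get machine -n openshift-machine-api -o json`. The function names are hypothetical. The field paths (`spec.providerID`, `spec.providerSpec.value.placement.availabilityZone`, and the `machine.openshift.io/zone` label) are the ones shown in the YAML above.

```python
import json
import subprocess

ZONE_LABEL = "machine.openshift.io/zone"


def zone_from_provider_id(provider_id):
    """Extract the zone from an ID shaped like aws:///us-east-2c/i-0123."""
    parts = provider_id.split("/")
    # "aws:///zone/instance" splits into ['aws:', '', '', zone, instance].
    return parts[3] if len(parts) >= 5 else None


def find_zone_mismatches(machine_list):
    """Return (name, spec_zone, actual_zone) for machines whose zones differ."""
    mismatches = []
    for machine in machine_list.get("items", []):
        meta = machine["metadata"]
        spec = machine.get("spec", {})
        value = spec.get("providerSpec", {}).get("value", {})
        spec_zone = value.get("placement", {}).get("availabilityZone")
        # Prefer the providerID; fall back to the zone label if it is absent.
        actual_zone = zone_from_provider_id(spec.get("providerID", ""))
        if actual_zone is None:
            actual_zone = meta.get("labels", {}).get(ZONE_LABEL)
        if spec_zone and actual_zone and spec_zone != actual_zone:
            mismatches.append((meta["name"], spec_zone, actual_zone))
    return mismatches


def load_machines(namespace="openshift-machine-api"):
    """Fetch the Machine list as JSON; requires a logged-in `oc` client."""
    result = subprocess.run(
        ["oc", "get", "machine", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `find_zone_mismatches(load_machines())` against the cluster in this report would be expected to list the three master machines, since each one's providerID zone differs from its `placement.availabilityZone`, while the worker machines are consistent.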
      
          

      Actual results:

          Control plane machines are created in the wrong zones.

      Expected results:

          Control plane machines should be created in the availabilityZone and subnet specified in the machine spec.

      Additional info:

      slack discussion: https://redhat-internal.slack.com/archives/CF8SMALS1/p1766025295990739?thread_ts=1745837739.587219&cid=CF8SMALS1    

              rh-ee-thvo Thuan Vo
              huliu@redhat.com Huali Liu
              Weinan Li Weinan Li