OCPBUGS-77944

GCP ARM64 CI jobs fail because zone selection does not filter by machine type availability

      Description of problem

      The GCP zone selection logic in the CI step registry (ipi-conf-gcp-commands.sh) only filters out AI zones but does not check whether the requested machine type is actually available in each zone. This causes ARM64 jobs using t2a-standard-4 to fail when us-central1-c is selected, since that zone does not offer t2a instances.

      The installer correctly validates zone availability and rejects the install with:

      controlPlane.platform.gcp.type: Invalid value: "t2a-standard-4": instance type not available in zones: [us-central1-c]
      

      The separate ipi-conf-gcp-zones-commands.sh step already has a get_zones_by_machine_type() function that handles this correctly, but it is not used by the main ipi-conf-gcp chain.

      Version-Release number

      4.22 (CI infrastructure, not version-specific)

      How reproducible

      Always (when us-central1-c is randomly selected as a zone, which happens frequently)

      Steps to Reproduce

      1. Run any GCP CI job that uses ARM64 control plane nodes (e.g. periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn).

      2. The ipi-conf-gcp-commands.sh step selects zones from the region using get_zones_from_region, which only excludes AI zones.

      3. If us-central1-c is selected, the installer fails validation because t2a-standard-4 is not available there.
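
      To confirm the zone gap outside of a CI run, a check along the following lines may help. It is a minimal sketch: the helper name machine_type_in_zone is hypothetical (it is not part of the step registry), and it assumes an authenticated gcloud with a default project configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ipi-conf-gcp-commands.sh.
# Returns 0 if the machine type can be described in the given zone,
# and non-zero otherwise (including when gcloud itself fails).
machine_type_in_zone() {
  local machine_type="$1" zone="$2"
  gcloud compute machine-types describe "${machine_type}" \
    --zone "${zone}" --format="value(name)" >/dev/null 2>&1
}

# Example (needs real GCP credentials, so it is left commented out):
# machine_type_in_zone t2a-standard-4 us-central1-c \
#   || echo "t2a-standard-4 is not offered in us-central1-c"
```

      Because the helper treats any gcloud failure as "not available", an authentication problem would look the same as a missing machine type; check gcloud auth first if the result is surprising.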

      Actual results

      Install fails with:

      controlPlane.platform.gcp.type: Invalid value: "t2a-standard-4": instance type not available in zones: [us-central1-c]
      

      The multi-a-a variant (all ARM64) has a 0/5 pass rate; the multi-x-ax variant (mixed) has a 1/5 pass rate.

      Expected results

      Zone selection should only pick zones where the requested machine type is available. The install should proceed successfully.

      Additional info

      Affected job: periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn
      Pass rate dropped from 100% to 40% (Sippy twoDay period as of 2026-03-06).

      Root cause file: ci-operator/step-registry/ipi/conf/gcp/ipi-conf-gcp-commands.sh in openshift/release

      Fix: Replace get_zones_from_region with a get_zones_for_machine_type function that queries gcloud compute machine-types list to find the zones where the specific machine type is available, falling back to the previous behavior if that query returns nothing. Also select zones independently for control plane and compute nodes, since heterogeneous clusters may use different machine types that are available in different zones.
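
      The proposed fix could look roughly like the sketch below. This is not the code from the forthcoming PR: the argument order, the region filtering via grep, and the variable names in the usage comment are assumptions made for illustration. get_zones_from_region is assumed to be the helper already defined in ipi-conf-gcp-commands.sh, taking the region as its argument.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes get_zones_from_region (the existing helper that
# excludes AI zones) is already defined by the surrounding script.
get_zones_for_machine_type() {
  local region="$1" machine_type="$2" zones
  # List every zone that offers the machine type, then keep only the
  # zones that belong to the requested region.
  zones="$(gcloud compute machine-types list \
      --filter="name=${machine_type}" \
      --format="value(zone)" 2>/dev/null \
    | grep "^${region}-" | sort -u)" || zones=""
  if [[ -z "${zones}" ]]; then
    # Fallback to the previous behavior if the query fails or is empty.
    zones="$(get_zones_from_region "${region}")"
  fi
  printf '%s\n' "${zones}"
}

# Usage idea (hypothetical variable names): select zones independently,
# since control plane and compute may use different machine types.
# CONTROL_PLANE_ZONES="$(get_zones_for_machine_type "${REGION}" "${CONTROL_PLANE_TYPE}")"
# COMPUTE_ZONES="$(get_zones_for_machine_type "${REGION}" "${COMPUTE_TYPE}")"
```

      Filtering the region in the shell rather than in the gcloud filter expression keeps the query simple, at the cost of listing zones from other regions first.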

      PR forthcoming.

              nmoraiti Nikolaos Moraitis
              openshift-trt-privileged Technical Release Team Openshift
              Nikolaos Moraitis Nikolaos Moraitis