- Bug
- Resolution: Unresolved
- Major
- 4.22
Description of problem
The GCP zone selection logic in the CI step registry (ipi-conf-gcp-commands.sh) only filters out AI zones but does not check whether the requested machine type is actually available in each zone. This causes ARM64 jobs using t2a-standard-4 to fail when us-central1-c is selected, since that zone does not offer t2a instances.
The installer correctly validates zone availability and rejects the install with:
controlPlane.platform.gcp.type: Invalid value: "t2a-standard-4": instance type not available in zones: [us-central1-c]
The separate ipi-conf-gcp-zones-commands.sh step already has a get_zones_by_machine_type() function that handles this correctly, but it is not used by the main ipi-conf-gcp chain.
Version-Release number
4.22 (CI infrastructure, not version-specific)
How reproducible
Always, whenever us-central1-c is among the randomly selected zones (which happens frequently)
Steps to Reproduce
1. Run any GCP CI job that uses ARM64 control plane nodes (e.g. periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn).
2. The ipi-conf-gcp-commands.sh step selects zones from the region using get_zones_from_region, which only excludes AI zones.
3. If us-central1-c is selected, the installer fails validation because t2a-standard-4 is not available there.
Actual results
Install fails with:
controlPlane.platform.gcp.type: Invalid value: "t2a-standard-4": instance type not available in zones: [us-central1-c]
The multi-a-a variant (all ARM64) has a 0/5 pass rate. The multi-x-ax variant (mixed) has a 1/5 pass rate.
Expected results
Zone selection should only pick zones where the requested machine type is available. The install should proceed successfully.
Additional info
Affected job: periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn
Pass rate dropped from 100% to 40% (Sippy twoDay period as of 2026-03-06).
Root cause file: ci-operator/step-registry/ipi/conf/gcp/ipi-conf-gcp-commands.sh in openshift/release
Fix: Replace get_zones_from_region with get_zones_for_machine_type, which queries gcloud compute machine-types list to find the zones where the specific machine type is available, with a fallback to the previous behavior. Also select zones independently for control plane and compute nodes, since heterogeneous clusters may use machine types that are available in different zones.
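A rough sketch of what the proposed fix could look like, for illustration only. The function names come from this ticket, but the bodies are assumptions and not the code from the forthcoming PR. The get_zones_from_region shown here is a simplified stand-in for the existing helper (its AI-zone exclusion is omitted), and the fallback condition (an empty query result) is also an assumption.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function bodies are assumptions, not the PR's code.

# Stand-in for the previous behavior: every zone in the region.
# (The real step also excludes AI zones; that filter is omitted here.)
get_zones_from_region() {
  local region="$1"
  gcloud compute zones list --format="value(name)" \
    | grep "^${region}-" | sort -u || true
}

# Print the zones in the region that offer the given machine type.
# Falls back to get_zones_from_region when the query returns nothing.
get_zones_for_machine_type() {
  local region="$1" machine_type="$2" zones
  zones="$(gcloud compute machine-types list \
      --filter="name=${machine_type}" \
      --format="value(zone)" 2>/dev/null \
    | awk -F/ '{print $NF}' \
    | grep "^${region}-" | sort -u || true)"
  if [[ -z "${zones}" ]]; then
    echo "No zones found for ${machine_type} in ${region}; using all region zones" >&2
    zones="$(get_zones_from_region "${region}")"
  fi
  printf '%s\n' "${zones}"
}
```

Under this sketch, the step would call get_zones_for_machine_type once with the control plane machine type and once with the compute machine type, then pick zones from each result separately, so that a heterogeneous cluster never lands a t2a-standard-4 node in a zone such as us-central1-c.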
PR forthcoming.