Bug
Resolution: Done-Errata
Normal
None
4.15
Moderate
No
False
Bug Fix
Done
This is a clone of issue OCPBUGS-48659. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45663. The following is the description of the original issue:
—
Description of problem:
Azure has two regions without availability zones or availability set fault domains (centraluseuap and eastusstg). Both are test regions, and one is in use by the ARO team. The Machine API provider appears to hardcode an availability set fault domain count of 2 when creating the machineset: https://github.com/openshift/machine-api-provider-azure/blob/main/pkg/cloud/azure/services/availabilitysets/availabilitysets.go#L32. If the target region does not support a fault domain count of at least 2, the install fails because worker nodes get a Failed status. Azure returns the following error, reported by the Machine API: `The specified fault domain count 2 must fall in the range 1 to 1.` As a result, these regions cannot support OCP clusters.
Version-Release number of selected component (if applicable):
Observed in 4.15
How reproducible:
Very
Steps to Reproduce:
1. Attempt creation of an OCP cluster in the centraluseuap or eastusstg region
2. Observe worker machine failures
Actual results:
Worker machines enter the Failed state
Expected results:
Worker machines are able to start. I am guessing that this would happen by setting the availability set fault domain count dynamically rather than hardcoding it to 2. The hardcoded value currently happens to work in most Azure regions because their fault domain counts are typically at least 2. Upstream appears to set this dynamically by querying the number of fault domains in the region: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/40f0fabc264388de02a88de7fbe400c21d22e7e2/azure/services/availabilitysets/spec.go#L70
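A minimal Go sketch of the dynamic selection suggested here, assuming the region's maximum fault domain count has already been looked up elsewhere. The names `faultDomainCount` and `defaultFaultDomainCount` are hypothetical and are not identifiers from machine-api-provider-azure or the upstream provider. This only illustrates the idea and is not the actual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultFaultDomainCount mirrors the value of 2 that this report says is
// currently hardcoded. The constant name is hypothetical.
const defaultFaultDomainCount = 2

// faultDomainCount returns the fault domain count to request for an
// availability set. It caps the preferred default at the region's maximum.
// regionMax is assumed to come from a separate lookup of what the region
// supports; that lookup is not shown here.
func faultDomainCount(regionMax int) (int, error) {
	if regionMax < 1 {
		return 0, errors.New("region reports no usable fault domains")
	}
	if regionMax < defaultFaultDomainCount {
		return regionMax, nil
	}
	return defaultFaultDomainCount, nil
}

func main() {
	// A region maximum of 1 corresponds to the "range 1 to 1" error above.
	for _, regionMax := range []int{1, 2, 3} {
		n, err := faultDomainCount(regionMax)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("region max %d: request %d\n", regionMax, n)
	}
}
```

With a region maximum of 1, the sketch requests 1 instead of 2, which would stay inside the range Azure reports for these regions.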
Additional info:
- blocks: OCPBUGS-50966 [azure] Worker machines get Failed state if region has no availability zones or availability set fault domains (POST)
- clones: OCPBUGS-48659 [azure] Worker machines get Failed state if region has no availability zones or availability set fault domains (Verified)
- is blocked by: OCPBUGS-48659 [azure] Worker machines get Failed state if region has no availability zones or availability set fault domains (Verified)
- is cloned by: OCPBUGS-50966 [azure] Worker machines get Failed state if region has no availability zones or availability set fault domains (POST)
- links to: RHBA-2025:1403 OpenShift Container Platform 4.17.z bug fix update