- Bug
- Resolution: Unresolved
- Normal
- None
- 4.15
- Moderate
- No
- False
Description of problem:
In Azure, there are 2 regions (centraluseuap, eastusstg) that have no availability zones and, judging by the error below, allow only a single availability set fault domain. They are test regions, one of which is in use by the ARO team. The Machine API provider appears to hardcode an availability set fault domain count of 2 when creating the availability set for a machineset: https://github.com/openshift/machine-api-provider-azure/blob/main/pkg/cloud/azure/services/availabilitysets/availabilitysets.go#L32. If the target region does not support a fault domain count of at least 2, the install fails because the worker machines end up in a Failed status. This is the error from Azure, as reported by the machine API: `The specified fault domain count 2 must fall in the range 1 to 1.` As a result, these regions cannot currently support OCP clusters.
Version-Release number of selected component (if applicable):
Observed in 4.15
How reproducible:
Very
Steps to Reproduce:
1. Attempt creation of an OCP cluster in the centraluseuap or eastusstg region
2. Observe worker machine failures
Actual results:
Worker machines end up in a Failed state
Expected results:
Worker machines are able to start. I am guessing that this would require setting the availability set fault domain count dynamically rather than hardcoding it to 2. The hardcoded value only happens to work today because most Azure regions support a fault domain count of at least 2. Upstream, it looks like the count is set dynamically by querying the number of fault domains available in the region: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/40f0fabc264388de02a88de7fbe400c21d22e7e2/azure/services/availabilitysets/spec.go#L70
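To illustrate the idea, here is a rough sketch of how the count could be picked from what the region reports instead of being fixed at 2. This is not the provider's or upstream's actual code. The `capability` type and the `faultDomainCount` helper are hypothetical names made up for this example, and I am assuming the region's limit is exposed as a `MaximumPlatformFaultDomainCount` capability on the availability set SKU. Capping at the current default of 2 is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultFaultDomainCount mirrors the value of 2 that appears to be hardcoded today.
const defaultFaultDomainCount = 2

// capability is a hypothetical stand-in for one name/value pair describing
// what a region supports for availability sets.
type capability struct {
	Name  string
	Value string
}

// faultDomainCount (hypothetical helper) returns the fault domain count to
// request: the region's reported maximum, capped at the current default.
// If the region reports no maximum, it falls back to the default.
func faultDomainCount(caps []capability) (int32, error) {
	for _, c := range caps {
		// Assumed capability name; verify against the real SKU data.
		if c.Name != "MaximumPlatformFaultDomainCount" {
			continue
		}
		regionMax, err := strconv.ParseInt(c.Value, 10, 32)
		if err != nil {
			return 0, fmt.Errorf("parsing fault domain count %q: %w", c.Value, err)
		}
		if regionMax < 1 {
			return 0, fmt.Errorf("invalid fault domain count %d", regionMax)
		}
		if regionMax < defaultFaultDomainCount {
			return int32(regionMax), nil
		}
		return defaultFaultDomainCount, nil
	}
	return defaultFaultDomainCount, nil
}

func main() {
	// A region such as centraluseuap, which accepts only the range 1 to 1.
	singleDomain := []capability{{Name: "MaximumPlatformFaultDomainCount", Value: "1"}}
	count, err := faultDomainCount(singleDomain)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fault domain count to request:", count)
}
```

With something along these lines, a region that only allows a count of 1 would get an availability set with 1 fault domain, while regions that allow 2 or more would keep today's behavior.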
Additional info: