OpenShift Bugs / OCPBUGS-45663

[azure] Worker machines go into Failed state if the region has no availability zones or availability set fault domains



      Description of problem:

      In Azure, two regions, centraluseuap and eastusstg, have no availability zones and no more than one availability set fault domain. Both are test regions, and one of them is in use by the ARO team.
      
      The Machine API provider appears to hardcode an availability set fault domain count of 2 when it creates the availability set for a MachineSet: https://github.com/openshift/machine-api-provider-azure/blob/main/pkg/cloud/azure/services/availabilitysets/availabilitysets.go#L32. If the target region supports fewer than 2 fault domains, the installation fails because the worker machines go into the Failed state.
      
      This is the error from Azure, as reported by the Machine API:
      
      `The specified fault domain count 2 must fall in the range 1 to 1.`
      
      As a result, OCP clusters cannot be installed in these regions.
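      A minimal Go sketch of why the hardcoded value fails. `checkFaultDomainCount` is hypothetical: it only models the range check implied by the Azure error message and is not Azure's actual validation code. The maximum of 1 for centraluseuap comes from the error above; the second region and its maximum of 2 are illustrative assumptions.

```go
package main

import "fmt"

// checkFaultDomainCount is a hypothetical model of the range check implied by
// the Azure error: the requested platform fault domain count must lie between
// 1 and the region's maximum. It is not Azure's implementation.
func checkFaultDomainCount(requested, regionMax int32) error {
	if requested < 1 || requested > regionMax {
		// Message text mirrors the error reported by the Machine API.
		return fmt.Errorf("The specified fault domain count %d must fall in the range 1 to %d.", requested, regionMax)
	}
	return nil
}

func main() {
	const hardcoded int32 = 2 // value the provider is reported to hardcode

	// Illustrative region maxima: 1 matches the error in this report; the
	// second entry is a hypothetical region, not a measured value.
	regions := []struct {
		name string
		max  int32
	}{
		{"centraluseuap", 1},
		{"typical-region (hypothetical)", 2},
	}

	for _, r := range regions {
		if err := checkFaultDomainCount(hardcoded, r.max); err != nil {
			fmt.Printf("%s: %v\n", r.name, err)
		} else {
			fmt.Printf("%s: ok\n", r.name)
		}
	}
}
```

      Running it prints the quoted error for centraluseuap and "ok" for the hypothetical region, which is consistent with the report: a count of 2 only works where the region allows at least 2 fault domains.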

      Version-Release number of selected component (if applicable):

          Observed in 4.15

      How reproducible:

          Very

      Steps to Reproduce:

          1. Attempt to create an OCP cluster in the centraluseuap or eastusstg region
          2. Observe worker machine failures
          

      Actual results:

          Worker machines go into the Failed state

      Expected results:

          Worker machines start successfully. Presumably this requires setting the availability set fault domain count dynamically instead of hardcoding it to 2; the hardcoded value currently works in most Azure regions only because they typically support at least 2 fault domains.
      
      Upstream (cluster-api-provider-azure) appears to set this value dynamically by querying the number of fault domains the region supports: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/40f0fabc264388de02a88de7fbe400c21d22e7e2/azure/services/availabilitysets/spec.go#L70
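      A minimal Go sketch of the dynamic approach, under stated assumptions: `faultDomainCountForRegion` is a hypothetical helper, the region's capabilities are modelled as a plain map instead of a real Azure SDK call, and the capability name `MaximumPlatformFaultDomainCount` is assumed to match what the Azure Resource SKUs API reports for availability sets. The sketch keeps the current default of 2 but caps it at the region maximum.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// defaultFaultDomainCount is the value the provider is reported to hardcode today.
const defaultFaultDomainCount int32 = 2

// faultDomainCountForRegion is hypothetical. It takes the capabilities reported
// for availability sets in a region (modelled as a map; the capability name is
// an assumption) and returns the default count capped at the region maximum.
func faultDomainCountForRegion(capabilities map[string]string) (int32, error) {
	raw, ok := capabilities["MaximumPlatformFaultDomainCount"]
	if !ok {
		return 0, errors.New("region does not report MaximumPlatformFaultDomainCount")
	}
	regionMax, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("parsing MaximumPlatformFaultDomainCount %q: %w", raw, err)
	}
	if regionMax < 1 {
		return 0, fmt.Errorf("invalid MaximumPlatformFaultDomainCount %d", regionMax)
	}
	// Never request more fault domains than the region allows.
	if regionMax < int64(defaultFaultDomainCount) {
		return int32(regionMax), nil
	}
	return defaultFaultDomainCount, nil
}

func main() {
	for _, caps := range []map[string]string{
		{"MaximumPlatformFaultDomainCount": "1"}, // like the regions in this report, per the error
		{"MaximumPlatformFaultDomainCount": "3"}, // illustrative region with more fault domains
	} {
		n, err := faultDomainCountForRegion(caps)
		fmt.Println(n, err)
	}
}
```

      Capping at the current default leaves behavior unchanged in regions that already work; using the region maximum directly, as the linked upstream code appears to do, would be the alternative.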

      Additional info:

          

              rhn-support-zhsun Zhaohua Sun
              rhn-support-cmarches Caden Marchese
              Zhaohua Sun
