OpenShift Bugs / OCPBUGS-63729

CentralUSEUAP: worker machine creation fails with an error on platformUpdateDomainCount


      Description of problem:

          On Azure, in the CentralUSEUAP region, worker machines fail to be created when creating an OCP cluster (this applies to ARO too). Judging by the error message, the underlying Availability Set creation fails with:
      AvailabilitySet "<somethingsomething>" with platformFaultDomainCount = 1 can only support platformUpdateDomainCount = 1
      
      This error is related to https://issues.redhat.com/browse/OCPBUGS-45663. My understanding of the MAPI code at
      https://github.com/openshift/machine-api-provider-azure/blob/5a6516188d4ec33734e1a069da2acc7a469657dc/pkg/cloud/azure/services/availabilitysets/availabilitysets.go#L48
      
      is that, to fix OCPBUGS-45663, platformFaultDomainCount is now computed dynamically and resolves to 1 for this special region. However, platformUpdateDomainCount is still hardcoded to 5, which appears to be incompatible with platformFaultDomainCount set to 1 (Azure seems to accept only platformUpdateDomainCount = 1 in that case).
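
      To illustrate the suspected mismatch, here is a minimal sketch of how the update domain count could be derived from the fault domain count instead of being hardcoded. This is not the actual MAPI code: the helper name updateDomainCountFor and the constant defaultUpdateDomainCount are hypothetical, and the only Azure rule assumed is the one quoted in the error message (fault domain count 1 requires update domain count 1). Behavior for other fault domain counts is assumed to keep the current value of 5.

```go
package main

// defaultUpdateDomainCount mirrors the hardcoded value of 5 described in
// this report. The constant name is hypothetical.
const defaultUpdateDomainCount int32 = 5

// updateDomainCountFor is a hypothetical helper, not part of the provider.
// It returns an update domain count compatible with the given fault domain
// count, based only on the rule stated in the Azure error message:
// platformFaultDomainCount = 1 can only support platformUpdateDomainCount = 1.
func updateDomainCountFor(faultDomainCount int32) int32 {
	if faultDomainCount <= 1 {
		// Single fault domain (as computed for CentralUSEUAP): clamp to 1.
		return 1
	}
	// Assumption: regions with more fault domains keep the existing default.
	return defaultUpdateDomainCount
}
```

      With this sketch, updateDomainCountFor(1) yields 1 for the affected region, while regions reporting 2 or 3 fault domains would still get 5.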

      Version-Release number of selected component (if applicable):

          Observed on 4.16, 4.17, and 4.18

      How reproducible:

      systematic    

      Steps to Reproduce:

          1. In CentralUSEUAP, create an OCP cluster on Azure (or an ARO cluster) with any version that contains the fix for https://issues.redhat.com/browse/OCPBUGS-45663.
          2. Observe that worker Machine creation fails.
          

      Actual results:

          MAPI does not create the underlying worker VM. The error reads: AvailabilitySet "<somethingsomething>" with platformFaultDomainCount = 1 can only support platformUpdateDomainCount = 1

      Expected results:

      Worker VMs are created and the machines reach the Running state.

      Additional info:

      I am not certain whether this changed recently on the Azure side or whether the incompatibility between these two parameters has always been there.

              rmanak@redhat.com Radek Manak
              gvanderp@redhat.com Ghislain VANDERPOTTE
              Christophe LACOMBE
              Zhaohua Sun Zhaohua Sun