[OCPBUGS-18883] CPMS failure domains should be omitted when a single failure domain is present

      This is a clone of issue OCPBUGS-18113. The following is the description of the original issue:

      Description of problem:

      When the installer generates a CPMS, it should only add the `failureDomains` field when there is more than one failure domain. When there is only one failure domain, the fields from the failure domain, e.g. the zone, should be injected directly into the provider spec and the failure domain should be omitted.
      
      By doing this, we avoid having to care about failure domain injection logic for single-zone clusters, potentially avoiding bugs (such as some we have seen recently).
      
      IIRC we already did this for OpenStack, but AWS, Azure and GCP may not be affected.
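
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the installer's actual code: `build_cpms_template` and the flattened template shape are hypothetical, and the `platform: ""` value for the single-domain case follows what was observed later in this thread.

```python
from copy import deepcopy


def build_cpms_template(provider_spec, zones, platform="Azure"):
    """Return a simplified CPMS machine template (hypothetical shape)."""
    spec = deepcopy(provider_spec)
    template = {"spec": {"providerSpec": {"value": spec}}}

    if len(zones) > 1:
        # Several failure domains: list them and leave zone injection to the CPMS.
        template["failureDomains"] = {
            "platform": platform,
            platform.lower(): [{"zone": zone} for zone in zones],
        }
        return template

    # Zero or one failure domain: put the zone (if any) straight into the
    # provider spec and leave the failure domains effectively empty.
    if zones and zones[0]:
        spec["zone"] = zones[0]
    template["failureDomains"] = {"platform": ""}
    return template


single = build_cpms_template({"vmSize": "Standard_NP10s"}, ["3"])
print(single["failureDomains"])                  # {'platform': ''}
print(single["spec"]["providerSpec"]["value"])   # {'vmSize': 'Standard_NP10s', 'zone': '3'}
```

For a region without AZs (a single failure domain with an empty zone), the sketch adds no `zone` key to the provider spec at all.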

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Can be demonstrated on Azure in the westus region, which has no AZs available. Currently the installer creates the following, which we can omit entirely:
      ```
      failureDomains:
        platform: Azure
        azure:
        - zone: ""
      ```

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.14.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:5006

            Jinyun Ma added a comment -

            joelspeed, got it, thanks for the explanation!

            Then there is no issue now; moving the bug to VERIFIED.

            Joel Speed added a comment -

            Hey, yes, the empty string for platform is what I mean when I say omitted. This makes the CPMS ignore the failure domain injection logic. If platform were Azure, then it would expect a list of Azure zones, which we want to omit here, so this is as expected. If you can make a note on the test case that `platform: ""` is correct, that would be good.
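
For the test-case note, a check along these lines could capture the expectation. It is only a sketch: `failure_domains_omitted` is a hypothetical helper, it takes the `machines_v1beta1_machine_openshift_io` template as a plain dict, and treating a missing `failureDomains` key the same as an empty platform is an assumption.

```python
def failure_domains_omitted(template):
    """Return True when the template carries no usable failure domains."""
    failure_domains = template.get("failureDomains") or {}
    # An empty (or absent) platform means there is nothing to inject.
    return failure_domains.get("platform", "") == ""


single_zone = {"failureDomains": {"platform": ""}}
multi_zone = {"failureDomains": {"platform": "Azure", "azure": [{"zone": "1"}, {"zone": "2"}]}}

print(failure_domains_omitted(single_zone))  # True
print(failure_domains_omitted(multi_zone))   # False
```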

            Jinyun Ma added a comment -

            I missed another scenario that needs to be verified, and I have one doubt, so I am moving the bug back to ON_QA.

            Two scenarios tested:

            1. Install cluster in region northcentralus, which has no AZs. Installation succeeded, and no master machine was recreated.

            2. Install cluster with instance type `Standard_NP10s`, which is only available in a single zone (3). The zone field is set directly in the machine provider spec instead of in failureDomains. Installation succeeded too, and no master machine was recreated.

            But I found that in both scenarios failureDomains is not omitted, and has the contents below:

              spec:
                replicas: 3
                selector:
                  matchLabels:
                    machine.openshift.io/cluster-api-cluster: jima414-2s55r
                    machine.openshift.io/cluster-api-machine-role: master
                    machine.openshift.io/cluster-api-machine-type: master
                state: Active
                strategy:
                  type: RollingUpdate
                template:
                  machineType: machines_v1beta1_machine_openshift_io
                  machines_v1beta1_machine_openshift_io:
                    failureDomains:
                      platform: "" 

            I checked, and platform is set to Azure in the normal case, but here it is empty.

            joelspeed, could you help check whether this is expected? Thanks.

            Jinyun Ma added a comment -

            Verified on 4.14.0-0.nightly-2023-09-15-233408, and it passed; moving the bug to VERIFIED.

            Installed a cluster in region "northcentralus" and on Azure Stack Hub; both succeeded. The master machine is no longer recreated.

            The master machine created by machine-api does not contain an empty zone field.
