OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2935

Support for referencing failure domains directly in MachineSet definitions on vSphere


    • Product / Portfolio Work

      Feature Overview

      This feature introduces the ability for MachineSets to reference predefined failureDomains from the Infrastructure Custom Resource (CR) by name. Currently, administrators must manually duplicate infrastructure details (datacenter, datastore, cluster, network, and resourcePool) within every MachineSet definition. By allowing a direct reference (e.g., failureDomain: zone-a), we reduce configuration sprawl, minimize manual entry errors, and ensure consistency across multi-zone high-availability (HA) deployments on VMware vSphere.

      Goals

      • Operational Efficiency: Enable users to manage infrastructure parameters in a single location (the Infrastructure CR) rather than across dozens of MachineSets.
      • Persona: The primary user is the Cluster Administrator responsible for scaling and maintaining multi-zone OpenShift clusters.
      • Improved Consistency: Eliminate "configuration drift" where individual MachineSets in the same logical zone accidentally point to different resources due to manual typos.
      • Standardization: Align the MachineSet experience with the Control Plane Machine Set (CPMS), which already utilizes failure domains for high availability.
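      To make the "configuration drift" goal concrete, the following sketch audits today's explicitly configured MachineSets and reports workspace fields whose values disagree within the same logical zone. It is a minimal illustration only: the `MachineSetView` input type, the admin-supplied `zone` value, and the chosen field list are hypothetical and not part of any OpenShift API.

```python
"""Sketch: detect configuration drift between MachineSets in the same zone.

Assumptions (hypothetical): each MachineSet is summarized as a MachineSetView
whose `zone` is the logical zone the administrator intended, and whose
`workspace` mirrors the vSphere providerSpec workspace fields.
"""
from collections import defaultdict
from dataclasses import dataclass, field

# Workspace fields that should be identical for every MachineSet in one zone.
WORKSPACE_FIELDS = ("datacenter", "datastore", "resourcePool", "folder")


@dataclass
class MachineSetView:
    name: str
    zone: str
    workspace: dict = field(default_factory=dict)


def find_drift(machine_sets):
    """Return {zone: {field: {value: [MachineSet names]}}} for disagreeing fields.

    A field that is missing on one MachineSet but set on another also counts
    as drift, because the missing value is grouped under None.
    """
    by_zone = defaultdict(list)
    for ms in machine_sets:
        by_zone[ms.zone].append(ms)

    drift = {}
    for zone, members in by_zone.items():
        for name in WORKSPACE_FIELDS:
            values = defaultdict(list)
            for ms in members:
                values[ms.workspace.get(name)].append(ms.name)
            if len(values) > 1:
                drift.setdefault(zone, {})[name] = dict(values)
    return drift
```

      With failure-domain references, every MachineSet in a zone would resolve from the same Infrastructure CR entry, so this class of typo could not occur in the first place.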

      Requirements

      • Functional:
        • The MachineSet API must be extended to include a failureDomain reference field for vSphere.
        • The system must resolve the named failure domain to its constituent infrastructure parameters (datacenter, computeCluster, resourcePool, datastore, networks, etc.) during machine provisioning.
        • The feature must support all installation types, including Installer-Provisioned Infrastructure (IPI) and User-Provisioned Infrastructure (UPI), provided the Infrastructure CR is correctly populated.
      • Technical Architecture:
        • Implementation should prioritize Cluster API (CAPI) via the cluster-api-provider-vsphere if available in the target version.
        • If CAPI is not yet the default for the platform version, the functionality must be implemented in the Machine API (MAPI) machine-api-provider-vsphere with a clear migration path to CAPI.
      • Non-Functional:
        • Backward Compatibility: Existing MachineSets with explicitly defined infrastructure fields must continue to function without modification.
        • Reliability: The provider must handle cases where a referenced failure domain is missing or has been renamed, and must report a clear error in the MachineSet status conditions.
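      The resolution step in the functional requirements could look roughly like the sketch below. It looks up the named entry in the Infrastructure CR's `spec.platformSpec.vsphere.failureDomains` list and fills the providerSpec `workspace`, `network`, and `template` fields from its topology. Several points are assumptions rather than decisions: the top-level `failureDomain` providerSpec field is the proposed (not yet existing) field, explicitly set values are assumed to win over failure-domain values (the precedence question is still open), and the mapping from topology keys to providerSpec keys is illustrative. A real controller would surface `FailureDomainNotFound` as a MachineSet condition rather than raise it.

```python
"""Sketch: resolve a named vSphere failure domain into providerSpec fields.

Assumptions (hypothetical, not an existing OpenShift API):
- providerSpec gains a top-level "failureDomain" string field.
- Explicitly set providerSpec fields take precedence over failure-domain values.
"""
import copy


class FailureDomainNotFound(Exception):
    """Raised when a MachineSet references a failure domain that is not defined."""


def find_failure_domain(infrastructure: dict, name: str) -> dict:
    # Failure domains live under spec.platformSpec.vsphere.failureDomains.
    domains = (
        infrastructure.get("spec", {})
        .get("platformSpec", {})
        .get("vsphere", {})
        .get("failureDomains", [])
    )
    for domain in domains:
        if domain.get("name") == name:
            return domain
    raise FailureDomainNotFound(f"failure domain {name!r} not found in Infrastructure CR")


def resolve_provider_spec(provider_spec: dict, infrastructure: dict) -> dict:
    """Return a copy of provider_spec with failure-domain values filled in."""
    name = provider_spec.get("failureDomain")
    if not name:
        # Backward compatibility: specs without a reference are returned unchanged.
        return copy.deepcopy(provider_spec)

    domain = find_failure_domain(infrastructure, name)
    topology = domain.get("topology", {})
    resolved = copy.deepcopy(provider_spec)

    workspace = resolved.setdefault("workspace", {})
    candidates = {
        "server": domain.get("server"),
        "datacenter": topology.get("datacenter"),
        "datastore": topology.get("datastore"),
        "resourcePool": topology.get("resourcePool"),
        "folder": topology.get("folder"),
    }
    # Only fill fields the user left unset, so explicit values win (assumed precedence).
    for key, value in candidates.items():
        if value and not workspace.get(key):
            workspace[key] = value

    network = resolved.setdefault("network", {})
    if not network.get("devices"):
        network["devices"] = [{"networkName": n} for n in topology.get("networks", [])]

    if not resolved.get("template") and topology.get("template"):
        resolved["template"] = topology["template"]
    return resolved
```

      Filling only unset fields keeps existing MachineSets working unchanged, while still letting an administrator override a single value such as the datastore for one pool of machines.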

      Use Case

      Scenario: Scalable Multi-Zone Management

      "As a Cluster Administrator, I want to create a new MachineSet by simply referencing zone-beta so that I don't have to look up and copy-paste the specific vCenter folder paths, datastore names, and network IDs for that specific rack."
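      For the use case above, the sketch below generates a worker MachineSet manifest that names `zone-beta` instead of spelling out vCenter paths. It follows the current `machine.openshift.io/v1beta1` MachineSet and `VSphereMachineProviderSpec` shape, except that the `failureDomain` providerSpec field is the proposed (hypothetical) field. The infrastructure ID, naming scheme, and VM sizing values are illustrative placeholders.

```python
"""Sketch: build a worker MachineSet that references a failure domain by name.

Assumption: "failureDomain" in the providerSpec is the proposed (not yet
existing) field; other fields follow the current machine.openshift.io/v1beta1 shape.
"""
import json


def machineset_for_zone(infra_id: str, zone: str, replicas: int = 1) -> dict:
    name = f"{infra_id}-worker-{zone}"
    labels = {
        "machine.openshift.io/cluster-api-cluster": infra_id,
        "machine.openshift.io/cluster-api-machine-role": "worker",
        "machine.openshift.io/cluster-api-machine-type": "worker",
        "machine.openshift.io/cluster-api-machineset": name,
    }
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineSet",
        "metadata": {
            "name": name,
            "namespace": "openshift-machine-api",
            "labels": {"machine.openshift.io/cluster-api-cluster": infra_id},
        },
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {
                "machine.openshift.io/cluster-api-cluster": infra_id,
                "machine.openshift.io/cluster-api-machineset": name,
            }},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"providerSpec": {"value": {
                    "apiVersion": "machine.openshift.io/v1beta1",
                    "kind": "VSphereMachineProviderSpec",
                    # Proposed field: datacenter, datastore, networks, etc. would be
                    # resolved from the Infrastructure CR instead of listed here.
                    "failureDomain": zone,
                    # Illustrative VM sizing values.
                    "numCPUs": 4,
                    "memoryMiB": 16384,
                    "diskGiB": 120,
                    "credentialsSecret": {"name": "vsphere-cloud-credentials"},
                    "userDataSecret": {"name": "worker-user-data"},
                }}},
            },
        },
    }


if __name__ == "__main__":
    # Print as JSON, which `oc apply -f -` accepts as well as YAML.
    print(json.dumps(machineset_for_zone("mycluster-x7k2p", "zone-beta", replicas=2), indent=2))
```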

      Questions to Answer (Engineering/Design)

      • Precedence: If a user defines both a failureDomain reference AND an explicit datastore in the same MachineSet, which takes priority?
      • Validation: Should an admission webhook be implemented to reject MachineSets that reference non-existent failure domains?
      • CAPI Alignment: How does this map to the VSphereMachineTemplate in the upstream Cluster API provider?
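      One possible answer to the precedence and validation questions is sketched below as the decision logic of a validating admission webhook: reject a MachineSet whose `failureDomain` does not match any entry in the Infrastructure CR, and allow (with warnings) explicit workspace values that differ from the referenced domain. This policy, the `failureDomain` field, and the workspace-to-topology key mapping are all assumptions for discussion, not a decided design; the HTTP and AdmissionReview plumbing is omitted.

```python
"""Sketch: admission-style validation for a failure-domain reference.

Assumed policy (open design question): unknown references are rejected,
explicit workspace values that differ from the domain are allowed with a warning.
"""
from typing import NamedTuple


class ValidationResult(NamedTuple):
    allowed: bool
    errors: list
    warnings: list


# Hypothetical mapping: providerSpec.workspace key -> failure-domain topology key.
OVERRIDABLE = {
    "datacenter": "datacenter",
    "datastore": "datastore",
    "resourcePool": "resourcePool",
    "folder": "folder",
}


def validate_machineset(provider_spec: dict, failure_domains: list) -> ValidationResult:
    """failure_domains mirrors Infrastructure spec.platformSpec.vsphere.failureDomains."""
    errors, warnings = [], []
    name = provider_spec.get("failureDomain")
    if name is None:
        # No reference: legacy MachineSet, nothing to check here.
        return ValidationResult(True, errors, warnings)

    domains = {d["name"]: d for d in failure_domains}
    domain = domains.get(name)
    if domain is None:
        # Covers both missing and renamed failure domains.
        errors.append(
            f"failureDomain {name!r} is not defined in the Infrastructure CR; "
            f"known domains: {sorted(domains)}"
        )
        return ValidationResult(False, errors, warnings)

    topology = domain.get("topology", {})
    workspace = provider_spec.get("workspace", {})
    for ws_key, topo_key in OVERRIDABLE.items():
        explicit = workspace.get(ws_key)
        from_domain = topology.get(topo_key)
        if explicit and from_domain and explicit != from_domain:
            warnings.append(
                f"workspace.{ws_key}={explicit!r} overrides failure domain value {from_domain!r}"
            )
    return ValidationResult(True, errors, warnings)
```

      Warning rather than rejecting on conflicts keeps explicit overrides possible; a stricter alternative would treat any conflict as an error.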

      Out of Scope

      •  

      Links

      •  

              Michal Zasepa
              Michal Zasepa
              Joseph Callen, Neil Girard, Richard Vanderpool
              Penghao Wang
              Avani Bhatt
              Eric Rich