OpenShift Request For Enhancement: RFE-3611

Provide fallback or prioritization for MachineSet/MachinePools to guarantee scale-up in case instance type is not available


    Description

      1. Proposed title of this feature request
      Provide fallback or prioritization for MachineSet/MachinePools to guarantee scale-up in case instance type is not available

      2. What is the nature and description of the request?
      In various public cloud regions it is common to see MachineSet scaling failures because the requested instanceType is not available at the time scaling is triggered. This causes critical workloads to remain in the Pending state because no resources are available to host them.

      It would therefore be desirable to either fall back to a MachineSet/MachinePool with a different instanceType, or to work through a list of MachineSets/MachinePools ordered by priority, so that a Machine scale-up that fails because the instanceType is not available recovers automatically.
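
      The requested behavior could look roughly like the following sketch. It is not an existing MachineAPI or ClusterAPI interface: the `MachinePoolCandidate` type, the `priority` field (higher is tried first), the `CapacityUnavailable` exception, and the `provision` callback are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachinePoolCandidate:
    """Hypothetical view of a MachineSet/MachinePool, for this sketch only."""
    name: str
    instance_type: str
    priority: int  # assumption: a higher value is tried first


class CapacityUnavailable(Exception):
    """Hypothetical error raised when the instance type cannot be provisioned."""


def scale_up_with_fallback(candidates, provision):
    """Try candidates in descending priority order.

    `provision` is a caller-supplied callable that takes one candidate and
    raises CapacityUnavailable when the instance type is not available.
    Returns the name of the pool that succeeded and the pools tried before it.
    """
    tried = []
    for candidate in sorted(candidates, key=lambda c: c.priority, reverse=True):
        try:
            provision(candidate)
            return candidate.name, tried
        except CapacityUnavailable:
            # Record the failed pool and fall back to the next priority.
            tried.append(candidate.name)
    raise CapacityUnavailable(f"no capacity in any pool; tried {tried}")
```

      With two pools of different instanceType, a failure on the preferred pool would move the scale-up to the next one instead of leaving workloads pending.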

      If something can be provided in the MachineAPI today, that would be great. The focus, though, should be on the ClusterAPI, so that this functionality is available once OpenShift Container Platform 4 transitions to it.

      3. Why does the customer need this? (List the business requirements here)
      Being able to scale on demand is critical for customers. A scale-up that stalls because the instanceType is not available can disrupt production and requires manual intervention from the SRE team. Given that MachineSets/MachinePools with different instanceTypes can be created, it should be relatively easy to assign priorities to them or to implement a fallback for when scaling fails.

      The key is to have a way to tell ClusterAPI to use a different MachinePool or instanceType if the selected one is not available, which in turn requires capturing that event properly so it can be acted on.
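
      Capturing the event means telling a capacity shortage apart from other provisioning failures. A minimal sketch, assuming the provider's failure text is available as a plain string: the function name and the way the message is obtained are hypothetical, and the error codes listed are illustrative examples of provider capacity errors, not an exhaustive list.

```python
# Illustrative provider error codes that indicate a capacity shortage.
# Assumption: the list is incomplete and would need to be maintained per provider.
CAPACITY_ERROR_CODES = {
    "InsufficientInstanceCapacity",  # AWS EC2
    "SkuNotAvailable",               # Azure
    "ZONE_RESOURCE_POOL_EXHAUSTED",  # Google Cloud
}


def is_capacity_failure(error_message: str) -> bool:
    """Return True if the failure text mentions a known capacity error code."""
    return any(code in error_message for code in CAPACITY_ERROR_CODES)
```

      Only failures classified this way would trigger a fallback; other errors, such as missing permissions, would still surface for manual investigation.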

      4. List any affected packages or components.

      • MachineAPI
      • ClusterAPI

            People

              rh-ee-smodeel Subin MM
              rhn-support-sreber Simon Reber