OpenShift Request For Enhancement: RFE-5043

Energy saving functionality on Baremetal clusters for Telco Cloud RAN environments


Details

    • Feature Request
    • Resolution: Done

    Description

      1. Proposed title of this feature request

      Energy saving functionality on Baremetal clusters for Telco Cloud RAN environments

      2. What is the nature and description of the request?

       

      Scope

      The scope of the Energy Saving functionality is to reduce the energy consumption of an MNO cluster, which will be used for running either CU or DU+CU workloads, by powering off compute (worker) nodes when available resources significantly exceed demand for an extended period of time. Compute nodes that have been powered off will remain exclusively allocated to the initially provisioned cluster and will not be utilized by any other cluster.

       

      Current Functionality provided by Cluster Autoscaler

      The current implementation of the Cluster Autoscaler, as presented to the partner, manages the cluster's MachineAutoscaler, which in turn manages the MachineSet by increasing or decreasing its number of Machines as needed.
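
      For reference, each MachineSet is typically paired with a MachineAutoscaler resource along the following lines; the resource names and replica bounds shown here are placeholders rather than values from the partner's environment.

      # Illustrative MachineAutoscaler: names and replica bounds are placeholders.
      apiVersion: autoscaling.openshift.io/v1beta1
      kind: MachineAutoscaler
      metadata:
        name: worker-du-autoscaler              # hypothetical name
        namespace: openshift-machine-api
      spec:
        minReplicas: 1
        maxReplicas: 6
        scaleTargetRef:                         # the MachineSet whose Machine count is adjusted
          apiVersion: machine.openshift.io/v1beta1
          kind: MachineSet
          name: cluster-xyz-worker-du           # hypothetical MachineSet name

      The Cluster Autoscaler then only ever adjusts the MachineSet's replica count within the min/max bounds declared here.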

       

      Scaling-In

      Every 10 seconds, the Cluster Autoscaler checks which nodes are unnecessary in the cluster and removes them. The cluster autoscaler considers a node for removal if all of the following conditions apply:

      • The node utilization is less than the node utilization level threshold for the cluster. The node utilization level is the sum of the requested resources divided by the allocated resources for the node. If you do not specify a value in the ClusterAutoscaler custom resource, the cluster autoscaler uses a default value of 0.5, which corresponds to 50% utilization (see the configuration sketch at the end of this section).

       

      • The cluster autoscaler can move all pods running on the node to the other nodes. The Kubernetes scheduler is responsible for scheduling pods on the nodes.

       

      • The node does not have the scale-down disabled annotation (cluster-autoscaler.kubernetes.io/scale-down-disabled).

       

      When scaling in the cluster, the Machine object of the compute node is deleted, which deprovisions the node from the cluster and powers off the machine.
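
      The scale-in behaviour described above maps onto the scaleDown section of the ClusterAutoscaler custom resource and onto a per-node annotation; the values below are a minimal sketch for illustration, not recommendations.

      # Illustrative ClusterAutoscaler: values are examples only.
      apiVersion: autoscaling.openshift.io/v1
      kind: ClusterAutoscaler
      metadata:
        name: default
      spec:
        scaleDown:
          enabled: true
          unneededTime: 10m              # how long a node must stay underutilized before removal
          utilizationThreshold: "0.5"    # requested / allocated resources; 0.5 corresponds to 50%

      # A node can be excluded from scale-down by annotating its Node object with:
      #   cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"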

       

      Scaling-out

      The Cluster Autoscaler increases the size of the cluster when there are pods that fail to schedule on any of the current worker nodes due to insufficient resources or when another node is necessary to meet deployment needs. 

      When scaling-out the cluster, the compute node is installed from scratch.

       

      Challenges / Gaps

      • The Cloud RAN solution prohibits the cluster from accessing the out-of-band management system (BMC) on any of its nodes.
      • As stated earlier, the Cluster Autoscaler increases the size only if there are pods that fail to schedule on any of the current worker nodes or when another node is necessary to meet deployment needs. This, combined with the practice of provisioning a node from scratch when scaling-out a cluster, may lead to partial outage for a significant amount of time.
      • The ClusterAutoscaler should be able to control multiple MachineAutoscaler objects, each targeting a MachineSet whose Machines have different labels and/or hardware specs.

      Desired Functionality - Requirements

      • A new Operator should be created, since nodes are only powered off and not deprovisioned (thus there is no scaling in and out); a hypothetical custom resource sketch follows this list.
      • All objects related to the nodes must be retained, including Machine objects.
      • In Telco environments, spoke/managed clusters do not have access to their BMC. Due to this restriction, any actions performed by the new Operator must be orchestrated by the Hub cluster.
      • Each spoke cluster will have 1 or more MachineSet objects. Each MachineSet will be used for grouping nodes that are destined to run specific types of workload.
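
      A minimal sketch of how these requirements could surface as a custom resource reconciled from the Hub cluster. The API group, kind, and every field below are hypothetical and exist only to illustrate the requirements; they do not correspond to an existing API.

      # Hypothetical custom resource: API group, kind and all fields are illustrative only.
      apiVersion: energysaving.example.com/v1alpha1
      kind: NodePowerPolicy
      metadata:
        name: du-workers
        namespace: openshift-machine-api
      spec:
        machineSetRef:
          name: cluster-xyz-worker-du       # group of nodes eligible for power management
        utilizationThreshold: "0.5"         # nodes below this requested/allocated ratio may be powered off
        unneededTime: 30m                   # how long demand must stay low before a node is powered off
        suspend: false                      # set to true during day-2 operations (see Other considerations)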

       

      Powering-On nodes

      • A pod will request a specific amount of resources. Optionally, the pod will have affinity rules defined, so that it is scheduled on a Machine within a specific MachineSet. When a pod cannot be scheduled due to lack of resources in the specific MachineSet, the new Operator should initiate the process of powering on a node that was previously powered off for energy saving (an example pod specification follows this list).
      • A pod that requests specific resources but does not define any node affinity should be scheduled on any of the cluster’s MachineSets that have sufficient resources.
      • Nodes that have been powered off for reasons other than energy saving should be ignored and should not be included in the list of nodes that could potentially be powered on to meet the demand.
      • The number of nodes that are powered on should meet the demand of the pods that are pending due to lack of resources. If the amount of resources requested by the pending pods exceeds the capacity of a node that is in the process of powering on, the new Operator should initiate the process of powering on additional nodes until the demand is met.
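
      As an example of the affinity-driven case above, a pod could pin itself to a MachineSet through a node label; the label key and value, the image, and the resource figures below are placeholders.

      # Illustrative pod: label key/value, image and resource requests are placeholders.
      apiVersion: v1
      kind: Pod
      metadata:
        name: du-workload
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: example.com/machineset        # hypothetical node label identifying the MachineSet
                  operator: In
                  values:
                  - cluster-xyz-worker-du
        containers:
        - name: du
          image: registry.example.com/du:latest      # placeholder image
          resources:
            requests:
              cpu: "16"
              memory: 32Gi

      If no such affinity is set, the pod simply needs enough free capacity on any powered-on node in any MachineSet, which matches the second bullet above.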

       

      Powering-off nodes

      • Nodes should be powered off but remain provisioned, as opposed to the current implementation of the Cluster Autoscaler, which deprovisions the nodes (a possible power-off mechanism is sketched after this list).
      • Nodes below the utilization level defined in the new Operator should be eligible to be powered off.
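
      Since Metal3 is listed as an affected component, one possible mechanism (an assumption, not part of the request) is for the Hub-side controller to toggle the online field of the node's BareMetalHost object, which powers the host off through its BMC while leaving it provisioned.

      # Illustrative BareMetalHost fragment: the name and namespace are placeholders.
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: worker-du-0
        namespace: cluster-xyz                 # e.g. on the Hub, in the spoke cluster's namespace
      spec:
        online: false                          # power the machine off without deprovisioning it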

       

      Other considerations

      • Any day-2 operation that affects nodes which have been powered off for energy saving should initiate the process of powering on all nodes and retain that state until the operation is completed. Examples of such day-2 operations include (this list is not exhaustive):
        • Any new configuration applied through ZTP/GitOps, including cluster updates.
        • Auto-rotation of the node certificates.
      • For the above use case, a new field could be introduced to temporarily suspend the functionality of the new Operator (illustrated by the suspend field in the hypothetical sketch above).
      • Alerts for pending pods should not be fired while there are nodes available to be powered on and/or nodes are powering on and getting ready.
      • Alerts should be fired if a pod is pending due to lack of resources and there are no further nodes available to power on (an illustrative alerting rule sketch follows this list).
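
      A rough sketch of how the second alerting requirement could be expressed as a PrometheusRule; the rule name, duration, and expression are illustrative, and suppressing the alert while powered-off capacity is still available (or while nodes are booting) would additionally require a metric exported by the new Operator, which does not exist today.

      # Illustrative PrometheusRule: name, duration and expression are placeholders.
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: energy-saving-pending-pods
        namespace: openshift-monitoring
      spec:
        groups:
        - name: energy-saving
          rules:
          - alert: PodsPendingNoNodesAvailable
            expr: sum(kube_pod_status_phase{phase="Pending"}) > 0
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: Pods have been pending for 15 minutes and no additional nodes are available to power on.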

      3. Why does the customer need this? (List the business requirements here)

      Conserving energy and improving energy efficiency are among the key priorities for telco partners.

       

      4. List any affected packages or components.

      OpenShift

      ACM/GitOps/ZTP

      Metal3

          People

            hrakotor@redhat.com Hari Rakotoranto
            dvassili@redhat.com Demetris Vassiliades