Observability and Data Analysis Program · OBSDA-1

Allow autoscaling via custom metrics



    Description

      Problem Alignment

      The Problem

      Not every workload uses CPU and Memory to determine if they need to be scaled up or down. For example, a webapp exposing an HTTP API needs to scale based on incoming traffic (number of HTTP requests).

      Autoscaling is a key feature of Kubernetes. There are different options you can use to scale either your workload (the number of replicas or the resources attached to it) or the number of nodes in your cluster. With this feature, we will focus on scaling workloads by increasing or decreasing the number of replicas, i.e. the Horizontal Pod Autoscaler.

      The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The controller manager obtains metrics from either the resource metrics API (CPU and Memory) or the custom metrics API (for all other metrics). Currently, OpenShift only exposes an implementation of the resource metrics API, which is available out of the box with a standard OpenShift installation. It does not provide an implementation of the custom metrics API, so customers can’t use any other metric to operate reliable, high-SLA applications with minimal to no downtime during, for example, peak times.
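The out-of-the-box resource metrics path described above can be illustrated with a standard autoscaling/v2 HorizontalPodAutoscaler targeting CPU utilization (the deployment name and thresholds below are illustrative, not from this ticket):

```yaml
# HPA using the resource metrics API (CPU), which works on a
# standard OpenShift installation without any extra component.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp            # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% average CPU
```

Anything beyond `type: Resource` (i.e. `type: Pods`, `Object`, or `External`) requires a custom metrics API implementation, which is the gap this feature addresses.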

      High-Level Approach

      Provide an installable component that implements the custom metrics API which exposes ANY metric collected by our OpenShift Monitoring system.

      Goal & Success

      • Increase the value of running workloads on OpenShift by providing tools that allow workloads to scale automatically based on agreed parameters, keeping them available at all times.
      • The number of HPA objects that use a custom metric exposed by our Monitoring stack should grow.

      Solution Alignment

      Key Capabilities

      • A user must be able to configure a new HorizontalPodAutoscaler object using a metric other than CPU and Memory.
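This capability can be sketched as an autoscaling/v2 HorizontalPodAutoscaler that uses a Pods-type metric served by the custom metrics API (the metric name `http_requests_per_second` and the workload name are assumptions for illustration):

```yaml
# HPA scaling on a custom per-pod metric instead of CPU/Memory.
# Requires a custom metrics API implementation to be installed.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: buy-api           # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: buy-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed custom metric name
        target:
          type: AverageValue
          averageValue: "20"   # target ~20 requests/second per pod
```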

      Key Flows

      Imagine a company that runs a Shop web application on OpenShift. During Christmas, they usually see a huge spike in traffic to their “Buy” API and have been struggling to fulfill all requests, even dropping connections, which directly translates into lost revenue. That company would like to use the Horizontal Pod Autoscaler to scale up/down based on throughput. Their threshold at the moment is around 20 requests per second.

      CLI Flow:

      1. OpenShift Monitoring already collects HTTP-specific metrics via HAProxy out of the box.
      2. Specify a query that exposes “throughput” as a single metric to HPA.
      3. Configure an HPA object that uses the custom metric from step 2.
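One possible shape for step 2, assuming the component is based on the Prometheus Adapter (mentioned in the open questions below), is a rule that turns an HAProxy counter into a per-second rate. The series name and label set here are assumptions for illustration, not a confirmed design:

```yaml
# Prometheus Adapter rule sketch: expose "throughput" as a rate metric.
rules:
  # Discover HAProxy counters attributable to a namespace and pod.
  - seriesQuery: 'haproxy_backend_http_responses_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    # Rename *_total counters to *_per_second for the custom metrics API.
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    # The PromQL template that computes the actual value served to HPA.
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The `metricsQuery` field is the Adapter feature referenced in the open questions: it lets a query, rather than a raw stored metric, back the custom metric that HPA consumes.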

      UI Flow:

      • In the developer perspective, go to the “Buy” API deployment.
      • Click on the “Configure autoscaling” button.
      • In the new popup, users set up everything necessary for HPA, such as the metrics they’d like to use (those available to the user) and a threshold.

      Open Questions & Key Decisions (optional)

      • Do we foresee use cases where we do not have a Prometheus store locally and need to pull metrics from a remote storage (e.g. HyperShift)?
      • How do we want to provide the component that exposes the custom metrics API? Should it be 1) installed with OpenShift Monitoring, or 2) installed and configured by an admin?
      • Is there a limit to how many metrics that component can expose before it significantly impacts the Prometheus cluster? If so, how do we want to expose configuration to restrict the set of exposed metrics?
      • During my competitor analysis (see the respective section in this doc) I found that no one really provides the ability for users to define a “custom metric” based on a query instead of a plain metric available in the metrics store. Is that on purpose? For example, in the key flow we highlighted “throughput” as the scale parameter; as described there, “throughput” is defined as a PromQL query over metrics available via HAProxy. I also saw that the Prometheus Adapter provides a “metricsQuery” field where you can define a query and expose its value as another metric. Is that what we want, and if so, how do we expose something like that across tenants?
      • Do metrics have to be of a specific type (e.g. Gauge) or does HPA not really care about that?
      • Do we expect to support scaling workloads outside an OpenShift cluster?
      • How do we let users configure the custom metrics they need, particularly since you need to select the metrics for your specific pod?
      • Clarify HPA in HyperShift.

            People

              rh-ee-rfloren Roger Florén
              cvogel1 Christian Heidenreich (Inactive)
              Votes: 9
              Watchers: 19
