Feature Request
Resolution: Unresolved
Product / Portfolio Work
1. Proposed title of this feature request
Add a max_count_of_nodes_in_pool metric
2. What is the nature and description of the request?
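Currently the following metrics are available:
cluster_autoscaler_max_nodes_count: maximum number of nodes for the cluster autoscaler
mco_updated_machine_count: current number of nodes in the node pools
mapi_machine_set_status_replicas: similar results to mco_updated_machine_count
None of these exposes the maximum number of nodes configured for an individual node pool (the MAX value of its MachineAutoscaler). The request is for a metric that does, so that it can be compared with each pool's current node count.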
3. Why does the customer need this? (List the business requirements here)
Currently we cannot alert when a node pool reaches its maximum capacity. If this is not remediated, workloads are prevented from scaling. Ideally, a metric would let us monitor this and proactively increase the maximum before it is reached.
4. List any affected packages or components.
OpenShift Virtualization 4.16 (specific to Prometheus)
Ref Salesforce Case: 04189510
For visual reference:
❯ k get machineautoscalers
NAME                REF KIND     REF NAME            MIN   MAX   AGE
<>-98zqs-infra-a    MachineSet   <>-98zqs-infra-a    1     3     215d
<>-98zqs-infra-b    MachineSet   <>-98zqs-infra-b    1     3     215d
<>-98zqs-infra-c    MachineSet   <>-98zqs-infra-c    1     3     215d
<>-98zqs-worker-a   MachineSet   <>-98zqs-worker-a   1     3     215d
<>-98zqs-worker-b   MachineSet   <>-98zqs-worker-b   1     3     215d
<>-98zqs-worker-c   MachineSet   <>-98zqs-worker-c   1     3     215d
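Because no metric carries the per-pool maximum, today it can only be read from the MachineAutoscaler objects themselves. A minimal sketch in Python, assuming the tabular output above has been captured as text with the same column layout (`parse_max_per_pool` and `SAMPLE_OUTPUT` are hypothetical names used only for illustration):

```python
# Hypothetical capture of the `get machineautoscalers` output listed above.
SAMPLE_OUTPUT = """\
NAME                REF KIND     REF NAME            MIN   MAX   AGE
<>-98zqs-infra-a    MachineSet   <>-98zqs-infra-a    1     3     215d
<>-98zqs-infra-b    MachineSet   <>-98zqs-infra-b    1     3     215d
<>-98zqs-infra-c    MachineSet   <>-98zqs-infra-c    1     3     215d
<>-98zqs-worker-a   MachineSet   <>-98zqs-worker-a   1     3     215d
<>-98zqs-worker-b   MachineSet   <>-98zqs-worker-b   1     3     215d
<>-98zqs-worker-c   MachineSet   <>-98zqs-worker-c   1     3     215d
"""


def parse_max_per_pool(output: str) -> dict[str, int]:
    """Map each MachineAutoscaler name to the MAX column of the tabular output."""
    max_per_pool: dict[str, int] = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Data rows have six whitespace-separated fields:
        # NAME, REF KIND, REF NAME, MIN, MAX, AGE.
        if len(fields) == 6:
            max_per_pool[fields[0]] = int(fields[4])
    return max_per_pool


max_per_pool = parse_max_per_pool(SAMPLE_OUTPUT)
print(max_per_pool)
print(sum(max_per_pool.values()))  # 18
```

Summing the six MAX values gives 18, the same figure returned by `sum(cluster_autoscaler_max_nodes_count)` in the queries that follow, which is consistent with that metric reporting only a cluster-wide total rather than a per-pool limit.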
promql: sum(mapi_machine_set_status_replicas{name=~".*-worker-.*"})
result: 3
promql: sum(mco_updated_machine_count)
result: 3
promql: sum(cluster_autoscaler_max_nodes_count)
result: 18
promql: sum(mco_updated_machine_count)
result: 8
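Totals like these cannot show whether a single pool has hit its limit: one pool can be full while the cluster as a whole is well under its maximum. A minimal Python sketch of the difference, using hypothetical per-pool node counts (chosen for illustration, not taken from the case) and the MAX of 3 from the MachineAutoscaler listing:

```python
# Hypothetical current node counts per pool (not from the case data).
current = {
    "infra-a": 3, "infra-b": 1, "infra-c": 1,
    "worker-a": 1, "worker-b": 1, "worker-c": 1,
}
# MAX of 3 for every pool, as in the MachineAutoscaler listing.
maximum = {pool: 3 for pool in current}

# Cluster-wide view: the only comparison the existing metrics allow.
total_current = sum(current.values())
total_max = sum(maximum.values())
print(f"cluster: {total_current}/{total_max}, at capacity: {total_current >= total_max}")

# Per-pool view: what a per-pool maximum metric would make possible.
full = [pool for pool in sorted(current) if current[pool] >= maximum[pool]]
print(f"pools at capacity: {full}")
```

The aggregate check reports 8 of 18 nodes and no capacity problem, while `infra-a` can no longer scale. Only the per-pool comparison catches it, and that comparison is what the requested metric would make expressible as a Prometheus alert.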