OpenShift Request For Enhancement / RFE-7880

Missing max_count_of_nodes_in_pool metric


    • Type: Feature Request
    • Resolution: Unresolved
    • Autoscaling
    • Product / Portfolio Work

      1. Proposed title of this feature request
      Add a max_count_of_nodes_in_pool metric
      2. What is the nature and description of the request?
      Currently the following metrics are available:
      cluster_autoscaler_max_nodes_count: maximum number of nodes for the cluster autoscaler
      mco_updated_machine_count{pool="<pool-name>"}: current number of nodes per node pool
      mapi_machine_set_status_replicas: similar results to mco_updated_machine_count
      None of these reports the maximum number of nodes per node pool.

      3. Why does the customer need this? (List the business requirements here)
      Currently we are unable to alert on node pools reaching maximum capacity, which prevents workloads from scaling if not remediated. Ideally we would have a metric that lets us monitor each pool's maximum and proactively increase it.

      4. List any affected packages or components.
      OpenShift Virtualization 4.16 (specific to Prometheus)

      Ref Salesforce Case: 04189510

      For visual reference:

      ❯ k get machineautoscalers
      NAME                REF KIND     REF NAME            MIN   MAX   AGE
      <>-98zqs-infra-a    MachineSet   <>-98zqs-infra-a    1     3     215d
      <>-98zqs-infra-b    MachineSet   <>-98zqs-infra-b    1     3     215d
      <>-98zqs-infra-c    MachineSet   <>-98zqs-infra-c    1     3     215d
      <>-98zqs-worker-a   MachineSet   <>-98zqs-worker-a   1     3     215d
      <>-98zqs-worker-b   MachineSet   <>-98zqs-worker-b   1     3     215d
      <>-98zqs-worker-c   MachineSet   <>-98zqs-worker-c   1     3     215d

      promql: sum(mapi_machine_set_status_replicas{name=~".*-worker-.*"})
      result: 3
      promql: sum(mco_updated_machine_count{pool="worker"})
      result: 3
      promql: sum(cluster_autoscaler_max_nodes_count)
      result: 18
      promql: sum(mco_updated_machine_count{pool!="worker"})
      result: 8
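
To illustrate the value the requested metric would need to expose, here is a rough Python sketch that sums the MAX column of the machineautoscalers listing per pool and compares it with a current node count. It is only an illustration: the function names and the regular expression are hypothetical, and it assumes the pool name is the token just before the trailing zone letter (for example "worker" in "<>-98zqs-worker-a"), which may not hold for other naming schemes.

```python
import re
from collections import defaultdict

# Rows copied from the `k get machineautoscalers` output in this request
# (columns: NAME, REF KIND, REF NAME, MIN, MAX, AGE).
ROWS = """\
<>-98zqs-infra-a MachineSet <>-98zqs-infra-a 1 3 215d
<>-98zqs-infra-b MachineSet <>-98zqs-infra-b 1 3 215d
<>-98zqs-infra-c MachineSet <>-98zqs-infra-c 1 3 215d
<>-98zqs-worker-a MachineSet <>-98zqs-worker-a 1 3 215d
<>-98zqs-worker-b MachineSet <>-98zqs-worker-b 1 3 215d
<>-98zqs-worker-c MachineSet <>-98zqs-worker-c 1 3 215d
"""

# Assumption (hypothetical naming rule): names end in "-<pool>-<zone letter>".
POOL_RE = re.compile(r"-(?P<pool>[a-z]+)-[a-z]$")


def max_nodes_per_pool(rows: str) -> dict[str, int]:
    """Sum the MAX column for every machine autoscaler in the same pool."""
    totals: defaultdict[str, int] = defaultdict(int)
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers or malformed rows
        name, _kind, _ref_name, _min, max_nodes, _age = fields
        match = POOL_RE.search(name)
        if match:
            totals[match.group("pool")] += int(max_nodes)
    return dict(totals)


def headroom(current: int, maximum: int) -> int:
    """Nodes that can still be added before the pool hits its maximum."""
    return maximum - current


pool_max = max_nodes_per_pool(ROWS)
print(pool_max)

# 3 is the result of sum(mco_updated_machine_count{pool="worker"}) quoted above.
print(headroom(3, pool_max["worker"]))
```

With the sample data this gives a maximum of 9 nodes for each of the infra and worker pools, and a headroom of 6 for the worker pool. An alert could fire when the headroom for a pool approaches zero, which is not possible today because no metric carries the per-pool maximum.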

              rh-ee-smodeel Subin M
              dacarpen@redhat.com Darren Carpenter