  1. Red Hat Advanced Cluster Management
  2. ACM-4227

Incorrect values for cpu requests as % of allocatable on ACM Observability


      There appears to be a bug in the RHACM Observatorium dashboard in the panel showing CPU requests as a percentage of allocatable CPU within a cluster. In our simple test the metric was almost 20 times the allocatable CPU, which is not possible.

      Please see the attached PDF document for details; must-gathers are also attached.

      The PDF also contains a PromQL expression we created locally, which attempts to give an accurate measure of CPU requests as a percentage of the CPU that is actually available and allocatable within a cluster.

      Custom PromQL:
      sum(
        (
          kube_pod_container_resource_requests{resource="cpu"}
          * on (pod, namespace) group_left (phase) kube_pod_status_phase{phase="Running"}
        )
        * on (node) group_left (role) kube_node_role{role="app"}
      )
      /
      sum(
        kube_node_status_allocatable{resource="cpu"}
        * on (node) group_left (role) kube_node_role{role="app"}
      )

      The expression attempts to be more accurate by:

      • Aggregating CPU requests, kube_pod_container_resource_requests{resource="cpu"}, only for pods in the "Running" phase that are running on "app" nodes.
      • Restricting allocatable CPU, kube_node_status_allocatable{resource="cpu"}, to "app" worker nodes only.
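The two filters above can be sketched on toy data. This is only an illustration of the arithmetic, not the dashboard's implementation: every pod, node, and value below is made up, and the Python names (`cpu_requests`, `running`, `app_nodes`, `allocatable`, `cpu_request_ratio`) are hypothetical stand-ins for the series used in the custom expression.

```python
# Toy stand-ins for the series in the custom PromQL expression.
# All names and values are invented for illustration.

# kube_pod_container_resource_requests{resource="cpu"}, in cores
cpu_requests = [
    {"namespace": "shop", "pod": "web-1", "node": "app-1", "cores": 0.5},
    {"namespace": "shop", "pod": "web-2", "node": "app-2", "cores": 1.5},
    {"namespace": "shop", "pod": "job-1", "node": "app-1", "cores": 2.0},    # not Running
    {"namespace": "infra", "pod": "log-1", "node": "infra-1", "cores": 1.0},  # not an app node
]

# Pods for which kube_pod_status_phase{phase="Running"} is 1
running = {("shop", "web-1"), ("shop", "web-2"), ("infra", "log-1")}

# Nodes matched by kube_node_role{role="app"}
app_nodes = {"app-1", "app-2"}

# kube_node_status_allocatable{resource="cpu"}, in cores
allocatable = {"app-1": 4.0, "app-2": 4.0, "infra-1": 8.0}


def cpu_request_ratio(cpu_requests, running, app_nodes, allocatable):
    """Requests of Running pods on app nodes divided by allocatable CPU on app nodes."""
    requested = sum(
        r["cores"]
        for r in cpu_requests
        if (r["namespace"], r["pod"]) in running and r["node"] in app_nodes
    )
    capacity = sum(cores for node, cores in allocatable.items() if node in app_nodes)
    return requested / capacity


ratio = cpu_request_ratio(cpu_requests, running, app_nodes, allocatable)
print(f"{ratio * 100:.1f}%")  # prints "25.0%"
```

Note that the result is a fraction (0.25 here), as it is for the custom expression; it needs to be multiplied by 100 to read as a percentage.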

      We would like a query that is accurate for our scenario and can be used for alerting as well as capacity management. I have compared the custom expression with other metrics available from common dashboards on RHACM or the OCP UI.

      This is a key metric for the operational stability and capacity management of our fleet of clusters, so we are looking for guidance.

            smeduri1@redhat.com Subbarao Meduri
            rhn-support-jayoung James Young
            Xiang Yin Xiang Yin