OCPBUGS-35095

`KubeCPUOvercommit` Alert Not Triggered Despite Node CPU Overcommitment

    • Moderate
    • No
    • 2
    • MON Sprint 256, MON Sprint 260, MON Sprint 261
    • 3
    • False

      One of our customers observed this issue. To reproduce it, I intentionally increased the overall CPU limits in my test cluster to over 200% and monitored the cluster for more than 2 days. However, the `KubeCPUOvercommit` alert never fired, although it should ideally trigger after 10 minutes of overcommitment.

      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        Resource           Requests      Limits
        --------           --------      ------
        cpu                2654m (75%)   8450m (241%)
        memory             5995Mi (87%)  12264Mi (179%)
        ephemeral-storage  0 (0%)        0 (0%)
        hugepages-1Gi      0 (0%)        0 (0%)
        hugepages-2Mi      0 (0%)        0 (0%)

       

      To view the rule, open the OCP console, go to Observe --> Alerting --> Alerting rules, and select the `KubeCPUOvercommit` alert.

      Expression:

      sum by (cluster) (namespace_cpu:kube_pod_container_resource_requests:sum{job="kube-state-metrics"})
        - (sum by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})
           - max by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})) > 0
      and
      (sum by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})
        - max by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})) > 0
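
To make the condition easier to reason about, here is a minimal Python sketch of what the expression above appears to compute for a single cluster. It is only a reading of the PromQL, not the monitoring stack's implementation. The function name `kube_cpu_overcommit` and the three-node cluster with 3.5 allocatable cores per node are hypothetical; only the 2654m request total comes from the output above. Note that the expression reads CPU requests and node allocatable CPU, so the limits column (8450m, 241%) is not an input.

```python
def kube_cpu_overcommit(requests_cores, allocatable_per_node):
    """Evaluate the rule's condition for one cluster (illustrative only).

    requests_cores: sum of CPU requests across all namespaces, in cores.
    allocatable_per_node: allocatable CPU of each node, in cores.
    """
    # sum(allocatable) - max(allocatable): CPU that would remain
    # if the largest node were lost.
    headroom = sum(allocatable_per_node) - max(allocatable_per_node)
    # Both sides of the PromQL `and` must hold: requests exceed the
    # headroom, and the headroom itself is greater than zero.
    return requests_cores - headroom > 0 and headroom > 0


# Hypothetical cluster: three nodes, 3.5 allocatable cores each.
nodes = [3.5, 3.5, 3.5]

# 2654m of requests stays below the 7.0-core headroom, so the
# condition is false no matter how high the limits are.
print(kube_cpu_overcommit(2.654, nodes))

# Requests above the headroom satisfy the condition.
print(kube_cpu_overcommit(7.5, nodes))

# With a single node the headroom is 0, so the second clause
# keeps the condition false.
print(kube_cpu_overcommit(2.654, [3.5]))
```

Under these assumed numbers, raising only the limits would not change the result, because limits never enter the calculation.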

            Pranshu Srivastava (prasriva@redhat.com)
            Ayush Laxkar (rhn-support-alaxkar)
            Junqi Zhao
            Simon Pasquier