OpenShift Bugs / OCPBUGS-46457

`KubeCPUOvercommit` Alert Not Triggered Despite Node CPU Overcommitment


    • Moderate
    • No
    • MON Sprint 263, MON Sprint 264
    • 2
    • False

      This is a clone of issue OCPBUGS-46456, which traces back through a chain of clones (OCPBUGS-46455, OCPBUGS-46454, OCPBUGS-46453) to the original issue OCPBUGS-35095. The following is the description of the original issue:

      One of our customers observed this issue. To reproduce it, I intentionally raised the overall CPU limits in my test cluster to over 200% and monitored the cluster for more than 2 days. However, the KubeCPUOvercommit alert never fired, although it is expected to trigger after 10 minutes of overcommitment.

      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        Resource           Requests      Limits
        --------           --------      ------
        cpu                2654m (75%)   8450m (241%)
        memory             5995Mi (87%)  12264Mi (179%)
        ephemeral-storage  0 (0%)        0 (0%)
        hugepages-1Gi      0 (0%)        0 (0%)
        hugepages-2Mi      0 (0%)        0 (0%)
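The percentages in the `Allocated resources` output are relative to the node's allocatable amount, truncated to a whole percent. The Python sketch below reproduces the CPU row under one assumption: an allocatable CPU of 3500m. That value is not stated in the report; it is inferred because it is consistent with both printed figures (75% and 241%). It shows that the requests side stays well under 100% even though the limits side exceeds 200%.

```python
def percent_of_allocatable(amount_millicores: int, allocatable_millicores: int) -> int:
    """Return amount as a whole percent of allocatable, truncating the fraction."""
    return int(amount_millicores / allocatable_millicores * 100)


# Assumption: inferred node allocatable CPU, not a value from the report.
ALLOCATABLE_CPU_M = 3500

print(percent_of_allocatable(2654, ALLOCATABLE_CPU_M))  # requests -> 75
print(percent_of_allocatable(8450, ALLOCATABLE_CPU_M))  # limits   -> 241
```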

       

      In the OCP console, go to Observe --> Alerting --> Alerting rules and select the `KubeCPUOvercommit` alert.

      Expression:

      sum by (cluster) (namespace_cpu:kube_pod_container_resource_requests:sum{job="kube-state-metrics"})
        - (sum by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})
           - max by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})) > 0
      and
      (sum by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})
        - max by (cluster) (kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})) > 0
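The expression compares the summed CPU requests (limits do not appear in it) against the cluster's allocatable CPU minus its largest node, that is, whether the remaining nodes could still hold every requested core if the biggest node were lost. The Python sketch below mirrors that condition for one cluster. The function name and the node sizes are hypothetical; 2.654 cores is the request total from the output above, and 3.5 cores of allocatable CPU is an assumption. Because PromQL comparisons without `bool` filter out non-matching series, "returns False" here corresponds to the expression returning no result. A direct consequence of the second clause: on a single-node cluster the sum and max of allocatable CPU are equal, so the condition cannot be true regardless of requests.

```python
def kube_cpu_overcommit(requested_cores: float, allocatable_cores_per_node: list[float]) -> bool:
    """Hypothetical mirror of the KubeCPUOvercommit condition for one cluster.

    requested_cores: summed CPU requests across all namespaces.
    allocatable_cores_per_node: allocatable CPU for each node.
    """
    total = sum(allocatable_cores_per_node)
    largest = max(allocatable_cores_per_node)
    # Allocatable CPU left if the largest node is lost.
    remaining = total - largest
    # Both clauses must hold, like "A > 0 and B > 0" in PromQL.
    return (requested_cores - remaining > 0) and (remaining > 0)


# Single node (3.5 cores is an assumption): remaining is 0, so never true.
print(kube_cpu_overcommit(2.654, [3.5]))          # False
# Hypothetical three-node cluster, 4 cores each: remaining is 8 cores.
print(kube_cpu_overcommit(9.0, [4.0, 4.0, 4.0]))  # True
print(kube_cpu_overcommit(7.0, [4.0, 4.0, 4.0]))  # False
```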

              Pranshu Srivastava (prasriva@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Junqi Zhao
              Simon Pasquier