OpenShift Bugs / OCPBUGS-9325

node_exporter uses high CPU under load


    • Quality / Stability / Reliability
    • Low
    • Unspecified

      Description of problem:

      node_exporter intermittently uses high CPU under load. The behavior looks like a spinlock race across multiple CPUs.

      Version-Release number of selected component (if applicable):

      4.8.23, on hosts with a large number of CPUs (for example, 96 CPUs)

      How reproducible:

      Always, in the customer environment

      Steps to Reproduce:
      1. Generate load on the node and monitor node resources with the top command at a 10-second interval.
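The monitoring step above can also be scripted when top output is hard to capture. The following is a minimal sketch, assuming a Linux host and a known node_exporter PID; the function names are hypothetical and not part of any node_exporter tooling. It reads the utime and stime counters from /proc/<pid>/stat and converts the difference between two samples into a percentage of one core, which is how top reports process CPU by default. A reading of N * 100% therefore means roughly N cores were kept busy during the interval.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int) -> float:
    """CPU usage over the interval as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def watch(pid: int, interval_s: float = 10.0) -> None:
    """Print the process CPU percentage once per interval until interrupted."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            after = parse_cpu_ticks(f.read())
        print(f"{cpu_percent(before, after, interval_s, ticks_per_s):.1f}%")
        before = after
```

Calling watch(pid) with the node_exporter PID prints one reading every 10 seconds, matching the interval used in the reproduction step.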

      Actual results:

      node_exporter sometimes uses N * 100% CPU for a while. Normally it uses only 5%.

      Expected results:

      node_exporter shows no unexpected high CPU usage.

      Additional info:

      Similar high CPU usage caused by a spinlock race has been reported upstream when the cpufreq collector is enabled. It appears that the spinlock race can also occur without the cpufreq collector, when node_exporter cannot collect metrics smoothly for some reason.

      https://github.com/prometheus/node_exporter/issues/1963
      https://github.com/prometheus/node_exporter/pull/1964
      https://github.com/prometheus/node_exporter/issues/1880
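One way to check whether a specific collector, such as cpufreq, is involved is to rank collectors by how long each took during a scrape. The sketch below is illustrative only: it assumes the text of the exporter's /metrics output has already been saved to a string, and it parses the node_scrape_collector_duration_seconds series that node_exporter exposes per collector. The function name is hypothetical.

```python
import re

# Matches lines such as:
# node_scrape_collector_duration_seconds{collector="cpufreq"} 0.0123
_DURATION_RE = re.compile(
    r'^node_scrape_collector_duration_seconds\{collector="([^"]+)"\}\s+(\S+)$'
)


def slowest_collectors(metrics_text: str, top_n: int = 5):
    """Return up to top_n (collector, seconds) pairs, slowest first."""
    durations = []
    for line in metrics_text.splitlines():
        match = _DURATION_RE.match(line.strip())
        if match:
            durations.append((match.group(1), float(match.group(2))))
    return sorted(durations, key=lambda pair: pair[1], reverse=True)[:top_n]
```

If no single collector stands out while the process is using N * 100% CPU, that would be consistent with the observation above that the problem is not limited to cpufreq, although it would not prove it.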

              hasun@redhat.com Haoyu Sun
              rhn-support-tkimura Takayoshi Kimura
              Hongyan Li
              Red Hat Employee