  Red Hat OpenStack Services on OpenShift / OSPRH-17668

instance_cpu_usage is not properly calculated on multiple prometheus instances


    • Bug
    • Resolution: Done-Errata
    • Major
    • rhos-18.0.11
    • rhos-18.0 FR 2 (Mar 2025)
    • openstack-watcher
    • None
    • 2
    • False
    • False
    • ?
    • openstack-watcher-10.0.1-18.0.20250625141952.c014f81.el9ost
    • None
    • Release Note Not Required
    • Regression Only
    • Workload Evolution Sprint 4, Workload Evolution Sprint 5, Workloads Evolution Sprint 6
    • 3
    • Important

      The Tempest test for the Prometheus datasource fails when fake metrics are injected. The ceilometer_cpu metrics then exist twice, once faked and once from the real Ceilometer backend, and the datasource driver takes data only from the real one instead of aggregating both. As a result, the computed usage is close to 0:

      Jun 10 09:56:54.117054 np0041073325 watcher-decision-engine[100130]: DEBUG watcher.decision_engine.strategy.strategies.workload_balance [None req-984e8be0-e22a-43f9-893b-524133ea4525 None None] Host usage for (c74a714c-5d1d-4402-ae19-5b32ad636414): host_cpu_usage_percent is 0.684483, lower than threshold 10.000000 (pid=100130) group_hosts_by_cpu_or_ram_util /opt/stack/watcher/watcher/decision_engine/strategy/strategies/workload_balance.py:286

      Jun 10 08:20:10.622963 np0041072552 watcher-decision-engine[100094]: DEBUG watcher.decision_engine.strategy.strategies.workload_balance [None req-8739e54d-61ca-402e-acbf-2bc13745bc26 None None] Host usage for (7604851e-eca6-4406-a5fe-f5dd8b114e83): host_cpu_usage_percent is 0.782753, lower than threshold 10.000000 (pid=100094) group_hosts_by_cpu_or_ram_util /opt/stack/watcher/watcher/decision_engine/strategy/strategies/workload_balance.py:286

      This is because the function:

      https://github.com/openstack/watcher/blob/15981117ee28627f235264e505e1e0d5956cf4e4/watcher/decision_engine/datasources/prometheus.py#L345

      is aggregating by instance instead of resource.
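To illustrate why the grouping label matters, here is a minimal sketch in Python. It is not the Watcher code: the sample values, the target names, the `vm-1` identifier and the use of a plain sum as the aggregation are all assumptions made for illustration. It only shows that grouping by `instance` keeps one series per scrape target, so a consumer reading a single series can miss the faked data, while grouping by `resource` combines both.

```python
from collections import defaultdict

# Hypothetical samples for one VM, exposed by two different scrape targets.
# All names and values here are illustrative assumptions.
SAMPLES = [
    {"instance": "fake-exporter:9100", "resource": "vm-1", "value": 40.0},
    {"instance": "ceilometer:3000", "resource": "vm-1", "value": 0.5},
]


def aggregate_by(samples, label):
    """Sum sample values grouped by the given label."""
    groups = defaultdict(float)
    for sample in samples:
        groups[sample[label]] += sample["value"]
    return dict(groups)


# Grouping by scrape target yields two separate series for the same VM.
by_instance = aggregate_by(SAMPLES, "instance")

# Grouping by resource yields a single series that combines both sources.
by_resource = aggregate_by(SAMPLES, "resource")

print(by_instance)
print(by_resource)
```

With these assumed inputs, the per-instance grouping leaves the near-zero series from the real backend as a standalone result, which matches the symptom in the logs above.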

       

              amoralej1@redhat.com Alfredo Moralejo Alonso
              amoralej1@redhat.com Alfredo Moralejo Alonso
              David Sanz Moreno
              rhos-workloads-evolution
              Votes: 0
              Watchers: 3
