Data Foundation Bugs / DFBUGS-537

[2274525] [IBM Z] Wrong ceph_osd_stat_bytes_used and ceph_osd_stat_bytes in ODF 4.15 / ceph quincy


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • odf-4.18
    • odf-4.14
    • ceph-monitoring

      Description of problem (please be as detailed as possible and provide log
      snippets):

      When checking OSD disk usage, we find two different sets of values for the following two metrics:

      • ceph_osd_stat_bytes
      • ceph_osd_stat_bytes_used

      The correct set of values must be the one where ceph_osd_stat_bytes_used (space actually used) is smaller than ceph_osd_stat_bytes (total capacity).

      Set 1:
      The Ceph Admin GUI and `ceph osd df` (run via the rook-ceph-tools-xx container) show similar values for the two metrics. These appear to be correct: ceph_osd_stat_bytes_used < ceph_osd_stat_bytes.

      Set 2:
      The Prometheus GUI and the internal /metrics endpoint show wrong values: ceph_osd_stat_bytes_used > ceph_osd_stat_bytes.
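      The comparison between Set 1 and Set 2 could be automated roughly as follows. This is a minimal sketch, not a verified tool: it assumes that `ceph osd df -f json` returns a `nodes` list whose entries carry `name`, `kb`, and `kb_used` (in KiB), and that the exporter values have already been collected into a dict keyed by OSD name. The function name, the `tolerance` parameter, and the sample numbers are hypothetical and chosen only for illustration.

```python
import json


def compare_osd_usage(osd_df_json, exporter_stats, tolerance=0.05):
    """Return (osd, metric, exporter_value, osd_df_value) for each mismatch.

    osd_df_json:    text assumed to come from `ceph osd df -f json`
    exporter_stats: {osd_name: {metric_name: bytes}} scraped from /metrics
    tolerance:      allowed relative difference (hypothetical default)
    """
    report = json.loads(osd_df_json)
    mismatches = []
    for node in report.get("nodes", []):
        name = node["name"]
        exported = exporter_stats.get(name)
        if exported is None:
            continue
        # Assumption: `kb` and `kb_used` are reported in KiB.
        expected = {
            "ceph_osd_stat_bytes": node["kb"] * 1024,
            "ceph_osd_stat_bytes_used": node["kb_used"] * 1024,
        }
        for metric, want in expected.items():
            got = exported.get(metric)
            if got is None or want == 0:
                continue
            if abs(got - want) / want > tolerance:
                mismatches.append((name, metric, got, want))
    return mismatches


# Synthetic values for illustration only; they are not taken from the cluster.
OSD_DF_SAMPLE = json.dumps(
    {"nodes": [{"name": "osd.0", "kb": 1000, "kb_used": 400}]}
)
EXPORTER_SAMPLE = {
    "osd.0": {
        "ceph_osd_stat_bytes": 1024000,
        "ceph_osd_stat_bytes_used": 1300000,
    }
}

if __name__ == "__main__":
    for row in compare_osd_usage(OSD_DF_SAMPLE, EXPORTER_SAMPLE):
        print(row)
```

      With real data, any OSD listed in the result would be one where the exporter disagrees with `ceph osd df` by more than the chosen tolerance.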

      The internal /metrics output was obtained as follows:

      1. oc describe pod rook-ceph-exporter-worker-xxx | grep IP
      2. curl -s http://10.129.2.32:9926/metrics | grep 'ceph_osd_stat_bytes'
        ceph_osd_stat_bytes{ceph_daemon="osd.2"} 6978919051352
        ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 8441947082770
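      To make the inconsistency easy to spot across many OSDs, the /metrics output can be checked with a short script. The sketch below parses the two metrics from Prometheus text-format lines and reports every OSD whose used bytes exceed its total bytes. It assumes each sample line carries only a `ceph_daemon` label, as in the output above; the function names are hypothetical, and the sample text is the osd.2 output quoted in this report.

```python
import re

# One Prometheus text-format sample line with a single ceph_daemon label.
# Optional whitespace between the metric name and the label block is tolerated.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\s*'
    r'\{ceph_daemon="(?P<daemon>[^"]+)"\}\s+(?P<value>\S+)$'
)

WANTED = {"ceph_osd_stat_bytes", "ceph_osd_stat_bytes_used"}


def parse_osd_stats(text):
    """Return {daemon: {metric_name: value}} for the two OSD byte metrics."""
    stats = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match and match.group("name") in WANTED:
            daemon = match.group("daemon")
            stats.setdefault(daemon, {})[match.group("name")] = float(
                match.group("value")
            )
    return stats


def overfull_osds(stats):
    """Return {daemon: used/total} for OSDs that report used > total."""
    result = {}
    for daemon, metrics in stats.items():
        total = metrics.get("ceph_osd_stat_bytes")
        used = metrics.get("ceph_osd_stat_bytes_used")
        if total and used is not None and used > total:
            result[daemon] = used / total
    return result


# The osd.2 values quoted in this report.
SAMPLE = """\
ceph_osd_stat_bytes{ceph_daemon="osd.2"} 6978919051352
ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 8441947082770
"""

if __name__ == "__main__":
    for daemon, ratio in overfull_osds(parse_osd_stats(SAMPLE)).items():
        print(f"{daemon}: used/total = {ratio:.2f}")
```

      For the osd.2 values above, the reported usage is roughly 121% of the reported capacity, which is not physically possible.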

      When we tested Ceph Reef standalone, the numbers appeared to be correct. I suspect the problem lies in the Ceph exporter code: some commits in the reef branch have not been backported to quincy:
      https://github.com/ceph/ceph/commits/quincy/src/exporter

      Version of all relevant components (if applicable):
      ODF 4.15
      ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Is there any workaround available to the best of your knowledge?

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Is this issue reproducible?
      Yes

      Can this issue be reproduced from the UI?
      Yes

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:

      Additional info:

              rhn-support-uchapaga Umanga Chapagain
              rh-ee-thoang Tuan Hoang
              Harish Nallur Vittal Rao