DFBUGS-450

[2293632] ODF cephblockpool warning or ERROR status not informative in UI nor raised in ceph tools


    • Type: Bug
    • Priority: Normal
    • Resolution: Unresolved
    • Affects Version/s: odf-4.16, odf-4.15
    • Component/s: management-console
      Description of problem (please be as detailed as possible and provide log
      snippets):

      While testing a CNV upgrade, I found that ODF did not raise a warning at a high enough visibility level, leaving the user to dig through the UI and CLI to determine the cause.

      From the UI:

      • Storage shows green on the cluster overview.
      • The Data Foundation UI shows green for both Data Foundation and StorageSystem.
      • The StorageSystem page shows green.
      • CephBlockPool shows 'Ready' but displays a warning triangle with no other feedback (see the command sketch after this list).
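
      As a cross-check on what the console is surfacing, the pool status can also be read from the CephBlockPool custom resource itself (a minimal sketch, assuming the default openshift-storage namespace and pool name):

      $ oc -n openshift-storage get cephblockpool ocs-storagecluster-cephblockpool -o jsonpath='{.status.phase}{"\n"}'
      $ oc -n openshift-storage get cephblockpool ocs-storagecluster-cephblockpool -o yaml

      Since the UI shows 'Ready', the CR phase is presumably Ready as well, which suggests the warning triangle comes from the console's own utilization check rather than from the Rook-reported pool status.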

      From ceph-tools:

      sh-5.1$ ceph -s
        cluster:
          id:     97e1d345-532a-490e-8cec-16b51ce7d36d
          health: HEALTH_OK

        services:
          mon: 3 daemons, quorum b,c,d (age 4d)
          mgr: a(active, since 4d), standbys: b
          mds: 1/1 daemons up, 1 hot standby
          osd: 3 osds: 3 up (since 4d), 3 in (since 5d)
          rgw: 1 daemon active (1 hosts, 1 zones)

        data:
          volumes: 1/1 healthy
          pools:   12 pools, 265 pgs
          objects: 394.11k objects, 1.3 TiB
          usage:   3.0 TiB used, 1.4 TiB / 4.4 TiB avail
          pgs:     265 active+clean

        io:
          client: 73 MiB/s rd, 479 MiB/s wr, 19.18k op/s rd, 11.70k op/s wr
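
      (Not in the original report: ceph health detail from the toolbox is also worth running here, since it would show any health checks that are present but muted; in this case it presumably just echoes HEALTH_OK.)

      sh-5.1$ ceph health detail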

      sh-5.1$ ceph df
      --- RAW STORAGE ---
      CLASS SIZE AVAIL USED RAW USED %RAW USED
      ssd 4.4 TiB 1.4 TiB 3.0 TiB 3.0 TiB 67.85
      TOTAL 4.4 TiB 1.4 TiB 3.0 TiB 3.0 TiB 67.85

      --- POOLS ---
      POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
      .mgr 1 1 769 KiB 2 2.3 MiB 0 256 GiB
      ocs-storagecluster-cephblockpool 2 128 1007 GiB 393.69k 3.0 TiB 79.77 256 GiB
      .rgw.root 3 8 5.8 KiB 16 180 KiB 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.meta 4 8 5.3 KiB 17 152 KiB 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec 5 8 0 B 0 0 B 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.control 6 8 0 B 8 0 B 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.otp 7 8 0 B 0 0 B 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.log 8 8 327 KiB 340 2.8 MiB 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.buckets.index 9 8 3.1 KiB 11 9.4 KiB 0 256 GiB
      ocs-storagecluster-cephfilesystem-metadata 10 16 303 KiB 22 996 KiB 0 256 GiB
      ocs-storagecluster-cephfilesystem-data0 11 32 0 B 0 0 B 0 256 GiB
      ocs-storagecluster-cephobjectstore.rgw.buckets.data 12 32 1 KiB 1 12 KiB 0 256 GiB
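
      As an aside (my reading of the numbers, not something stated by ceph): the 79.77 %USED for ocs-storagecluster-cephblockpool is consistent with STORED / (STORED + MAX AVAIL) = 1007 GiB / (1007 GiB + 256 GiB) ≈ 79.7%, i.e. it is a per-pool figure, whereas the cluster-wide %RAW USED above is only 67.85.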

      sh-5.1$ ceph osd df
      ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
      1 ssd 1.45549 1.00000 1.5 TiB 1011 GiB 1007 GiB 918 KiB 4.1 GiB 479 GiB 67.87 1.00 265 up
      0 ssd 1.45549 1.00000 1.5 TiB 1011 GiB 1007 GiB 918 KiB 3.7 GiB 479 GiB 67.84 1.00 265 up
      2 ssd 1.45549 1.00000 1.5 TiB 1011 GiB 1007 GiB 915 KiB 4.0 GiB 479 GiB 67.86 1.00 265 up
      TOTAL 4.4 TiB 3.0 TiB 3.0 TiB 2.7 MiB 12 GiB 1.4 TiB 67.86
      MIN/MAX VAR: 1.00/1.00 STDDEV: 0.01
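
      Worth noting (my assumption about why ceph stays quiet): Ceph's built-in capacity warnings key off the OSD fullness ratios (nearfull_ratio defaults to 0.85 of raw capacity), and with raw usage at roughly 68% none of those thresholds are crossed, so HEALTH_OK is expected even though this one pool is near 80% used. The active ratios can be confirmed from the toolbox:

      sh-5.1$ ceph osd dump | grep ratio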

      On a separate test cluster (4.15), I purposely maxed out the storage and then brought ODF back from a read-only state, deleting the excess VMs in the process. The warning icon on the CephBlockPool details page in the UI disappeared when the ocs-storagecluster-cephblockpool %USED reported by ceph df approached 75% (75.24%), so I imagine the value is not calculated the same way in the UI as in ceph. If the warning state is correct, it should be bubbled further up, both in the UI and in the ceph-tools health status.
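
      If it helps triage the UI-vs-ceph discrepancy: assuming the console derives pool utilization from the Ceph mgr's Prometheus metrics (an assumption on my part; the metric names below come from the mgr prometheus module, not from ODF code), the figure it sees could be cross-checked with a query along the lines of:

      ceph_pool_stored / (ceph_pool_stored + ceph_pool_max_avail) * 100

      joined on ceph_pool_metadata to pick out ocs-storagecluster-cephblockpool, and compared against the %USED that ceph df reports.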

      Version of all relevant components (if applicable):

      4.15

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what the user impact is)?

      No. The user impact is a lack of notification of the warning state, either in the UI or via the ceph tools.

      Is there any workaround available to the best of your knowledge?

      The workaround is to reduce CephBlockPool usage.

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      1

      Is this issue reproducible?

      Yes

      Can this issue be reproduced from the UI?

      Yes

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Install OCP/ODF
      2. Raise CephBlockPool usage above 75% and note that no warning is raised (see the fill sketch after these steps)
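
      For step 2, one quick way to drive pool usage up from the toolbox (a sketch, not from the original report; rados bench writes objects directly into the pool and --no-cleanup leaves them in place) is:

      sh-5.1$ rados bench -p ocs-storagecluster-cephblockpool 600 write --no-cleanup

      Repeat or extend the run until ceph df shows the pool above 75% used; the benchmark objects can later be removed with rados -p ocs-storagecluster-cephblockpool cleanup.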

      Actual results:

      A single warning icon buried in the UI with no other information, and HEALTH_OK from the ceph tools.

      Expected results:

      A warning state that is obvious to the user, either via the UI or via HEALTH_WARN from the ceph tools.

      Additional info:
