DFBUGS-368

[2313424] [GSS] MDSCacheUsageHigh alert firing


    • Bug
    • Resolution: Unresolved
    • Critical
    • odf-4.18
    • odf-4.15
    • ceph-monitoring

      Description of problem (please be as detailed as possible and provide log snippets):

      The customer is experiencing the MDSCacheUsageHigh alert firing. They have applied the fix for this [1], but the alert is still firing. Furthermore, the memory consumption of the MDS pods is currently only at ~25%.

      [root@bastionocpcrystal ~]# oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
      sh-5.1$ export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
      sh-5.1$ ceph config dump | grep mds_cache_memory_limit
      mds.ocs-storagecluster-cephfilesystem-a basic mds_cache_memory_limit 8589934592
      mds.ocs-storagecluster-cephfilesystem-b basic mds_cache_memory_limit 8589934592
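
      To cross-check the alert against what the MDS daemons themselves report, the configured cache limit can be compared with the live cache usage. The commands below are a sketch run from the same shell (CEPH_ARGS exported as above, daemon names as in this cluster), not output captured from the case:

      sh-5.1$ ceph config get mds.ocs-storagecluster-cephfilesystem-a mds_cache_memory_limit
      sh-5.1$ ceph tell mds.ocs-storagecluster-cephfilesystem-a cache status
      sh-5.1$ ceph config get mds.ocs-storagecluster-cephfilesystem-b mds_cache_memory_limit
      sh-5.1$ ceph tell mds.ocs-storagecluster-cephfilesystem-b cache status

      If "cache status" reports usage well below the 8GiB limit while the alert is firing, that points at the alerting expression rather than at actual MDS memory pressure.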

      For node msplatform-x9ggd-storage-tnhvb:

      Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
      --------- ---- ------------ ---------- --------------- ------------- ---
      openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-75b6d77bggzfx 2 (12%) 2 (12%) 16Gi (25%) 16Gi (25%) 51m

      For node msplatform-x9ggd-storage-vh2hj:

      Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
      --------- ---- ------------ ---------- --------------- ------------- ---
      openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7d8c86bdzcrk2 2 (12%) 2 (12%) 16Gi (25%) 16Gi (25%) 52m
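
      The allocation tables above match the format of "oc describe node" output. To compare the 16Gi request against what the MDS pods are actually consuming, something along these lines can be run from the bastion (the label selector and flags are assumptions, not commands captured from the case):

      [root@bastionocpcrystal ~]# oc describe node msplatform-x9ggd-storage-tnhvb | grep rook-ceph-mds
      [root@bastionocpcrystal ~]# oc describe node msplatform-x9ggd-storage-vh2hj | grep rook-ceph-mds
      [root@bastionocpcrystal ~]# oc adm top pod -n openshift-storage -l app=rook-ceph-mds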

      [1] https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.15/html-single/troubleshooting_openshift_data_foundation/index?extIdCarryOver=true&sc_cid=7013a000003SyEYAA0#ceph_mds_cache_usage_high_rhodf

      Version of all relevant components (if applicable):

      ODF 4.15.6

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      No, but it is annoying given that it appears to be a faulty alert that is firing.

      Is there any workaround available to the best of your knowledge?

      Not to my knowledge

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      3

      Is this issue reproducible?

      Unknown

      Can this issue be reproduced from the UI?

      No

      If this is a regression, please provide more details to justify this:

      Unknown

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:
      MDSCacheUsageHigh alert firing

      Expected results:
      MDSCacheUsageHigh does not fire

      Additional info:

      • It should also be noted that the "mds_cache_memory_limit" value for both MDS daemons did not increase to half of the MDS pods' memory limit (8Gi of the 16Gi request), as it should have. I had to set "mds_cache_memory_limit" to "8589934592" manually using the rook-ceph-tools pod (a sketch of the likely commands follows this section). This still did not resolve the misfiring alert.
      • Ceph is HEALTH_OK:

      cluster:
      id: 05c475dc-e78e-4f2b-94c1-d97e7c6859fa
      health: HEALTH_OK
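
      As mentioned in the first bullet above, the limit was raised manually from the rook-ceph-tools pod; the commands were presumably of this form (a sketch, not a transcript from the case):

      [root@bastionocpcrystal ~]# oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-tools)
      sh-5.1$ ceph config set mds.ocs-storagecluster-cephfilesystem-a mds_cache_memory_limit 8589934592
      sh-5.1$ ceph config set mds.ocs-storagecluster-cephfilesystem-b mds_cache_memory_limit 8589934592
      sh-5.1$ ceph config dump | grep mds_cache_memory_limit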

              Divyansh Kamboj (dkamboj@redhat.com)
              Brandon McMurray (rhn-support-bmcmurra)
              Brandon McMurray, Raimund Sacherer, Santosh Pillai
              Harish Nallur Vittal Rao