- Bug
- Resolution: Unresolved
- Critical
- odf-4.15
- None
Description of problem (please be as detailed as possible and provide log
snippets):
The customer is seeing the MDSCacheUsageHigh alert fire. They've applied the fix for this [1], but the alert is still firing. Furthermore, the memory consumption of the MDS pods is currently only at ~25%.
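For reference, here is a minimal way to check the MDS pods' live memory consumption against their limits (a hypothetical check, not taken from the case; it assumes cluster metrics are available to "oc adm top" and that the MDS pods carry the usual app=rook-ceph-mds label):
# Live memory usage of the MDS pods, to compare against their memory limits
oc adm top pods -n openshift-storage -l app=rook-ceph-mds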
[root@bastionocpcrystal ~]# oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
sh-5.1$ export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
sh-5.1$ ceph config dump | grep mds_cache_memory_limit
mds.ocs-storagecluster-cephfilesystem-a basic mds_cache_memory_limit 8589934592
mds.ocs-storagecluster-cephfilesystem-b basic mds_cache_memory_limit 8589934592
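As a cross-check against that 8589934592 (8 GiB) limit, the MDS daemons can report their own cache usage. The following is a sketch rather than output from the case; it assumes the same shell as above with CEPH_ARGS exported, and that this Ceph release accepts the "cache status" admin command over the tell interface:
# Ask each MDS daemon for its current cache usage (bytes/items); if usage really
# is low, these should sit well below mds_cache_memory_limit
ceph tell mds.ocs-storagecluster-cephfilesystem-a cache status
ceph tell mds.ocs-storagecluster-cephfilesystem-b cache status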
For node msplatform-x9ggd-storage-tnhvb:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-75b6d77bggzfx 2 (12%) 2 (12%) 16Gi (25%) 16Gi (25%) 51m
For node msplatform-x9ggd-storage-vh2hj:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7d8c86bdzcrk2 2 (12%) 2 (12%) 16Gi (25%) 16Gi (25%) 52m
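The same requests and limits can also be read directly from the pod specs rather than from the node output above; a sketch, again assuming the usual app=rook-ceph-mds label:
# Print each MDS pod with its memory request and limit
oc get pods -n openshift-storage -l app=rook-ceph-mds \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.requests.memory}{"\t"}{.spec.containers[0].resources.limits.memory}{"\n"}{end}'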
Version of all relevant components (if applicable):
ODF 4.15.6
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No, but it's annoying, given that it appears to be a faulty alert that's firing.
Is there any workaround available to the best of your knowledge?
Not to my knowledge
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3
Is this issue reproducible?
Unknown
Can this issue be reproduced from the UI?
No
If this is a regression, please provide more details to justify this:
Unknown
Steps to Reproduce:
1.
2.
3.
Actual results:
MDSCacheUsageHigh alert firing
Expected results:
MDSCacheUsageHigh does not fire
Additional info:
- It should also be noted that the "mds_cache_memory_limit" value for both MDS daemons did not increase to half of the MDS pods' memory limit, as it should have. I had to set "mds_cache_memory_limit" to "8589934592" manually using the rook-ceph-tools pod (see the sketch after this list). This still didn't resolve the misfiring alert.
- Ceph is HEALTH_OK:
  cluster:
    id:     05c475dc-e78e-4f2b-94c1-d97e7c6859fa
    health: HEALTH_OK
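Regarding the manual mds_cache_memory_limit change noted above: 8589934592 bytes is exactly half of the 16Gi pod memory limit, which matches the intended sizing. The exact commands were not captured in this report; a sketch of the kind of invocation involved, run from the rook-ceph-tools pod, would be:
# Half of the 16Gi pod memory limit, in bytes
echo $((16 * 1024 * 1024 * 1024 / 2))    # prints 8589934592
# Apply the value per MDS daemon (this is what the config dump above now shows)
ceph config set mds.ocs-storagecluster-cephfilesystem-a mds_cache_memory_limit 8589934592
ceph config set mds.ocs-storagecluster-cephfilesystem-b mds_cache_memory_limit 8589934592
Since the alert keeps firing despite apparently low usage, it may also help to look at the exact expression behind MDSCacheUsageHigh to see which metrics it compares. A sketch, assuming the alert is delivered through a PrometheusRule object in the openshift-storage namespace (the object name may differ between ODF releases):
# Locate the alert definition and print its expression
oc get prometheusrules -n openshift-storage -o yaml | grep -B 2 -A 10 MDSCacheUsageHigh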