Bug
Resolution: Unresolved
Critical
odf-4.15
Description of problem (please be as detailed as possible and provide log
snippets):
The standby MDS gets stuck in the 'client-replay' state forever when the active MDS is restarted and the standby is supposed to take over as active.
Version of all relevant components (if applicable):
4.15.0-126
4.15.0-0.nightly-2024-01-25-051548
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes
Is there any workaround available to the best of your knowledge?
NA
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Run heavy I/O through the active MDS so that it builds up a large amount of cache.
2. The active MDS pod gets restarted.
3. The standby MDS is supposed to become active; during that takeover it gets stuck in the 'client-replay' state forever (see the command sketch after this list).
4. At this point, none of the MDS pods are active.
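For reference, a minimal reproduction sketch from the CLI; the fio parameters and the CephFS mount path are assumptions and not taken from this report, and openshift-storage is assumed as the ODF namespace:

# Step 1 (assumed load generator): many-file write workload against a mounted CephFS PVC
fio --name=mds-cache-load --directory=/mnt/cephfs-pvc --numjobs=8 \
    --nrfiles=5000 --filesize=64k --rw=write --time_based --runtime=1800

# Step 2: restart the active MDS pod (pod name as listed by 'oc get pods' below)
oc -n openshift-storage delete pod rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-56c5cd89s6f9x

# Steps 3-4: watch the failover; in this bug the rank stays in 'clientreplay'
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph fs status
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph -s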
Actual results:
The standby MDS pod is stuck in the 'client-replay' state.
Expected results:
The standby MDS should become active when the active MDS fails or is restarted.
Additional info:
sh-5.1$ ceph -s
  cluster:
    id:     a622f0f3-09a6-412b-9b06-e651e1d75e7f
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum a,b,c (age 21h)
    mgr: a(active, since 21h), standbys: b
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 21h), 3 in (since 6d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 169 pgs
    objects: 1.86M objects, 71 GiB
    usage:   237 GiB used, 1.3 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client: 195 KiB/s wr, 0 op/s rd, 3 op/s wr
sh-5.1$ ceph fs status
ocs-storagecluster-cephfilesystem - 5 clients
=================================
RANK      STATE                     MDS                    ACTIVITY   DNS    INOS   DIRS   CAPS
 0     clientreplay  ocs-storagecluster-cephfilesystem-a              1699k  1652k   727   13.4k
                   POOL                        TYPE      USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata   metadata   9155M   356G
ocs-storagecluster-cephfilesystem-data0        data     30.6G   356G
          STANDBY MDS
ocs-storagecluster-cephfilesystem-b
MDS version: ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)
sh-5.1$
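If more detail is needed while the MDS is stuck in this state, the following toolbox commands are a suggested sketch (not output captured for this report) for inspecting the stuck rank, its slow requests, and the client sessions it is waiting on:

# expand the HEALTH_WARN entries shown in 'ceph -s'
ceph health detail
# full MDSMap, including the rank stuck in clientreplay
ceph fs dump
# ops in flight / slow requests on the stuck MDS
ceph tell mds.ocs-storagecluster-cephfilesystem-a ops
# client sessions the MDS is still waiting to replay
ceph tell mds.ocs-storagecluster-cephfilesystem-a session ls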
-----------------------------------------------------------------
oc get pods
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-8649f7f85f-z77p5 2/2 Running 0 50s
csi-cephfsplugin-czhph 2/2 Running 13 (21h ago) 6d
csi-cephfsplugin-m4cwp 2/2 Running 40 (22h ago) 6d
csi-cephfsplugin-provisioner-7f87d9556b-dqwl6 6/6 Running 40 (21h ago) 6d
csi-cephfsplugin-provisioner-7f87d9556b-gdgpp 6/6 Running 64 6d
csi-cephfsplugin-rqf2k 2/2 Running 12 (21h ago) 6d
csi-rbdplugin-8x6j6 3/3 Running 54 (22h ago) 6d
csi-rbdplugin-bt5dp 3/3 Running 16 (21h ago) 6d
csi-rbdplugin-provisioner-78884f6f8c-jqhlz 6/6 Running 62 6d
csi-rbdplugin-provisioner-78884f6f8c-lq8mg 6/6 Running 79 6d
csi-rbdplugin-snhdl 3/3 Running 16 (21h ago) 6d
noobaa-core-0 1/1 Running 3 2d6h
noobaa-db-pg-0 1/1 Running 3 2d6h
noobaa-endpoint-5456dd8bd-4shm8 1/1 Running 1 23h
noobaa-operator-54d5fc85b8-qsr5l 2/2 Running 74 (10h ago) 2d4h
ocs-metrics-exporter-b94d575ff-pjd6c 1/1 Running 3 6d
ocs-operator-d57b464dd-4szrv 1/1 Running 232 (8m25s ago) 6d
odf-console-6d664888c8-tbnqw 1/1 Running 3 6d
odf-operator-controller-manager-67ff86cb69-2fwjx 2/2 Running 207 (8m27s ago) 6d
rook-ceph-crashcollector-compute-0-5776bbfc8d-ll4gh 1/1 Running 0 22h
rook-ceph-crashcollector-compute-1-7bb5565597-4pktq 1/1 Running 0 22h
rook-ceph-crashcollector-compute-2-c4d75658b-l9frn 1/1 Running 0 21h
rook-ceph-exporter-compute-0-d79bbf9b8-gmqs4 1/1 Running 1 (21h ago) 22h
rook-ceph-exporter-compute-1-75fff6dcbf-tmrdc 1/1 Running 0 22h
rook-ceph-exporter-compute-2-5d7ffc454-767tc 1/1 Running 0 21h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-56c5cd89s6f9x 2/2 Running 1 (66m ago) 150m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-775ddcf88tv94 2/2 Running 1 (38m ago) 149m
rook-ceph-mgr-a-59dcf4bbd9-6ccvn 3/3 Running 4 (21h ago) 22h
rook-ceph-mgr-b-855b9c966b-gk57d 3/3 Running 1 (21h ago) 21h
rook-ceph-mon-a-6d8d6595bf-rdv6m 2/2 Running 0 22h
rook-ceph-mon-b-7f7775b869-bc68t 2/2 Running 0 22h
rook-ceph-mon-c-6cc496dfd9-kbg42 2/2 Running 0 21h
rook-ceph-operator-5b5c5d9b76-qwdkp 1/1 Running 9 6d
rook-ceph-osd-0-76cc86458c-bmz6l 2/2 Running 0 22h
rook-ceph-osd-1-6c469b7c87-krdc9 2/2 Running 0 22h
rook-ceph-osd-2-6754d6657d-ws9vt 2/2 Running 0 21h
rook-ceph-osd-prepare-ocs-deviceset-0-data-08bmpn-h4fbb 0/1 Completed 0 6d
rook-ceph-osd-prepare-ocs-deviceset-1-data-0xztrp-srq9j 0/1 Completed 0 6d
rook-ceph-osd-prepare-ocs-deviceset-2-data-0v4rs4-vplsv 0/1 Completed 0 6d
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-c5b686bfgp7t 2/2 Running 0 21h
rook-ceph-tools-6c854d5d84-jmv7m 1/1 Running 3 6d
ux-backend-server-7d5f748f7c-6mwb7 2/2 Running 6 6d