Data Foundation Bugs / DFBUGS-254

[2262105] Stand-by MDS stuck in 'client replay' state when active MDS restarted.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • odf-4.15
    • ceph/CephFS/x86

      Description of problem (please be as detailed as possible and provide log
      snippets):
      The stand-by MDS gets stuck in the 'clientreplay' state forever when the active MDS is restarted and the stand-by is supposed to take over as the active MDS.
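
      The stuck state can be checked from the Rook toolbox; a minimal sketch, assuming the default openshift-storage namespace and the rook-ceph-tools deployment (both names are assumptions about this deployment):

      # Compact MDS state map and full filesystem status.
      oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph mds stat
      oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph fs status ocs-storagecluster-cephfilesystem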

      Version of all relevant components (if applicable):
      4.15.0-126
      4.15.0-0.nightly-2024-01-25-051548

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Yes
      Is there any workaround available to the best of your knowledge?
      NA

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?
      2

      Is this issue reproducible?

      Yes
      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Run heavy IO against the active MDS so that it builds up a large amount of cache (see the command sketch after these steps).
      2. Restart the active MDS pod.

      3. The stand-by MDS is supposed to become active. Instead, it gets stuck in the 'clientreplay' state forever.

      4. At this point, none of the MDS daemons is active.
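
      A minimal reproduction sketch of the steps above, assuming the default openshift-storage namespace, the rook-ceph-tools deployment, and a hypothetical workload pod (fio-workload) with a CephFS-backed PVC mounted at /mnt/cephfs; the workload pod name, its namespace, and the mount path are assumptions, not taken from this cluster:

      # 1. Metadata-heavy IO: create many small files so the active MDS
      #    caches a large number of inodes and dentries.
      oc -n my-app exec fio-workload -- bash -c \
        'for d in $(seq 1 50); do mkdir -p /mnt/cephfs/dir$d;
         for f in $(seq 1 10000); do echo x > /mnt/cephfs/dir$d/f$f; done; done'

      # 2. Restart the active MDS pod (rank 0 was mds "a" on this cluster).
      oc -n openshift-storage delete pod \
        rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-56c5cd89s6f9x

      # 3. Watch the stand-by take over; in this bug it stays in 'clientreplay'.
      oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph fs status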

      Actual results:

      The stand-by MDS pod is stuck in the 'clientreplay' state.

      Expected results:
      The stand-by MDS should become active when the active MDS fails or is restarted.

      Additional info:

      sh-5.1$ ceph -s
        cluster:
          id:     a622f0f3-09a6-412b-9b06-e651e1d75e7f
          health: HEALTH_WARN
                  1 filesystem is degraded
                  1 MDSs report slow requests
                  1 MDSs behind on trimming

        services:
          mon: 3 daemons, quorum a,b,c (age 21h)
          mgr: a(active, since 21h), standbys: b
          mds: 1/1 daemons up, 1 standby
          osd: 3 osds: 3 up (since 21h), 3 in (since 6d)
          rgw: 1 daemon active (1 hosts, 1 zones)

        data:
          volumes: 0/1 healthy, 1 recovering
          pools:   12 pools, 169 pgs
          objects: 1.86M objects, 71 GiB
          usage:   237 GiB used, 1.3 TiB / 1.5 TiB avail
          pgs:     169 active+clean

        io:
          client: 195 KiB/s wr, 0 op/s rd, 3 op/s wr

      sh-5.1$ ceph fs status
      ocs-storagecluster-cephfilesystem - 5 clients
      =================================
      RANK     STATE                     MDS                  ACTIVITY    DNS    INOS   DIRS   CAPS
       0    clientreplay  ocs-storagecluster-cephfilesystem-a             1699k  1652k   727   13.4k
                        POOL                        TYPE      USED   AVAIL
      ocs-storagecluster-cephfilesystem-metadata  metadata   9155M    356G
        ocs-storagecluster-cephfilesystem-data0     data     30.6G    356G
            STANDBY MDS
      ocs-storagecluster-cephfilesystem-b
      MDS version: ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)
      sh-5.1$
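
      A few hedged diagnostic commands that could be run while rank 0 is stuck in 'clientreplay'; the openshift-storage namespace, the rook-ceph-tools deployment, and the "mds" container name are assumptions about this deployment:

      # Health detail for the slow-request and trimming warnings above.
      oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph health detail

      # Client sessions the stuck MDS is still waiting to replay.
      oc -n openshift-storage exec deploy/rook-ceph-tools -- \
        ceph tell mds.ocs-storagecluster-cephfilesystem-a session ls

      # Recent log lines from the stuck MDS pod.
      oc -n openshift-storage logs \
        rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-56c5cd89s6f9x -c mds --tail=200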

      -----------------------------------------------------------------
      oc get pods
      NAME READY STATUS RESTARTS AGE
      csi-addons-controller-manager-8649f7f85f-z77p5 2/2 Running 0 50s
      csi-cephfsplugin-czhph 2/2 Running 13 (21h ago) 6d
      csi-cephfsplugin-m4cwp 2/2 Running 40 (22h ago) 6d
      csi-cephfsplugin-provisioner-7f87d9556b-dqwl6 6/6 Running 40 (21h ago) 6d
      csi-cephfsplugin-provisioner-7f87d9556b-gdgpp 6/6 Running 64 6d
      csi-cephfsplugin-rqf2k 2/2 Running 12 (21h ago) 6d
      csi-rbdplugin-8x6j6 3/3 Running 54 (22h ago) 6d
      csi-rbdplugin-bt5dp 3/3 Running 16 (21h ago) 6d
      csi-rbdplugin-provisioner-78884f6f8c-jqhlz 6/6 Running 62 6d
      csi-rbdplugin-provisioner-78884f6f8c-lq8mg 6/6 Running 79 6d
      csi-rbdplugin-snhdl 3/3 Running 16 (21h ago) 6d
      noobaa-core-0 1/1 Running 3 2d6h
      noobaa-db-pg-0 1/1 Running 3 2d6h
      noobaa-endpoint-5456dd8bd-4shm8 1/1 Running 1 23h
      noobaa-operator-54d5fc85b8-qsr5l 2/2 Running 74 (10h ago) 2d4h
      ocs-metrics-exporter-b94d575ff-pjd6c 1/1 Running 3 6d
      ocs-operator-d57b464dd-4szrv 1/1 Running 232 (8m25s ago) 6d
      odf-console-6d664888c8-tbnqw 1/1 Running 3 6d
      odf-operator-controller-manager-67ff86cb69-2fwjx 2/2 Running 207 (8m27s ago) 6d
      rook-ceph-crashcollector-compute-0-5776bbfc8d-ll4gh 1/1 Running 0 22h
      rook-ceph-crashcollector-compute-1-7bb5565597-4pktq 1/1 Running 0 22h
      rook-ceph-crashcollector-compute-2-c4d75658b-l9frn 1/1 Running 0 21h
      rook-ceph-exporter-compute-0-d79bbf9b8-gmqs4 1/1 Running 1 (21h ago) 22h
      rook-ceph-exporter-compute-1-75fff6dcbf-tmrdc 1/1 Running 0 22h
      rook-ceph-exporter-compute-2-5d7ffc454-767tc 1/1 Running 0 21h
      rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-56c5cd89s6f9x 2/2 Running 1 (66m ago) 150m
      rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-775ddcf88tv94 2/2 Running 1 (38m ago) 149m
      rook-ceph-mgr-a-59dcf4bbd9-6ccvn 3/3 Running 4 (21h ago) 22h
      rook-ceph-mgr-b-855b9c966b-gk57d 3/3 Running 1 (21h ago) 21h
      rook-ceph-mon-a-6d8d6595bf-rdv6m 2/2 Running 0 22h
      rook-ceph-mon-b-7f7775b869-bc68t 2/2 Running 0 22h
      rook-ceph-mon-c-6cc496dfd9-kbg42 2/2 Running 0 21h
      rook-ceph-operator-5b5c5d9b76-qwdkp 1/1 Running 9 6d
      rook-ceph-osd-0-76cc86458c-bmz6l 2/2 Running 0 22h
      rook-ceph-osd-1-6c469b7c87-krdc9 2/2 Running 0 22h
      rook-ceph-osd-2-6754d6657d-ws9vt 2/2 Running 0 21h
      rook-ceph-osd-prepare-ocs-deviceset-0-data-08bmpn-h4fbb 0/1 Completed 0 6d
      rook-ceph-osd-prepare-ocs-deviceset-1-data-0xztrp-srq9j 0/1 Completed 0 6d
      rook-ceph-osd-prepare-ocs-deviceset-2-data-0v4rs4-vplsv 0/1 Completed 0 6d
      rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-c5b686bfgp7t 2/2 Running 0 21h
      rook-ceph-tools-6c854d5d84-jmv7m 1/1 Running 3 6d
      ux-backend-server-7d5f748f7c-6mwb7 2/2 Running 6 6d

              khiremat@redhat.com Kotresh Hiremath Ravishankar
              rhn-support-nagreddy Nagendra Reddy
              Elad Ben Aharon