Data Foundation Bugs / DFBUGS-863

[GSS] Failed to repair CephFS, filesystem is offline: terminate called after throwing an instance of 'std::out_of_range'

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: odf-4.17

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

      This is coming from https://issues.redhat.com/browse/DFBUGS-858

      Current status is that the Ceph filesystem is damaged:

      sh-5.1$ ceph -s
        cluster:
          id:     27eddc2c-03dd-42e2-8672-b52e6b2d3aa9
          health: HEALTH_ERR
                  mons are allowing insecure global_id reclaim
                  1 filesystem is degraded
                  1 filesystem is offline
                  1 mds daemon damaged
                  36 daemons have recently crashed

        services:
          mon: 3 daemons, quorum b,c,e (age 58m)
          mgr: a(active, since 37m), standbys: b
          mds: 0/1 daemons up, 1 standby
          osd: 12 osds: 12 up (since 88m), 12 in (since 3d)

        data:
          volumes: 0/1 healthy, 1 recovering; 1 damaged
          pools:   12 pools, 281 pgs
          objects: 46.07M objects, 4.4 TiB
          usage:   14 TiB used, 34 TiB / 48 TiB avail
          pgs:     279 active+clean
                   2   active+clean+scrubbing+deep

      and the MDS daemons are not able to become active; they start and remain in standby (see the checks below).
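
      As a quick sanity check from the toolbox (standard Ceph commands; the filesystem name is the one used throughout this report), the rank and damage state can be confirmed with:

      sh-5.1$ ceph health detail
      sh-5.1$ ceph fs status ocp7-storagecluster-cephfilesystem
      sh-5.1$ ceph fs dump | grep -i damaged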

      Tried to follow:
      https://access.redhat.com/solutions/6889971

      ceph mds repaired ocp7-storagecluster-cephfilesystem:0

      But that did not help; the FS remains damaged.
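
      For context, "ceph mds repaired" only clears the damaged flag on the rank in the FSMap; it does not repair the underlying metadata, so the rank is re-marked damaged as soon as an MDS claims it and hits the same corruption again. A minimal re-check after running it (standard commands):

      sh-5.1$ ceph mds repaired ocp7-storagecluster-cephfilesystem:0
      sh-5.1$ ceph fs dump | grep -i damaged     # rank 0 reappears here if the damage persists
      sh-5.1$ ceph -s | grep -E 'mds|damaged'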

      Then tried: https://access.redhat.com/solutions/6123271 

      but the first command failed:

      sh-5.1$ ceph tell mds.ocp7-storagecluster-cephfilesystem:0 damage ls
      terminate called after throwing an instance of 'std::out_of_range'
        what():  map::at
      Aborted (core dumped)
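
      One unconfirmed hypothesis: with 0/1 MDS daemons up, rank 0 resolves to no running daemon, so the failing map::at lookup may happen client-side while resolving the mds.<fs>:<rank> target, i.e. before any MDS is reached. If so, addressing a daemon by name instead of by rank should at least fail with a proper error rather than a core dump. The daemon name below is a placeholder; the real one comes from "ceph fs status" or the MDS pod name:

      sh-5.1$ ceph tell mds.ocp7-storagecluster-cephfilesystem-a damage ls    # "-a" suffix is a placeholder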

      Also tried to repair, but got the same error:

      sh-5.1$ ceph tell mds.ocp7-storagecluster-cephfilesystem:0 scrub start / force repair
      terminate called after throwing an instance of 'std::out_of_range'
        what():  map::at
      Aborted (core dumped)
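
      Both tell invocations abort identically before returning anything from an MDS, which fits the target-resolution hypothesis above. The FSMap itself should still be dumpable; as a sketch, the damaged ranks can be listed straight from its JSON (field names assume the usual "ceph fs dump" JSON layout: filesystems, mdsmap, damaged; verify against the actual output):

      sh-5.1$ ceph fs dump --format json | python3 -c 'import json,sys; fsmap=json.load(sys.stdin); [print(f["mdsmap"]["fs_name"], "damaged ranks:", f["mdsmap"]["damaged"]) for f in fsmap["filesystems"]]'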

       

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      VMware

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      ODF 4.15.7 Internal mode

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

       

       

      Does this issue impact your ability to continue to work with the product?

      Yes

       

      Is there any workaround available to the best of your knowledge?

      No

       

      Can this issue be reproduced? If so, please provide the hit rate

      Not sure; the problem started when the customer mistakenly deleted three mons and their PVCs.

       

      Can this issue be reproduced from the UI?

      N/A

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:

      See above.

      The exact date and time when the issue was observed, including timezone details:

      15 November

      Actual results:

      The CephFS filesystem cannot be repaired.

      Expected results:

      The CephFS filesystem can be repaired.

      Logs collected and log location:

       

      Additional info:

       

      Assignee: Anjana Sriram (asriram@redhat.com)
      Reporter: Miguel Duaso (rhn-support-mduasope)