Bug
Resolution: Unresolved
Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
This is coming from https://issues.redhat.com/browse/DFBUGS-858
Current status is that the Ceph filesystem is damaged:
sh-5.1$ ceph -s
  cluster:
    id:     27eddc2c-03dd-42e2-8672-b52e6b2d3aa9
    health: HEALTH_ERR
            mons are allowing insecure global_id reclaim
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            36 daemons have recently crashed

  services:
    mon: 3 daemons, quorum b,c,e (age 58m)
    mgr: a(active, since 37m), standbys: b
    mds: 0/1 daemons up, 1 standby
    osd: 12 osds: 12 up (since 88m), 12 in (since 3d)

  data:
    volumes: 0/1 healthy, 1 recovering; 1 damaged
    pools:   12 pools, 281 pgs
    objects: 46.07M objects, 4.4 TiB
    usage:   14 TiB used, 34 TiB / 48 TiB avail
    pgs:     279 active+clean
             2   active+clean+scrubbing+deep
and the MDS daemons are not able to become active (they start as standby).
Tried to follow:
https://access.redhat.com/solutions/6889971

sh-5.1$ ceph mds repaired ocp7-storagecluster-cephfilesystem:0

But that did not help; the FS remains damaged.
Then tried https://access.redhat.com/solutions/6123271,
but the first command failed:
sh-5.1$ ceph tell mds.ocp7-storagecluster-cephfilesystem:0 damage ls
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)
Also tried to run a repair, but got the same error:
sh-5.1$ ceph tell mds.ocp7-storagecluster-cephfilesystem:0 scrub start / force repair
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)
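For reference, before attempting further repairs it can help to capture the full damage/crash context. A minimal diagnostic sketch, run from the toolbox shell shown above; these are standard Ceph CLI commands, and the crash-ID step is illustrative:

```shell
# Overall health with the specific reasons behind HEALTH_ERR
ceph health detail

# Per-filesystem MDS states (up:standby, damaged, etc.)
ceph fs status ocp7-storagecluster-cephfilesystem

# List the recently crashed daemons (36 were reported by `ceph -s`)
ceph crash ls

# Then inspect a specific crash, using an ID from `ceph crash ls`:
# ceph crash info <crash-id>
```

This does not repair anything; it only records the MDS/crash state so the damage can be analyzed even though `damage ls` itself aborts.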
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
VMware
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
ODF 4.15.7 Internal mode
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
Does this issue impact your ability to continue to work with the product?
Yes
Is there any workaround available to the best of your knowledge?
No
Can this issue be reproduced? If so, please provide the hit rate
Not sure; the problem started when the customer deleted three mons and their PVCs by mistake.
Can this issue be reproduced from the UI?
N/A
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
see above
The exact date and time when the issue was observed, including timezone details:
15 Nov
Actual results:
Cannot repair the CephFS filesystem.
Expected results:
The CephFS filesystem is repaired and healthy.
Logs collected and log location:
Additional info: