Data Foundation Bugs / DFBUGS-145

[2250227] [ODF-4.13.z][CEPH bug 2249814 tracker] Health Warn after upgrade to 4.13.5-6 - 1 daemons have recently crashed


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • odf-4.13.13
    • odf-4.13
    • ceph/CephFS/x86

      This bug was initially created as a copy of Bug #2249844

      I am copying this bug because:

      Description of problem (please be as detailed as possible and provide log
      snippets):
      After upgrading from 4.12 to 4.13.5-6 (both the OCP and ODF upgrades), we
      see a Ceph health warning:

      sh-5.1$ ceph status
        cluster:
          id:     68dc565f-f700-4312-93be-265b7ed15941
          health: HEALTH_WARN
                  1 daemons have recently crashed

        services:
          mon: 3 daemons, quorum a,b,c (age 78m)
          mgr: a(active, since 77m)
          mds: 1/1 daemons up, 1 hot standby
          osd: 3 osds: 3 up (since 77m), 3 in (since 2h)
          rgw: 1 daemon active (1 hosts, 1 zones)

        data:
          volumes: 1/1 healthy
          pools:   12 pools, 185 pgs
          objects: 1.05k objects, 2.0 GiB
          usage:   5.9 GiB used, 1.5 TiB / 1.5 TiB avail
          pgs:     185 active+clean

        io:
          client: 1.4 KiB/s rd, 134 KiB/s wr, 2 op/s rd, 2 op/s wr
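
      (The ceph commands in this report are run from the rook-ceph toolbox pod. A minimal
      sketch of getting that shell, assuming the toolbox is enabled through the OCSInitialization
      CR and the deployment is named rook-ceph-tools in openshift-storage; names may differ per setup:)

      $ oc patch ocsinitialization ocsinit -n openshift-storage --type json \
          --patch '[{"op": "replace", "path": "/spec/enableCephTools", "value": true}]'
      $ oc rsh -n openshift-storage deploy/rook-ceph-tools
      sh-5.1$ ceph health detail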

      sh-5.1$ ceph crash ls
      ID                                                                 ENTITY  NEW
      2023-11-15T08:10:44.427601Z_b4fd4568-7eb7-4508-ab38-58e561dc809a   mgr.a    *
      sh-5.1$ ceph crash info 2023-11-15T08:10:44.427601Z_b4fd4568-7eb7-4508-ab38-58e561dc809a
      {
          "backtrace": [
              "/lib64/libc.so.6(+0x54df0) [0x7f7c91f2bdf0]",
              "/lib64/libc.so.6(+0xa154c) [0x7f7c91f7854c]",
              "raise()",
              "abort()",
              "/lib64/libstdc++.so.6(+0xa1a01) [0x7f7c92279a01]",
              "/lib64/libstdc++.so.6(+0xad37c) [0x7f7c9228537c]",
              "/lib64/libstdc++.so.6(+0xad3e7) [0x7f7c922853e7]",
              "/lib64/libstdc++.so.6(+0xad649) [0x7f7c92285649]",
              "/usr/lib64/ceph/libceph-common.so.2(+0x170d39) [0x7f7c9256fd39]",
              "(SnapRealmInfo::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x3b) [0x7f7c926a7f4b]",
              "/lib64/libcephfs.so.2(+0xaaec7) [0x7f7c86c43ec7]",
              "/lib64/libcephfs.so.2(+0xacc59) [0x7f7c86c45c59]",
              "/lib64/libcephfs.so.2(+0xadf10) [0x7f7c86c46f10]",
              "/lib64/libcephfs.so.2(+0x929e8) [0x7f7c86c2b9e8]",
              "(DispatchQueue::entry()+0x53a) [0x7f7c9272defa]",
              "/usr/lib64/ceph/libceph-common.so.2(+0x3bab31) [0x7f7c927b9b31]",
              "/lib64/libc.so.6(+0x9f802) [0x7f7c91f76802]",
              "/lib64/libc.so.6(+0x3f450) [0x7f7c91f16450]"
          ],
          "ceph_version": "17.2.6-148.el9cp",
          "crash_id": "2023-11-15T08:10:44.427601Z_b4fd4568-7eb7-4508-ab38-58e561dc809a",
          "entity_name": "mgr.a",
          "os_id": "rhel",
          "os_name": "Red Hat Enterprise Linux",
          "os_version": "9.2 (Plow)",
          "os_version_id": "9.2",
          "process_name": "ceph-mgr",
          "stack_sig": "4cb0911c06087a31d9752535de90ba18fd7aab25c037945b2c61f584dcf6a6db",
          "timestamp": "2023-11-15T08:10:44.427601Z",
          "utsname_hostname": "rook-ceph-mgr-a-5d475468dd-wzhmt",
          "utsname_machine": "x86_64",
          "utsname_release": "5.14.0-284.40.1.el9_2.x86_64",
          "utsname_sysname": "Linux",
          "utsname_version": "#1 SMP PREEMPT_DYNAMIC Wed Nov 1 10:30:09 EDT 2023"
      }
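
      (For reference only, not a fix for the underlying decode issue: once the crash has been
      triaged, the "recently crashed" warning can normally be cleared by archiving the crash
      entry from the toolbox. A minimal sketch using the crash ID above:)

      sh-5.1$ ceph crash archive 2023-11-15T08:10:44.427601Z_b4fd4568-7eb7-4508-ab38-58e561dc809a
      sh-5.1$ ceph crash archive-all   # or archive every crash still marked NEW
      sh-5.1$ ceph status              # HEALTH_WARN clears once no new crashes remain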

      Discussed here:
      https://chat.google.com/room/AAAAREGEba8/fZvCCW1MQfU

      Venky pointed out that it smells like this issue:
      https://tracker.ceph.com/issues/63188
      BZ:
      https://bugzilla.redhat.com/show_bug.cgi?id=2247174

      Venky cloned the 7.0 BZ to 6.1z4 target - https://bugzilla.redhat.com/show_bug.cgi?id=2249814
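
      (To check whether any listed crash matches that tracker's SnapRealmInfo::decode signature,
      the backtraces can be scanned from the toolbox; a sketch, relying on the "ID ENTITY NEW"
      header format shown by ceph crash ls above:)

      sh-5.1$ ceph crash ls | awk 'NR>1 {print $1}' | while read -r id; do
                ceph crash info "$id" | grep -q 'SnapRealmInfo::decode' && echo "$id matches"
              done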

      Version of all relevant components (if applicable):
      ODF 4.13.5-6

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what the user impact is)?

      Is there any workaround available to the best of your knowledge?

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Can this issue be reproduced?
      Trying to reproduce here:
      https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-vsphere-upi-encryption-1az-rhcos-vsan-lso-vmdk-3m-3w-upgrade-ocp-ocs-auto/32/

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Install ODF 4.12 and OCP 4.12
      2. Upgrade OCP to 4.13
      3. Upgrade ODF to 4.13.5-6 build
      4. After some time, the Ceph health warning appears (see the check sketch after this list)
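
      (A quick way to confirm the upgraded versions and then check health, assuming the default
      openshift-storage namespace; the CSV name below is illustrative:)

      $ oc get clusterversion            # OCP level, expected 4.13.x after step 2
      $ oc get csv -n openshift-storage  # ODF operator CSVs, expected odf-operator.v4.13.5-x or similar
      sh-5.1$ ceph health detail         # from the toolbox; reports "1 daemons have recently crashed"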

      Actual results:
      Ceph reports HEALTH_WARN "1 daemons have recently crashed" after the upgrade.

      Expected results:
      No health warning after the upgrade.

      Additional info:
      Must gather:
      http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-031vue1cslv33-uba/j-031vue1cslv33-uba_20231115T053551/logs/testcases_1700036781/j-031vue1cslv33-u/
      Job:
      https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-vsphere-upi-encryption-1az-rhcos-vsan-lso-vmdk-3m-3w-upgrade-ocp-ocs-auto/31/

              Venky Shankar (vshankar@redhat.com)
              Sunil Kumar Heggodu Gopala Acharya (sheggodu@redhat.com)
              Elad Ben Aharon