Bug
Resolution: Unresolved
Undefined
odf-4.18
Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
[RDR] ceph-osd crashed in Message::encode(unsigned long, int, bool)+0x2e after upgrading Ceph from 18.2.1-229.el9cp to 19.2.0-47.el9cp
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
VMware (UPI)
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
RDR
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
OCP version: 4.18.0-0.nightly-2024-11-07-215008
ODF version: 4.18.0-49
Ceph version: ceph version 19.2.0-47.el9cp (123a317ae596caa7f6d087fc76fffb6a736e0b5f) squid (stable)
ACM version: 2.12.0
Submariner version: v0.19.0
VolSync version: volsync-product.v0.10.1
OADP version: 1.4.1
VolSync method: destinationCopyMethod: Direct
Does this issue impact your ability to continue to work with the product?
Is there any workaround available to the best of your knowledge?
Can this issue be reproduced? If so, please provide the hit rate
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy a 4.17 RDR cluster
2. Run some workloads
3. Upgrade the cluster to 4.18
4. Check Ceph status
The exact date and time when the issue was observed, including timezone details:
2024-11-12T13:11:54.348893
Actual results:
$ ceph crash ls
ID ENTITY NEW
2024-11-12T13:11:54.348893Z_3b312191-e0b7-4383-a93b-9980e9e08d54 osd.0 *
$ ceph crash info 2024-11-12T13:11:54.348893Z_3b312191-e0b7-4383-a93b-9980e9e08d54
{
    "backtrace": [
        "/lib64/libc.so.6(+0x3e6f0) [0x7f44605fd6f0]",
        "/lib64/libc.so.6(+0x8b94c) [0x7f446064a94c]",
        "raise()",
        "abort()",
        "/lib64/libc.so.6(+0x2871b) [0x7f44605e771b]",
        "/lib64/libc.so.6(+0x37386) [0x7f44605f6386]",
        "ceph-osd(+0x8ca478) [0x56359cd6a478]",
        "(Message::encode(unsigned long, int, bool)+0x2e) [0x56359d09275e]",
        "(ProtocolV2::send_message(Message*)+0xc9) [0x56359d24a529]",
        "(AsyncConnection::send_message(Message*)+0x276) [0x56359d234396]",
        "(OSDService::send_message_osd_cluster(int, Message*, unsigned int)+0x1cc) [0x56359c9df4dc]",
        "(ReplicatedBackend::issue_op(hobject_t const&, eversion_t const&, unsigned long, osd_reqid_t, eversion_t, eversion_t, hobject_t, hobject_t, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, std::optional<pg_hit_set_history_t>&, ReplicatedBackend::InProgressOp*, ceph::os::Transaction&)+0x79d) [0x56359cd7f3cd]",
        "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x652) [0x56359cd7fbd2]",
        "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x392) [0x56359cb46202]",
        "(PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x59) [0x56359cb4a529]",
        "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xcd0) [0x56359cb4de90]",
        "ceph-osd(+0x5eb53e) [0x56359ca8b53e]",
        "(CommonSafeTimer<std::mutex>::timer_thread()+0x12a) [0x56359cf64c1a]",
        "ceph-osd(+0xac55b1) [0x56359cf655b1]",
        "/lib64/libc.so.6(+0x89c02) [0x7f4460648c02]",
        "/lib64/libc.so.6(+0x10ec40) [0x7f44606cdc40]"
    ],
    "ceph_version": "19.2.0-47.el9cp",
    "crash_id": "2024-11-12T13:11:54.348893Z_3b312191-e0b7-4383-a93b-9980e9e08d54",
    "entity_name": "osd.0",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "9.4 (Plow)",
    "os_version_id": "9.4",
    "process_name": "ceph-osd",
    "stack_sig": "338c287e4eeaae0ee1893eb6f465b526af91f5ddc6f2a6689fe7b2e8097cd083",
    "timestamp": "2024-11-12T13:11:54.348893Z",
    "utsname_hostname": "rook-ceph-osd-0-bc49d7f7f-hsjsz",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-427.44.1.el9_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Fri Nov 1 14:40:56 EDT 2024"
}
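For automated checks of this condition, the crash list above can also be consumed as JSON (`ceph crash ls --format json`). A minimal sketch, assuming entries shaped like the `ceph crash info` dump in this report and relying on Ceph's crash module omitting the "archived" field for unacknowledged crashes; the sample data below is taken from this bug:

```python
import json

# Sample of `ceph crash ls --format json` output (structure assumed from
# the crash info dump above; only a subset of fields is shown).
CRASH_LS_JSON = """
[
  {
    "crash_id": "2024-11-12T13:11:54.348893Z_3b312191-e0b7-4383-a93b-9980e9e08d54",
    "entity_name": "osd.0",
    "timestamp": "2024-11-12T13:11:54.348893Z"
  }
]
"""

def new_crashes(crash_ls_output: str):
    """Return (crash_id, entity_name) pairs for crashes not yet archived.

    Unacknowledged crashes have no "archived" key in their entry.
    """
    crashes = json.loads(crash_ls_output)
    return [(c["crash_id"], c["entity_name"])
            for c in crashes if "archived" not in c]

for crash_id, entity in new_crashes(CRASH_LS_JSON):
    print(f"NEW crash on {entity}: {crash_id}")
```

A check like this could gate an upgrade test run: a non-empty result after the 4.17 to 4.18 upgrade indicates the regression reported here.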
Expected results:
There should not be any OSD crashes after the upgrade.
Logs collected and log location:
http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/sync_issue/
Additional info: