-
Bug
-
Resolution: Unresolved
-
Major
-
rhel-8.10
-
None
-
No
-
Moderate
-
rhel-sst-filesystems
-
ssg_filesystems_storage_and_HA
-
3
-
False
-
-
None
-
Red Hat Enterprise Linux
-
None
-
Requested
-
-
None
-
-
x86_64
-
None
What were you trying to do that didn't work?
They rebooted a cluster node. This cause gfs2 to replay journal on surviving cluster node. Then a few seconds later a kernel panic occurred on the surviving cluster node.
The vmcore has been uploaded to: Task #493416609 - Retrace Server Task Manager
[2410978.308542] -----------[ cut here ]-----------
[2410978.308543] kernel BUG at fs/gfs2/glock.c:764!
[2410978.308625] invalid opcode: 0000 1 SMP NOPTI
[2410978.308694] CPU: 30 PID: 2843138 Comm: Thread-4 (Activ Kdump: loaded Tainted: P OE -------- - - 4.18.0-553.16.1.el8_10.x86_64 #1
[2410978.308847] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 02/22/2024
[2410978.308941] RIP: 0010:do_xmote.cold.70+0x49/0x4b [gfs2]
What is the impact of this issue to you?
A kernel panic occurred and caused the cluster node to be fenced.
Please provide the package NVR for which the bug is seen:
kernel-4.18.0-553.16.1.el8_10.x86_64
How reproducible is this bug?:
There is a kernel module that might can reproduce the issue. It is spamming PR -> CW conversion and a the master lock node need to be fenced while such request is happening and an answer from the master lock node is expecting. However it is not a 100% reproducer to always being hit.
https://gitlab.com/linux-gfs2/rhel-64860
Expected results
A kernel panic should not occur
Actual results
A kernel panic occurred.