[RHEL-17860] kdump crash logs are not generated on CoreOS nodes of a cluster deployed on Power - Red Hat Issue Tracker

Type: Bug
Resolution: Won't Do
Priority: Undefined
Fix Version/s: None
Affects Version/s: rhel-9.2.0
Component/s: kernel / Platform Enablement / ppc64
Labels:
- auto-closed
- ppc64le

Regression: None
Severity: Low
Pool Team: rhel-sst-arch-hw
Sub-System Group: ssg_platform_enablement
Story Points: None
Blocked: False
Blocked Reason: None
Product Documentation Required: None
Sprint: None
Preliminary Testing: None
Test Coverage: None
Experience:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

Description of problem:

Enabled kdump on the RHCOS nodes of a 4.15 cluster deployed on IBM Power.
kdump intermittently fails to generate crash logs.

Followed the Red Hat documentation (the link is for 4.14):
https://docs.openshift.com/container-platform/4.14/support/troubleshooting/troubleshooting-operating-system-issues.html#enabling-kdump-day-one:~:text=Enabling-,kdump,-RHCOS%20ships%20with

Triggered the crash with:
echo c > /proc/sysrq-trigger
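Because a missing dump can simply mean the capture kernel was never loaded, it may help to confirm that kdump is armed right before triggering the crash. A minimal sketch in POSIX sh, assuming the standard kexec sysfs files: `/sys/kernel/kexec_crash_loaded` reads `1` when a crash kernel is loaded, and `/sys/kernel/kexec_crash_size` reports the reserved crash memory in bytes. The function names are hypothetical, and the optional path arguments exist only so the checks can be exercised against ordinary files.

```shell
#!/bin/sh
# Hypothetical pre-flight checks to run on a node before the sysrq crash trigger.

# Prints "yes" (status 0) if a crash kernel is loaded, otherwise "no" (status 1).
# $1 optionally overrides the sysfs path, e.g. for testing.
kdump_armed() (
    f=${1:-/sys/kernel/kexec_crash_loaded}
    if [ -r "$f" ] && [ "$(cat "$f")" = "1" ]; then
        echo yes
        exit 0
    fi
    echo no
    exit 1
)

# Prints the crash kernel memory reservation in bytes (0 if the file is missing).
crash_reservation_bytes() (
    f=${1:-/sys/kernel/kexec_crash_size}
    if [ -r "$f" ]; then cat "$f"; else echo 0; fi
)

# Panics the node via sysrq only if kdump is armed. Must run as root.
# WARNING: this crashes the running system.
trigger_crash_if_armed() {
    if kdump_armed >/dev/null; then
        sync
        echo c > /proc/sysrq-trigger
    else
        echo "kdump is not armed; not triggering a crash" >&2
        return 1
    fi
}
```

On a node, as root, `kdump_armed && crash_reservation_bytes` can be run first; `trigger_crash_if_armed` panics the node, so it belongs only on a test system.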

kdump configuration used:

KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb hugetlb_cma ignition.firstboot rd.multipath=default"
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0 srcutree.big_cpu_lim=0"
KEXEC_ARGS="--dt-no-old-root -s"
KDUMP_IMG="vmlinuz"
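The two `KDUMP_COMMANDLINE_*` variables edit the command line that the capture kernel boots with: parameters named in the REMOVE list are dropped from the current command line, and the APPEND list is added at the end. The sketch below is a simplified, hypothetical approximation of that step, useful for eyeballing what the capture kernel would receive; it is not kdumpctl's actual code, which applies further rules of its own.

```shell
#!/bin/sh
# Simplified, hypothetical approximation of how KDUMP_COMMANDLINE_REMOVE and
# KDUMP_COMMANDLINE_APPEND shape the capture kernel command line.
# NOT kdumpctl's real implementation, which applies additional rules.
#
# $1: current kernel command line (e.g. contents of /proc/cmdline)
# $2: space-separated parameters to remove
# $3: space-separated parameters to append
effective_kdump_cmdline() (
    set -f  # disable globbing while splitting the lists on whitespace
    cmdline=$1 remove=$2 append=$3
    out=""
    for tok in $cmdline; do
        keep=1
        for r in $remove; do
            # A bare name drops "name" and every "name=value" token;
            # an entry such as "rd.multipath=default" matches that token as written.
            case $tok in
                "$r" | "$r"=*) keep=0; break ;;
            esac
        done
        if [ "$keep" -eq 1 ]; then
            out="$out $tok"
        fi
    done
    out=${out# }
    if [ -n "$out" ] && [ -n "$append" ]; then
        printf '%s\n' "$out $append"
    else
        printf '%s\n' "$out$append"
    fi
)
```

For example, after `. /etc/sysconfig/kdump`, running `effective_kdump_cmdline "$(cat /proc/cmdline)" "$KDUMP_COMMANDLINE_REMOVE" "$KDUMP_COMMANDLINE_APPEND"` prints the approximated capture-kernel command line.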

kdump verification on a cluster node:

# cat /etc/sysconfig/kdump
KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb hugetlb_cma ignition.firstboot rd.multipath=default"
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0 srcutree.big_cpu_lim=0"
KEXEC_ARGS="--dt-no-old-root -s"
KDUMP_IMG="vmlinuz"
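To confirm that the values delivered to the node match the intended configuration, each variable can be read back from `/etc/sysconfig/kdump`. A minimal sketch, assuming the file contains only plain `NAME="value"` shell assignments; sourcing executes the file, so this should only be used on trusted files. `sysconfig_value` and `expect_sysconfig` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers to read back and compare values in a sysconfig-style file.

# Prints the value of variable $2 from file $1 (empty if unset).
# The file is sourced inside a subshell, so the caller's environment is untouched.
sysconfig_value() (
    _sv_file=$1 _sv_var=$2
    # Reject anything that is not a plain shell variable name before eval.
    case $_sv_var in
        '' | [0-9]* | *[!A-Za-z0-9_]*) echo "bad variable name: $_sv_var" >&2; exit 2 ;;
    esac
    [ -r "$_sv_file" ] || { echo "cannot read $_sv_file" >&2; exit 2; }
    # "." needs a path containing "/", otherwise it searches PATH.
    case $_sv_file in */*) ;; *) _sv_file=./$_sv_file ;; esac
    . "$_sv_file"
    eval "_sv_value=\${$_sv_var-}"
    printf '%s\n' "$_sv_value"
)

# Prints "OK NAME" or a MISMATCH line; returns non-zero on mismatch.
expect_sysconfig() (
    actual=$(sysconfig_value "$1" "$2") || exit 2
    if [ "$actual" = "$3" ]; then
        echo "OK $2"
    else
        echo "MISMATCH $2: got '$actual', expected '$3'"
        exit 1
    fi
)
```

For example, `expect_sysconfig /etc/sysconfig/kdump KEXEC_ARGS "--dt-no-old-root -s"` should print `OK KEXEC_ARGS` on a node configured as described above.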


# cat /etc/kdump.conf 
path /var/crash 
core_collector makedumpfile -l --message-level 7 -d 31


OCP Version at Install Time: 4.15.0-ec.2
RHCOS Version at Install Time: 4.15
OCP Version after Upgrade (if applicable): NA
RHCOS Version after Upgrade (if applicable): NA
Architecture (x86_64, ppc64le, s390x, etc.): ppc64le

Version-Release number of selected component (if applicable):

4.15.0-ec.2

How reproducible:

6 out of 10 attempts failed

Steps to Reproduce:

1. Deploy an OCP 4.14 cluster on IBM Power.
2. Enable kdump.
3. Trigger the crash manually.
4. Check whether crash logs are generated under /var/crash on the worker and master nodes of the cluster.

Actual results:

Crash logs are not generated:

# ls -lart /var/crash
total 4
drwxr-xr-x. 24 root root 4096 Nov 28 05:42 ..
drwxr-xr-x.  3 root root   43 Nov 28 05:51 .

Expected results:

Crash logs should be generated whenever a crash occurs:

# ls -lart /var/crash
total 4
drwxr-xr-x. 24 root root 4096 Nov 28 05:42 ..
drwxr-xr-x.  3 root root   43 Nov 28 05:51 .
drwxr-xr-x.  2 root root   67 Nov 28 05:51 127.0.0.1-2023-11-28-05:51:50
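The expected listing shows the dump directory naming used here, `<IP>-YYYY-MM-DD-HH:MM:SS`. When checking each node after a trigger, a small helper can list only the directories created at or after the trigger time and flag whether each contains a `vmcore` file. A sketch under two assumptions: directory names end in that timestamp format, and a successful dump leaves a file named `vmcore` inside the directory (the usual kdump layout). `new_dumps_since` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: lists dump directories under $1 whose timestamp is at
# or after $2 (format YYYY-MM-DD-HH:MM:SS), marking whether each has a vmcore.
new_dumps_since() (
    crashdir=$1
    # Strip ":" and "-" so the timestamp compares as a 14-digit number.
    since=$(printf '%s\n' "$2" | tr -d ':-')
    for d in "$crashdir"/*; do
        [ -d "$d" ] || continue
        name=${d##*/}
        ts=$(printf '%s\n' "$name" |
            sed -n 's/.*-\([0-9]\{4\}-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)$/\1/p')
        [ -n "$ts" ] || continue   # not a kdump-style directory name
        if [ "$(printf '%s\n' "$ts" | tr -d ':-')" -ge "$since" ]; then
            if [ -f "$d/vmcore" ]; then
                echo "$name vmcore"
            else
                echo "$name no-vmcore"
            fi
        fi
    done
)
```

For example, `new_dumps_since /var/crash 2023-11-28-05:50:00` would print `127.0.0.1-2023-11-28-05:51:50 vmcore` for the expected listing above, provided that directory holds a `vmcore` file. The timestamps are compared as plain numbers, so the trigger time must use the same clock and time zone as the directory names.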

Additional info:

Crash logs should be generated after every trigger.


[root@worker-2 core]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: disabled)
     Active: active (exited) since Wed 2023-11-29 09:20:53 UTC; 20h ago
   Main PID: 1254 (code=exited, status=0/SUCCESS)
        CPU: 1min 5.091s

Nov 29 09:20:32 worker-2 dracut[1807]: Stored kernel commandline:
Nov 29 09:20:32 worker-2 dracut[1807]: No dracut internal kernel commandline stored in the initramfs
Nov 29 09:20:35 worker-2 dracut[1807]: *** Install squash loader ***
Nov 29 09:20:36 worker-2 dracut[1807]: *** Squashing the files inside the initramfs ***
Nov 29 09:20:51 worker-2 dracut[1807]: *** Squashing the files inside the initramfs done ***
Nov 29 09:20:51 worker-2 dracut[1807]: *** Creating image file '/var/lib/kdump/initramfs-5.14.0-284.40.1.el9_2.ppc64lekdump.img' ***
Nov 29 09:20:51 worker-2 dracut[1807]: *** Creating initramfs image file '/var/lib/kdump/initramfs-5.14.0-284.40.1.el9_2.ppc64lekdump.img' done ***
Nov 29 09:20:53 worker-2 kdumpctl[1298]: kdump: kexec: loaded kdump kernel
Nov 29 09:20:53 worker-2 kdumpctl[1298]: kdump: Starting kdump: [OK]
Nov 29 09:20:53 worker-2 systemd[1]: Finished Crash recovery kernel arming.

The logs are uploaded to the following location:

Link to logs: https://drive.google.com/drive/folders/1ChpifZEv2J8gxK6AJFkDjWIfkt1bOicD?usp=sharing

Links to: IBM Bugzilla 204503

Assignee: Mamatha Inamdar
Reporter: Swapnil Bobade
Watchers Groups: IBM Confidential Group
Developer: Mamatha Inamdar
QA Contact: Brock Organ
Votes: 0
Watchers: 8
Created: 2023/11/30 3:17 AM
Updated: 2025/03/15 2:04 AM
Resolved: 2025/03/15 2:04 AM
