Bug
Resolution: Unresolved
Low
rhel-sst-arch-hw
ssg_platform_enablement
Description of problem:
Tried enabling kdump on the RHCOS nodes of a 4.15 cluster deployed on IBM Power. kdump crash log generation fails intermittently.

Followed the Red Hat documentation (the link is for 4.14):
https://docs.openshift.com/container-platform/4.14/support/troubleshooting/troubleshooting-operating-system-issues.html#enabling-kdump-day-one:~:text=Enabling-,kdump,-RHCOS%20ships%20with

Crash triggered with:
echo c > /proc/sysrq-trigger

Configuration used to generate kdump:
KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb hugetlb_cma ignition.firstboot rd.multipath=default"
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0 srcutree.big_cpu_lim=0"
KEXEC_ARGS="--dt-no-old-root -s"
KDUMP_IMG="vmlinuz"

kdump verification on a cluster node:
# cat "/etc/sysconfig/kdump"
KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb hugetlb_cma ignition.firstboot rd.multipath=default"
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0 srcutree.big_cpu_lim=0"
KEXEC_ARGS="--dt-no-old-root -s"
KDUMP_IMG="vmlinuz"
# cat /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31

OCP Version at Install Time: 4.15.0-ec.2
RHCOS Version at Install Time: 4.15
OCP Version after Upgrade (if applicable): NA
RHCOS Version after Upgrade (if applicable): NA
Architecture (x86_64, ppc64le, s390x, etc.): ppc64le
Version-Release number of selected component (if applicable):
4.15.0-ec.2
How reproducible:
6 out of 10 attempts failed
Steps to Reproduce:
1. Deploy an OCP 4.15 cluster on IBM Power
2. Enable kdump
3. Trigger the crash manually
4. Check whether the crash logs are generated on the worker/master nodes of the cluster
Actual results:
Crash logs are not generated:
# ls -lart /var/crash
total 4
drwxr-xr-x. 24 root root 4096 Nov 28 05:42 ..
drwxr-xr-x. 3 root root 43 Nov 28 05:51 .
Expected results:
Crash logs should be generated whenever a crash occurs:
# ls -lart /var/crash
total 4
drwxr-xr-x. 24 root root 4096 Nov 28 05:42 ..
drwxr-xr-x. 3 root root 43 Nov 28 05:51 .
drwxr-xr-x. 2 root root 67 Nov 28 05:51 127.0.0.1-2023-11-28-05:51:50
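To make the pass/fail check after each reboot less error-prone than reading `ls` output by eye, a small helper along the following lines could be used. This is only a sketch: the function name `crash_dump_present` is hypothetical, and it assumes the dump is written as a file named `vmcore` inside a per-crash subdirectory of the path configured in /etc/kdump.conf (here /var/crash).

```shell
#!/bin/sh
# Sketch: report whether a non-empty dump file exists under the crash path.
# Assumption: kdump writes <crash_dir>/<host>-<timestamp>/vmcore.
# The function name and the optional directory argument are illustrative.

# crash_dump_present [CRASH_DIR]
# Prints the first vmcore found and returns 0, or returns non-zero if none.
crash_dump_present() {
    crash_dir="${1:-/var/crash}"
    [ -d "$crash_dir" ] || return 1

    # Search exactly one directory level below the crash path and
    # ignore zero-byte files left behind by an aborted dump.
    found="$(find "$crash_dir" -mindepth 2 -maxdepth 2 -type f \
        -name 'vmcore' -size +0c 2>/dev/null | head -n 1)"

    [ -n "$found" ] && echo "$found"
}

# Example use on a node after it comes back up:
#   crash_dump_present /var/crash || echo "no vmcore generated"
```

An empty timestamped directory, or a directory that contains only log files, is treated as a failed attempt.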
Additional info:
Crash logs should be generated every time after the trigger.
[root@worker-2 core]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: disabled)
     Active: active (exited) since Wed 2023-11-29 09:20:53 UTC; 20h ago
   Main PID: 1254 (code=exited, status=0/SUCCESS)
        CPU: 1min 5.091s

Nov 29 09:20:32 worker-2 dracut[1807]: Stored kernel commandline:
Nov 29 09:20:32 worker-2 dracut[1807]: No dracut internal kernel commandline stored in the initramfs
Nov 29 09:20:35 worker-2 dracut[1807]: *** Install squash loader ***
Nov 29 09:20:36 worker-2 dracut[1807]: *** Squashing the files inside the initramfs ***
Nov 29 09:20:51 worker-2 dracut[1807]: *** Squashing the files inside the initramfs done ***
Nov 29 09:20:51 worker-2 dracut[1807]: *** Creating image file '/var/lib/kdump/initramfs-5.14.0-284.40.1.el9_2.ppc64lekdump.img' ***
Nov 29 09:20:51 worker-2 dracut[1807]: *** Creating initramfs image file '/var/lib/kdump/initramfs-5.14.0-284.40.1.el9_2.ppc64lekdump.img' done ***
Nov 29 09:20:53 worker-2 kdumpctl[1298]: kdump: kexec: loaded kdump kernel
Nov 29 09:20:53 worker-2 kdumpctl[1298]: kdump: Starting kdump: [OK]
Nov 29 09:20:53 worker-2 systemd[1]: Finished Crash recovery kernel arming.
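The service status only reflects the state at arming time. To rule out the crash kernel not being loaded at the moment of each trigger, a check like the one below could be run immediately before `echo c > /proc/sysrq-trigger`. It is a sketch rather than part of the documented procedure: it reads the kernel's sysfs files /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size, and the function name `kdump_armed` and its overridable file arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the crash kernel is loaded and memory is reserved for it.
# The two sysfs paths are kernel interfaces; the function name and the
# optional path arguments are illustrative only.

# kdump_armed [LOADED_FILE] [SIZE_FILE]
# Returns 0 only if a crash kernel is loaded and the reservation is non-zero.
kdump_armed() {
    loaded_file="${1:-/sys/kernel/kexec_crash_loaded}"
    size_file="${2:-/sys/kernel/kexec_crash_size}"

    loaded="$(cat "$loaded_file" 2>/dev/null)" || loaded=""
    size="$(cat "$size_file" 2>/dev/null)" || size=""

    if [ "$loaded" != "1" ]; then
        echo "crash kernel not loaded (kexec_crash_loaded=${loaded:-unreadable})"
        return 1
    fi

    # Reject an unreadable or non-numeric size before comparing it.
    case "$size" in
        ''|*[!0-9]*)
            echo "cannot read reserved crashkernel size"
            return 1
            ;;
    esac

    if [ "$size" -eq 0 ]; then
        echo "no memory reserved for the crash kernel"
        return 1
    fi

    echo "kdump armed: ${size} bytes reserved for the crash kernel"
    return 0
}

# Example use before triggering a crash:
#   kdump_armed && echo c > /proc/sysrq-trigger
```

Recording this result alongside each of the 10 attempts would show whether the failures correlate with the crash kernel not being loaded or occur after the capture kernel boots.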
The logs are uploaded to the location below.
Link for logs: https://drive.google.com/drive/folders/1ChpifZEv2J8gxK6AJFkDjWIfkt1bOicD?usp=sharing