Bug
Resolution: Done
Major
4.16.z, 4.18.z
Quality / Stability / Reliability
False
Description of problem:
Kdump fails to complete, and the system hangs without rebooting after a kernel panic is triggered. On OpenShift Container Platform nodes running the RHCOS kernel 5.14.0-427.72.1.el9_4.x86_64+rt, triggering a kernel panic with echo c > /proc/sysrq-trigger leaves the node hung/frozen, so no vmcore dump is generated. The issue occurs consistently when a PerformanceProfile that configures CPU isolation (e.g. isolcpus, nohz_full, rcu_nocbs) is applied to the node.
Version-Release number of selected component (if applicable):
OCP 4.16.0 nightly build; kexec-tools 2.0.27
How reproducible:
Easy to reproduce
Steps to Reproduce:
1. Install OCP on a single-node OpenShift (SNO) cluster.
2. Apply a PerformanceProfile with the CPU layout below and realTimeKernel enabled: true:
isolated: "2-55,58-111" reserved: "0,1,56,57"
3. Configure kdump with crashkernel=512M.
4. Trigger a kernel panic: echo c > /proc/sysrq-trigger
5. Monitor the server console: it freezes/hangs instead of rebooting, and manual intervention (power off/on) is needed to recover the node.
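Before step 4 it can help to confirm that kdump is actually armed, so that a hang can be told apart from a crash kernel that was never loaded. A minimal sketch, assuming root access on the node (for example through a debug shell); kdump_ready is a hypothetical helper name, its optional argument is a hypothetical override of the sysfs directory, and it only reads the standard /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size files:

```shell
#!/bin/sh
# Sketch: confirm a crash kernel is loaded before deliberately panicking the node.
# The optional first argument is a hypothetical override for the sysfs
# directory (default /sys/kernel) so the check can be tried against fake files.
kdump_ready() {
    local sysfs="${1:-/sys/kernel}"
    local loaded size

    # "1" once a crash kernel has been loaded (kdumpctl does this when kdump.service starts).
    loaded=$(cat "$sysfs/kexec_crash_loaded" 2>/dev/null)
    # Bytes of memory reserved through the crashkernel= boot parameter.
    size=$(cat "$sysfs/kexec_crash_size" 2>/dev/null)

    if [ "$loaded" != "1" ]; then
        echo "crash kernel not loaded" >&2
        return 1
    fi
    if [ -z "$size" ] || [ "$size" -eq 0 ]; then
        echo "no crashkernel memory reserved" >&2
        return 1
    fi
    echo "crash kernel loaded, $((size / 1024 / 1024)) MiB reserved"
}

# Usage on the node, as root (this really crashes the kernel):
#   kdump_ready && echo c > /proc/sysrq-trigger
```

If kdump_ready succeeds and the node still hangs after the trigger, the failure is in the crash path itself rather than in kdump setup.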
Actual results:
The system does not reboot; it hangs/freezes.
Expected results:
The system should reboot after the kernel panic, and a kdump vmcore should be generated in the configured path, /var/crash.
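After the node comes back (or is power-cycled), the dump location can be checked for the expected result. A sketch assuming the default layout in which kdump writes each dump to its own subdirectory of /var/crash containing a file named vmcore; latest_vmcore is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: print the most recently modified vmcore under the kdump path.
# Assumes one subdirectory per dump, each holding a file named "vmcore".
latest_vmcore() {
    local crash_dir="${1:-/var/crash}"
    local newest="" f

    for f in "$crash_dir"/*/vmcore; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done

    # Non-zero status when no dump was written (the behaviour in this bug).
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Usage on the node after it is back up:
#   latest_vmcore || echo "no vmcore found in /var/crash"
```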
Additional info:
kind: PerformanceProfile
apiVersion: "performance.openshift.io/v2"
metadata:
  name: sno-perf-profile
  annotations:
    kubeletconfig.experimental: |
      {"allowedUnsafeSysctls":["net.ipv4.tcp_tw_reuse"]}
spec:
  cpu:
    isolated: "2-55,58-111"
    reserved: "0,1,56,57"
  hugepages:
    pages:
      - size: "1G"
        count: 52
        node: 0
      - size: "1G"
        count: 52
        node: 1
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
  workloadHints:
    highPowerConsumption: false
    perPodPowerManagement: false
    realTime: true
  nodeSelector:
    node-role.kubernetes.io/master: ""
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master:
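The isolated and reserved sets in this profile are generally expected not to overlap and, together, to account for the node's CPUs. As arithmetic: 2-55 and 58-111 are 54 CPUs each, so 108 isolated plus 4 reserved gives 112 CPUs (0-111) with no overlap. A small sketch of that check; expand_cpulist and cpu_partition_summary are hypothetical helper names, and on a live node the total could additionally be compared against /sys/devices/system/cpu/online:

```shell
#!/bin/sh
# Sketch: expand cpulist strings such as "2-55,58-111", one CPU per line.
expand_cpulist() {
    local IFS=,
    local part i last
    for part in $1; do
        case "$part" in
            *-*) i=${part%-*}; last=${part#*-} ;;   # range "a-b"
            *)   i=$part; last=$part ;;             # single CPU
        esac
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Summarise an isolated/reserved split: sizes, union size, and overlap.
cpu_partition_summary() {
    local iso res total overlap
    iso=$(expand_cpulist "$1" | wc -l)
    res=$(expand_cpulist "$2" | wc -l)
    total=$({ expand_cpulist "$1"; expand_cpulist "$2"; } | sort -n | uniq | wc -l)
    overlap=$({ expand_cpulist "$1"; expand_cpulist "$2"; } | sort -n | uniq -d | wc -l)
    # $((...)) strips any padding some wc implementations add.
    echo "isolated=$((iso)) reserved=$((res)) total=$((total)) overlap=$((overlap))"
}

# Usage with the values from this profile:
#   cpu_partition_summary "2-55,58-111" "0,1,56,57"
```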
=====================
sh-5.1# uname -r
5.14.0-427.72.1.el9_4.x86_64+rt
=========================
sh-5.1# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/boot/ostree/rhcos-dd7825b0c917bfcf0ccfc9d9cd41f7ae951accb7a206a56030c6c6bb02975df3/vmlinuz-5.14.0-427.72.1.el9_4.x86_64+rt ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/dd7825b0c917bfcf0ccfc9d9cd41f7ae951accb7a206a56030c6c6bb02975df3/0 root=UUID=affe3e2a-4fa1-4603-9523-40718b718026 rw rootflags=prjquota boot=UUID=46ef05dc-841c-4df5-9e9c-3f2d51c226e3 intel_iommu=on iommu=pt skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-55,58-111 tuned.non_isolcpus=03000000,00000003 systemd.cpu_affinity=0,1,56,57 intel_iommu=on iommu=pt isolcpus=managed_irq,2-55,58-111 nohz_full=2-55,58-111 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 intel_pstate=active systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=0 crashkernel=1024M
sh-5.1#
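The command line above carries tuned.non_isolcpus=03000000,00000003, a CPU mask written as comma-separated 32-bit hex words with the most significant word first. Decoding it should give back the reserved set, which also appears as systemd.cpu_affinity=0,1,56,57. A sketch of that decoding; mask_to_cpulist is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: decode a comma-separated hex CPU mask (most significant word first)
# into a cpulist, e.g. "03000000,00000003" -> "0,1,56,57".
mask_to_cpulist() {
    local rest="$1" base=0 out="" word val bit
    while [ -n "$rest" ]; do
        # Take the last (least significant) remaining word.
        word=${rest##*,}
        case "$rest" in
            *,*) rest=${rest%,*} ;;
            *)   rest="" ;;
        esac
        val=$((0x$word))
        bit=0
        while [ "$bit" -lt 32 ]; do
            if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
                out="${out:+$out,}$((base + bit))"
            fi
            bit=$((bit + 1))
        done
        base=$((base + 32))   # next word covers the next 32 CPUs
    done
    printf '%s\n' "$out"
}

# Usage:
#   mask_to_cpulist 03000000,00000003
```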
sh-5.1# kdumpctl estimate
Reserved crashkernel: 512M
Recommended crashkernel: 512M
Kernel image size: 53M
Kernel modules size: 23M
Initramfs size: 68M
Runtime reservation: 64M
Large modules:
xfs: 2543616
mlx5_core: 2486272
ext4: 1191936
ice: 1241088
kvm: 1351680
sh-5.1#
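The captured command line ends in crashkernel=1024M, while kdumpctl estimate reports "Reserved crashkernel: 512M" and step 3 mentions 512M. The outputs may have been taken at different times, but the value actually in effect is worth confirming when reproducing. A sketch of that comparison; cmdline_param, estimate_reserved and compare_crashkernel are hypothetical helpers, and it is a plain string comparison that does not normalise units (1G vs 1024M) or interpret the range syntax crashkernel= also accepts:

```shell
#!/bin/sh
# Sketch: compare crashkernel= on the kernel cmdline with the value
# reported on the "Reserved crashkernel:" line of kdumpctl estimate.

# Print every value of parameter $1 found in cmdline string $2.
# The subshell body keeps "set -f" (no globbing) from leaking out.
cmdline_param() (
    set -f
    for tok in $2; do
        case "$tok" in
            "$1"=*) printf '%s\n' "${tok#*=}" ;;
        esac
    done
)

# Extract the value after "Reserved crashkernel:" from estimate output.
estimate_reserved() {
    printf '%s\n' "$1" | sed -n 's/^Reserved crashkernel:[[:space:]]*//p'
}

compare_crashkernel() {
    local requested reserved
    requested=$(cmdline_param crashkernel "$1")
    reserved=$(estimate_reserved "$2")
    if [ "$requested" = "$reserved" ]; then
        echo "match: $requested"
    else
        echo "mismatch: cmdline=$requested reserved=$reserved"
        return 1
    fi
}

# Usage on the node:
#   compare_crashkernel "$(cat /proc/cmdline)" "$(kdumpctl estimate)"
```

cmdline_param prints every occurrence of a parameter, which matters here because this command line repeats several of them (intel_iommu, iommu, tsc, skew_tick).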
Relates to:
OCPBUGS-54520 kdump not generated on the Dell PowerEdge XR11 (Closed)