-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.16.z, 4.18.z
-
None
-
Quality / Stability / Reliability
-
False
Description of problem:
Kdump fails to complete, and the system hangs without rebooting after a kernel panic is triggered. On OpenShift Container Platform nodes running RHCOS kernel 5.14.0-427.72.1.el9_4.x86_64+rt, triggering a kernel panic with echo c > /proc/sysrq-trigger hangs the node, so no vmcore is generated. The issue occurs consistently when a PerformanceProfile that configures CPU isolation (e.g., isolcpus, nohz_full, rcu_nocbs) is applied to the node.
Version-Release number of selected component (if applicable):
OCP 4.16.0 nightly build, kexec-tools 2.0.27
How reproducible:
Easy to reproduce
Steps to Reproduce:
1. Install OCP on a single-node OpenShift cluster.
2. Apply a PerformanceProfile with the real-time kernel enabled (realTimeKernel enabled: true) and the following CPU spec:
   isolated: "2-55,58-111"
   reserved: "0,1,56,57"
3. Configure kdump with crashkernel=512MB.
4. Trigger a kernel panic (echo c > /proc/sysrq-trigger).
5. Monitor the server console and verify that it freezes/hangs instead of rebooting. Manual intervention is needed to power the server off and on.
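The isolated and reserved CPU lists from step 2 can be sanity-checked before applying the profile. The following is a minimal sketch, not part of the reported setup: parse_cpu_list is a hypothetical helper, and the check assumes that CPU 111 (the highest ID in the profile) is the last logical CPU on the node.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a kernel-style CPU list such as "2-55,58-111" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the PerformanceProfile in this report.
isolated = parse_cpu_list("2-55,58-111")
reserved = parse_cpu_list("0,1,56,57")

# The two sets should not overlap.
overlap = isolated & reserved

# Assumption: the node exposes logical CPUs 0..111 and nothing else.
assumed_all_cpus = set(range(112))
uncovered = assumed_all_cpus - (isolated | reserved)

print(f"isolated={len(isolated)} reserved={len(reserved)}")
print(f"overlap={sorted(overlap)} uncovered={sorted(uncovered)}")
```

Under that assumption, every CPU falls into exactly one of the two sets, leaving four reserved CPUs for housekeeping.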
Actual results:
The system does not reboot; it hangs/freezes.
Expected results:
The system should reboot after the kernel panic, and the kdump file (vmcore) should be generated in the defined path, /var/crash.
Additional info:
kind: PerformanceProfile
apiVersion: "performance.openshift.io/v2"
metadata:
  name: sno-perf-profile
  annotations:
    kubeletconfig.experimental: |
      {"allowedUnsafeSysctls":["net.ipv4.tcp_tw_reuse"]}
spec:
  cpu:
    isolated: "2-55,58-111"
    reserved: "0,1,56,57"
  hugepages:
    pages:
      - size: "1G"
        count: 52
        node: 0
      - size: "1G"
        count: 52
        node: 1
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
  workloadHints:
    highPowerConsumption: false
    perPodPowerManagement: false
    realTime: true
  nodeSelector:
    node-role.kubernetes.io/master: ""
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master:
=====================
sh-5.1# uname -r
5.14.0-427.72.1.el9_4.x86_64+rt
=========================
sh-5.1# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/boot/ostree/rhcos-dd7825b0c917bfcf0ccfc9d9cd41f7ae951accb7a206a56030c6c6bb02975df3/vmlinuz-5.14.0-427.72.1.el9_4.x86_64+rt ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/dd7825b0c917bfcf0ccfc9d9cd41f7ae951accb7a206a56030c6c6bb02975df3/0 root=UUID=affe3e2a-4fa1-4603-9523-40718b718026 rw rootflags=prjquota boot=UUID=46ef05dc-841c-4df5-9e9c-3f2d51c226e3 intel_iommu=on iommu=pt skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-55,58-111 tuned.non_isolcpus=03000000,00000003 systemd.cpu_affinity=0,1,56,57 intel_iommu=on iommu=pt isolcpus=managed_irq,2-55,58-111 nohz_full=2-55,58-111 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 intel_pstate=active systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=0 crashkernel=1024M
sh-5.1#
sh-5.1# kdumpctl estimate
Reserved crashkernel: 512M
Recommended crashkernel: 512M
Kernel image size: 53M
Kernel modules size: 23M
Initramfs size: 68M
Runtime reservation: 64M
Large modules:
    xfs: 2543616
    mlx5_core: 2486272
    ext4: 1191936
    ice: 1241088
    kvm: 1351680
sh-5.1#
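When triaging, it may help to pull the relevant parameters out of the kernel command line programmatically. The sketch below is illustrative only: parse_cmdline and mask_to_cpus are hypothetical helpers, the input is a shortened excerpt of the /proc/cmdline output in this report, and the mask decoding assumes tuned.non_isolcpus is a comma-separated list of 32-bit hex words, most significant word first.

```python
def parse_cmdline(cmdline: str) -> dict[str, list[str]]:
    """Map each kernel parameter to the list of values it was given (in order)."""
    params: dict[str, list[str]] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


def mask_to_cpus(mask: str) -> set[int]:
    """Decode a CPU mask such as "03000000,00000003" into a set of CPU IDs.

    Assumption: comma-separated 32-bit hex words, most significant first.
    """
    value = 0
    for word in mask.split(","):
        value = (value << 32) | int(word, 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


# Shortened excerpt of the command line from this report.
cmdline = (
    "rcu_nocbs=2-55,58-111 tuned.non_isolcpus=03000000,00000003 "
    "systemd.cpu_affinity=0,1,56,57 isolcpus=managed_irq,2-55,58-111 "
    "nohz_full=2-55,58-111 nmi_watchdog=0 crashkernel=1024M"
)

params = parse_cmdline(cmdline)
housekeeping = mask_to_cpus(params["tuned.non_isolcpus"][0])

print("crashkernel:", params["crashkernel"])
print("non-isolated CPUs from mask:", sorted(housekeeping))
```

With this excerpt, the decoded mask matches the reserved set 0,1,56,57 from the profile. The command line carries crashkernel=1024M, while kdumpctl estimate reports a 512M reservation and the reproduction steps mention 512MB, so it may be worth confirming which value was in effect when the panic was triggered.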
Relates to: OCPBUGS-54520 kdump not generated on the Dell PowerEdge XR11 (Closed)