Bug
Resolution: Unresolved
rhel-9.2.0
Important
sst_kernel_rts
ssg_core_kernel
aarch64
What were you trying to do that didn't work?
Install the OCP 4.15.15 release (which uses the RHEL 9.2 kernel) on an ARM server and use the 64K page size kernel.
Please provide the package NVR for which the bug is seen:
stalld-1.17.1-1.el9_1.aarch64
How reproducible:
always
Steps to reproduce
- Install OCP 4.15.15 on an ARM server
- Switch to the 64K page size kernel
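Before checking stalld, it can help to confirm that the node really booted the 64K page size kernel. The following is a minimal sketch, not part of the original report. It assumes the 64K variant is identifiable by the `+64k` suffix seen in the `uname -a` output further down, and the helper names `is_64k_kernel` and `page_size_bytes` are hypothetical.

```python
import os
import platform

PAGE_64K = 64 * 1024  # 65536 bytes


def is_64k_kernel(release: str) -> bool:
    """True when the kernel release string ends in '+64k'.

    Assumption: the suffix convention is taken from the uname output
    in this report, not from any official naming guarantee.
    """
    return release.endswith("+64k")


def page_size_bytes() -> int:
    """Page size of the running kernel, as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    release = platform.release()
    size = page_size_bytes()
    print(f"kernel release: {release}")
    print(f"+64k variant:   {is_64k_kernel(release)}")
    print(f"page size:      {size} bytes (64K: {size == PAGE_64K})")
```

On the affected node both checks would be expected to agree: a `+64k` release string and a 65536-byte page size.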
Expected results
stalld functions correctly
Actual results
stalld core dumps continuously, for example:
May 30 12:12:35 cnfdg37 systemd[1]: Starting Stall Monitor...
May 30 12:12:35 cnfdg37 stalld[10159]: lockdown mode is off
May 30 12:12:35 cnfdg37 systemd[1]: Started Stall Monitor.
May 30 12:12:35 cnfdg37 stalld[10159]: /sys/kernel/debug/sched/features exists
May 30 12:12:35 cnfdg37 stalld[10159]: /sys/kernel/debug/sched/debug exists
May 30 12:12:35 cnfdg37 stalld[10159]: boosted pid 0 (undef) using SCHED_DEADLINE
May 30 12:12:35 cnfdg37 stalld[10159]: using SCHED_DEADLINE for boosting
May 30 12:12:35 cnfdg37 stalld[10159]: initial config_buffer_size set to 14417920
May 30 12:12:35 cnfdg37 stalld[10159]: detected new task format
May 30 12:12:35 cnfdg37 stalld[10159]: single threaded mode
May 30 12:12:37 cnfdg37 systemd-coredump[10267]: [🡕] Process 10159 (stalld) of user 0 dumped core.
May 30 12:12:37 cnfdg37 systemd[1]: stalld.service: Main process exited, code=dumped, status=11/SEGV
May 30 12:12:37 cnfdg37 systemd[1]: stalld.service: Failed with result 'core-dump'.
May 30 12:12:37 cnfdg37 systemd[1]: stalld.service: Scheduled restart job, restart counter is at 1.
May 30 12:12:37 cnfdg37 systemd[1]: Stopped Stall Monitor.
May 30 12:12:37 cnfdg37 stalld[10287]: lockdown mode is off
May 30 12:12:37 cnfdg37 systemd[1]: Starting Stall Monitor...
May 30 12:12:37 cnfdg37 stalld[10287]: /sys/kernel/debug/sched/features exists
May 30 12:12:37 cnfdg37 systemd[1]: Started Stall Monitor.
May 30 12:12:37 cnfdg37 stalld[10287]: /sys/kernel/debug/sched/debug exists
May 30 12:12:37 cnfdg37 stalld[10287]: boosted pid 0 (undef) using SCHED_DEADLINE
May 30 12:12:37 cnfdg37 stalld[10287]: using SCHED_DEADLINE for boosting
May 30 12:12:37 cnfdg37 stalld[10287]: initial config_buffer_size set to 14417920
May 30 12:12:37 cnfdg37 stalld[10287]: detected new task format
May 30 12:12:37 cnfdg37 stalld[10287]: single threaded mode
May 30 12:12:37 cnfdg37 systemd-coredump[10289]: [🡕] Process 10287 (stalld) of user 0 dumped core.
May 30 12:12:37 cnfdg37 systemd[1]: stalld.service: Main process exited, code=dumped, status=11/SEGV
May 30 12:12:37 cnfdg37 systemd[1]: stalld.service: Failed with result 'core-dump'.
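To quantify how often stalld crashes across a longer journal capture, the `systemd-coredump` lines can be extracted mechanically. This is a rough sketch only: the function name `crashed_pids` is hypothetical, and the regular expression is inferred from the two coredump lines quoted above rather than from any documented journal format.

```python
import re

# Matches the message part of a systemd-coredump journal line, e.g.
# "Process 10159 (stalld) of user 0 dumped core."
CRASH_RE = re.compile(r"Process (\d+) \((\S+)\) of user \d+ dumped core")


def crashed_pids(journal_text: str, comm: str = "stalld") -> list[int]:
    """Return the PIDs of every `comm` process reported as having dumped core."""
    return [
        int(pid)
        for pid, name in CRASH_RE.findall(journal_text)
        if name == comm
    ]


# The two coredump lines from the log above, copied verbatim.
sample = (
    "May 30 12:12:37 cnfdg37 systemd-coredump[10267]: [🡕] Process 10159 (stalld) of user 0 dumped core.\n"
    "May 30 12:12:37 cnfdg37 systemd-coredump[10289]: [🡕] Process 10287 (stalld) of user 0 dumped core.\n"
)

print(crashed_pids(sample))  # [10159, 10287]
```

In the excerpt above, both crashes occur about two seconds after the first service start, immediately after the "single threaded mode" message.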
Here is the core dump info (full core dump attached):
[core@cnfdg37 ~]$ sudo coredumpctl info 86371
PID: 86371 (stalld)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Wed 2024-06-05 14:07:24 UTC (5min ago)
Command Line: /usr/bin/stalld --systemd -p 1000000000 -r 20000 -d 3 -t 20 --foreground --pidfile /run/stalld.pid
Executable: /usr/bin/stalld
Control Group: /system.slice/stalld.service
Unit: stalld.service
Slice: system.slice
Boot ID: 4fd9097c2ea54270b2d0c718e11fc3f7
Machine ID: 2600e9ed13c34ffcbfb802a8edab6275
Hostname: cnfdg37
Storage: /var/lib/systemd/coredump/core.stalld.0.4fd9097c2ea54270b2d0c718e11fc3f7.86371.1717596444000000.zst (present)
Size on Disk: 137.7K
Message: Process 86371 (stalld) of user 0 dumped core.

Stack trace of thread 86371:
#0 0x0000aaaacbe254a8 get_cpu_busy_list (stalld + 0x54a8)
#1 0x0000aaaacbe237c4 main (stalld + 0x37c4)
#2 0x0000ffff93d4c79c __libc_start_call_main (libc.so.6 + 0x2c79c)
#3 0x0000ffff93d4c86c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2c86c)
#4 0x0000aaaacbe23c70 _start (stalld + 0x3c70)
ELF object binary architecture: AARCH64
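The stack trace gives both an absolute address and a module-relative offset for each frame. A small sketch for pulling those apart follows. The names `parse_frames` and `load_base` are hypothetical, and the regular expression is derived only from the frame layout shown above. Subtracting the offset from the address yields the load base of each module, which is a quick consistency check on the trace.

```python
import re

# One frame looks like:
#   #0 0x0000aaaacbe254a8 get_cpu_busy_list (stalld + 0x54a8)
FRAME_RE = re.compile(
    r"#(\d+)\s+0x([0-9a-f]+)\s+(\S+)\s+\((\S+) \+ 0x([0-9a-f]+)\)"
)


def parse_frames(text: str) -> list[dict]:
    """Extract index, address, symbol, module and offset for each frame."""
    return [
        {
            "index": int(idx),
            "addr": int(addr, 16),
            "symbol": symbol,
            "module": module,
            "offset": int(offset, 16),
        }
        for idx, addr, symbol, module, offset in FRAME_RE.findall(text)
    ]


def load_base(frame: dict) -> int:
    """Address at which the frame's module was mapped."""
    return frame["addr"] - frame["offset"]


# Stack trace copied from the coredumpctl output above.
trace = """
#0 0x0000aaaacbe254a8 get_cpu_busy_list (stalld + 0x54a8)
#1 0x0000aaaacbe237c4 main (stalld + 0x37c4)
#2 0x0000ffff93d4c79c __libc_start_call_main (libc.so.6 + 0x2c79c)
#3 0x0000ffff93d4c86c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2c86c)
#4 0x0000aaaacbe23c70 _start (stalld + 0x3c70)
"""

for frame in parse_frames(trace):
    print(frame["index"], frame["symbol"], frame["module"], hex(load_base(frame)))
```

All three stalld frames resolve to the same base (0xaaaacbe20000), so the faulting location is offset 0x54a8 inside `get_cpu_busy_list`. Resolving that offset to a source line would additionally need debug symbols matching the exact stalld build, which this sketch does not attempt.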
System information:
[core@cnfdg37 ~]$ uname -a
Linux cnfdg37 5.14.0-284.67.1.el9_2.aarch64+64k #1 SMP PREEMPT_DYNAMIC Mon May 13 15:24:28 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
[core@cnfdg37 ~]$ cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-4e20df033d8f1de063c19b1faf96533e75b4703c42c8ec06b92d6db6436f2004/vmlinuz-5.14.0-284.67.1.el9_2.aarch64+64k ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/4e20df033d8f1de063c19b1faf96533e75b4703c42c8ec06b92d6db6436f2004/0 root=UUID=bc8ec181-a943-4a79-86c2-86da64ad24e8 rw rootflags=prjquota boot=UUID=2430289c-848b-4a1c-8a24-805015977c9d skew_tick=1 rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-127 tuned.non_isolcpus=00000003 systemd.cpu_affinity=0,1 isolcpus=managed_irq,2-127 nohz_full=2-127 nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 rcupdate.rcu_normal_after_boot=0 efi=runtime module_blacklist=irdma vfio_pci.disable_idle_d3=1 vfio_pci.enable_sriov=1 iommu.passthrough=1 default_hugepagesz=512M hugepagesz=512M hugepages=32 systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1 intel_iommu=on iommu=pt