Bug
Resolution: Unresolved
Major
rhel-9.6
Important
rhel-virt-cloud
ssg_virtualization
RHELOPC Sprint 49, RHELOPC Sprint 50, Virt-Cloud-Core Sprint 51, Virt-Cloud-Core Sprint 52, Virt-Cloud-Core Sprint 53, Virt-Cloud-Core Parking Lot
x86_64
What were you trying to do that didn't work?
In a KVM guest we are seeing user-space threads getting hung indefinitely on async page faults:
Jun 03 19:45:03 kernel: INFO: task elasticsearch[e:7030 blocked for more than 1228 seconds.
Jun 03 19:45:03 kernel: Not tainted 5.14.0-570.19.1.el9_6.x86_64 #1
Jun 03 19:45:03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 03 19:45:03 kernel: task:elasticsearch[e state:D stack:0 pid:7030 tgid:2707 ppid:1340 flags:0x00000002
Jun 03 19:45:03 kernel: Call Trace:
Jun 03 19:45:03 kernel: <TASK>
Jun 03 19:45:03 kernel: __schedule+0x229/0x4a0
Jun 03 19:45:03 kernel: schedule+0x2e/0xb0
Jun 03 19:45:03 kernel: kvm_async_pf_task_wait_schedule+0xf3/0x180
Jun 03 19:45:03 kernel: ? __count_memcg_events+0x4f/0xb0
Jun 03 19:45:03 kernel: __kvm_handle_async_pf+0x53/0xb0
Jun 03 19:45:03 kernel: exc_page_fault+0x7d/0x150
Jun 03 19:45:03 kernel: asm_exc_page_fault+0x22/0x30
Jun 03 19:45:03 kernel: RIP: 0033:0x7faaafc94fa0
Jun 03 19:45:03 kernel: RSP: 002b:00007fa8e820f340 EFLAGS: 00010216
Jun 03 19:45:03 kernel: RAX: 00007e3ffac80000 RBX: 0000000039a00000 RCX: 0000000000005400
Jun 03 19:45:03 kernel: RDX: 0000000000010000 RSI: 00000003cd542e80 RDI: 00007e3ffac00000
Jun 03 19:45:03 kernel: RBP: 00007fa8e820f340 R08: 000000000000abe8 R09: 00007e3ffac00000
Jun 03 19:45:03 kernel: R10: 00007faaafc97a00 R11: 0000000000000000 R12: 0000000000000000
Jun 03 19:45:03 kernel: R13: 00007faaa1000000 R14: 00000003cd542e70 R15: 00007fa868001dc0
Jun 03 19:45:03 kernel: </TASK>
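For triage, hung-task reports like the one above can be filtered for the async page fault wait frame. The following is a minimal sketch, not part of the original report: the function name find_async_pf_hangs and the regular expression are illustrative assumptions, and the only frame it looks for is kvm_async_pf_task_wait_schedule, taken from the trace above.

```python
import re

# Header line of a hung-task report, e.g.
# "kernel: INFO: task elasticsearch[e:7030 blocked for more than 1228 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Frame seen in the trace above; treated here as the marker for an
# async page fault wait (assumption for this sketch).
ASYNC_PF_FRAME = "kvm_async_pf_task_wait_schedule"


def find_async_pf_hangs(lines):
    """Return (comm, pid, blocked_seconds) for each hung-task report
    whose call trace contains the async page fault wait frame."""
    hits = []
    current = None  # report currently being scanned, if any
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            current = (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        # Checked on the same line too, so a log flattened onto one line still matches.
        if current and ASYNC_PF_FRAME in line:
            hits.append(current)
            current = None
        elif "</TASK>" in line:
            # End of this trace without the frame: discard the report.
            current = None
    return hits
```

The input can be any iterable of log lines, for example the output of `journalctl -k` in the guest split on newlines.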
What is the impact of this issue to you?
High impact: the hang can affect any process in the guest.
Please provide the package NVR for which the bug is seen:
Both the host and the guest are RHEL 9.6 with kernel-5.14.0-570.19.1.el9_6.x86_64
Also seen in a guest with kernel-5.14.0-503.35.1.el9_5.x86_64
How reproducible is this bug?:
Always in the customer's environment under high load. Not reproduced locally.
Steps to reproduce
Expected results
No hung threads.
Actual results
Hung threads on async page faults.