-
Bug
-
Resolution: Unresolved
-
Major
-
rhel-9.6
-
None
-
No
-
Important
-
6
-
rhel-virt-cloud-core
-
ssg_virtualization
RHELOPC Sprint 49, RHELOPC Sprint 50, Virt-Cloud-Core Sprint 51, Virt-Cloud-Core Sprint 52, Virt-Cloud-Core Sprint 53, Virt-Cloud-Core Parking Lot
x86_64
Merge Request passes all submitter checks, Merge Request finished CI testing, Merge Request passed CI testing, Merge Request approved by peer review
What were you trying to do that didn't work?
In a KVM guest we are seeing user-space threads getting hung indefinitely on async page faults:
Jun 03 19:45:03 kernel: INFO: task elasticsearch[e:7030 blocked for more than 1228 seconds.
Jun 03 19:45:03 kernel: Not tainted 5.14.0-570.19.1.el9_6.x86_64 #1
Jun 03 19:45:03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 03 19:45:03 kernel: task:elasticsearch[e state:D stack:0 pid:7030 tgid:2707 ppid:1340 flags:0x00000002
Jun 03 19:45:03 kernel: Call Trace:
Jun 03 19:45:03 kernel: <TASK>
Jun 03 19:45:03 kernel: __schedule+0x229/0x4a0
Jun 03 19:45:03 kernel: schedule+0x2e/0xb0
Jun 03 19:45:03 kernel: kvm_async_pf_task_wait_schedule+0xf3/0x180
Jun 03 19:45:03 kernel: ? __count_memcg_events+0x4f/0xb0
Jun 03 19:45:03 kernel: __kvm_handle_async_pf+0x53/0xb0
Jun 03 19:45:03 kernel: exc_page_fault+0x7d/0x150
Jun 03 19:45:03 kernel: asm_exc_page_fault+0x22/0x30
Jun 03 19:45:03 kernel: RIP: 0033:0x7faaafc94fa0
Jun 03 19:45:03 kernel: RSP: 002b:00007fa8e820f340 EFLAGS: 00010216
Jun 03 19:45:03 kernel: RAX: 00007e3ffac80000 RBX: 0000000039a00000 RCX: 0000000000005400
Jun 03 19:45:03 kernel: RDX: 0000000000010000 RSI: 00000003cd542e80 RDI: 00007e3ffac00000
Jun 03 19:45:03 kernel: RBP: 00007fa8e820f340 R08: 000000000000abe8 R09: 00007e3ffac00000
Jun 03 19:45:03 kernel: R10: 00007faaafc97a00 R11: 0000000000000000 R12: 0000000000000000
Jun 03 19:45:03 kernel: R13: 00007faaa1000000 R14: 00000003cd542e70 R15: 00007fa868001dc0
Jun 03 19:45:03 kernel: </TASK>
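To gauge how many tasks are affected, a small script along these lines could scan saved kernel log output for hung-task reports whose trace includes kvm_async_pf_task_wait_schedule. This is only a sketch: the function name find_apf_hangs and the input file name in the usage note are hypothetical, and it assumes the log keeps the "INFO: task ... blocked for more than N seconds" header format shown in this report.

```python
import re

# Header line printed by the kernel hung-task detector, as seen in this report.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Symbol from the reported call trace; used here as the marker for an
# async page fault wait (an assumption based on this one trace).
APF_SYMBOL = "kvm_async_pf_task_wait_schedule"


def find_apf_hangs(log_text):
    """Return (comm, pid, seconds) for each hung-task report whose text
    up to the next report contains APF_SYMBOL."""
    results = []
    matches = list(HUNG_RE.finditer(log_text))
    for i, m in enumerate(matches):
        # A report runs from its header to the next header (or end of log).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        report = log_text[m.end():end]
        if APF_SYMBOL in report:
            results.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return results
```

For example, after saving the guest's kernel log to a file (here assumed to be journal.txt), find_apf_hangs(open("journal.txt").read()) would list the matching tasks. It works on both the one-entry-per-line and the flattened single-line form of the log.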
What is the impact of this issue to you?
High impact: the hang can affect any process in the guest.
Please provide the package NVR for which the bug is seen:
Both the host and the guest are RHEL 9.6 with kernel-5.14.0-570.19.1.el9_6.x86_64
Also seen in a guest with kernel-5.14.0-503.35.1.el9_5.x86_64
How reproducible is this bug?
Always reproducible in the customer's environment under high load. Not reproduced locally.
Steps to reproduce
Expected results
No hung threads.
Actual results
Hung threads on async page faults.