- Bug
- Resolution: Unresolved
- Major
- None
- 4.12.z, 4.11.z
Description of problem:
After upgrading to 4.12.50, we are seeing this sporadically on 60 clusters.
Kernel: 4.18.0-372.89.1.el8_6.x86_64
Load averages are very high and I/O wait is ~15%.
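On Linux, tasks in uninterruptible sleep (state D) count toward the load average, so listing them is a quick way to see what is driving the load. The following is a minimal sketch, not something taken from this report: `parse_stat` and `d_state_tasks` are hypothetical helper names, and the scan covers only thread-group leaders under `/proc`, not individual threads.

```python
import os

def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The comm field is wrapped in parentheses and may itself contain
    spaces or parentheses, so the last ')' is used as the delimiter.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state

def d_state_tasks(proc_root="/proc"):
    """Return [(pid, comm), ...] for processes currently in state D."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or its stat file was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks

if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

Running this repeatedly on an affected node would show whether the same processes stay blocked across samples.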
[650036.617508] INFO: task ose_diag:4092402 blocked for more than 120 seconds.
[650036.617655] Tainted: G X --------- - - 4.18.0-372.89.1.el8_6.x86_64 #1
[650036.617827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[650036.617979] task:ose_diag state:D stack: 0 pid:4092402 ppid:4092368 flags:0x10000184
[650036.617982] Call Trace:
[650036.617986] __schedule+0x2d1/0x860
[650036.617991] schedule+0x55/0xf0
[650036.617993] io_schedule+0x12/0x40
[650036.617994] migration_entry_wait_on_locked+0x1e0/0x280
[650036.617997] ? filemap_fdatawait_keep_errors+0x50/0x50
[650036.617999] do_swap_page+0x5b0/0x710
[650036.618002] ? pmd_devmap_trans_unstable+0x2e/0x40
[650036.618003] ? handle_pte_fault+0x5d/0x880
[650036.618004] __handle_mm_fault+0x453/0x6d0
[650036.618007] handle_mm_fault+0xca/0x2a0
[650036.618008] __do_page_fault+0x1d0/0x420
[650036.618011] do_page_fault+0x37/0x12d
[650036.618013] ? page_fault+0x8/0x30
[650036.618015] page_fault+0x1e/0x30
[650036.618017] RIP: 0033:0x43833c
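To gauge how often this occurs across nodes, the hung-task reports can be pulled out of collected kernel logs. The sketch below is illustrative only: `find_hung_tasks` is a hypothetical helper, and the pattern assumes the message wording shown in the trace above.

```python
import re

# Matches the hung-task detector line, e.g.
# "INFO: task ose_diag:4092402 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) tuples found in kernel log text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]

if __name__ == "__main__":
    sample = (
        "[650036.617508] INFO: task ose_diag:4092402 "
        "blocked for more than 120 seconds."
    )
    print(find_hung_tasks(sample))
```

Feeding it the output of `dmesg` or `journalctl -k` from each node gives a per-node list of blocked tasks that can be compared before and after the upgrade.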
Version-Release number of selected component (if applicable):
4.12.50
How reproducible:
Since the upgrade
Steps to Reproduce:
1. 2. 3.
Actual results:
Probes fail with timeouts and pods get stuck in the Terminating state.
Expected results:
Additional info:
- clones
- OCPBUGS-30096 [4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating
- Closed
- is blocked by
- COS-2705 Impact assesment for OCPBUGS-30096: [4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating
- Closed
- links to