Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.12.z, 4.11.z
Component/s: RHCOS
Labels:
- UpgradeRecommendationBlocked
- regression

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
Yes

Target Backport Versions:
None
Target Version:
4.11.z
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Priority Data:
PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Just upgraded to 4.12.50 and are now seeing this sporadically on 60 clusters.

kernel: 4.18.0-372.89.1.el8_6.x86_64

Load averages are very high and IO wait is ~15%.
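An IO-wait figure like the ~15% above is what tools such as `top` or `vmstat` report in their `wa` column. The following minimal sketch shows how such a percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The function names are hypothetical. Leaving `guest`/`guest_nice` out of the total is a choice made here because guest time is also counted in `user`/`nice`.

```python
# Field order of the aggregate "cpu" line in /proc/stat (values in clock ticks,
# see proc(5)): user nice system idle iowait irq softirq steal guest guest_nice.
IOWAIT_INDEX = 4


def parse_cpu_line(line):
    """Return the first eight counters (user .. steal) of the aggregate cpu line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # guest and guest_nice are already included in user and nice, so they are
    # left out here to avoid double counting.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, as a percentage."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[IOWAIT_INDEX] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as f:
        return f.readline()
```

On a node, call `read_cpu_line()` twice about a second apart and pass both lines to `iowait_percent`. proc(5) notes that iowait is not a reliable number, so it is better read as a trend than as a precise measurement.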

[650036.617508] INFO: task ose_diag:4092402 blocked for more than 120 seconds.
[650036.617655]       Tainted: G               X --------- -  - 4.18.0-372.89.1.el8_6.x86_64 #1
[650036.617827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[650036.617979] task:ose_diag state:D stack:    0 pid:4092402 ppid:4092368 flags:0x10000184
[650036.617982] Call Trace:
[650036.617986]  __schedule+0x2d1/0x860
[650036.617991]  schedule+0x55/0xf0
[650036.617993]  io_schedule+0x12/0x40
[650036.617994]  migration_entry_wait_on_locked+0x1e0/0x280
[650036.617997]  ? filemap_fdatawait_keep_errors+0x50/0x50
[650036.617999]  do_swap_page+0x5b0/0x710
[650036.618002]  ? pmd_devmap_trans_unstable+0x2e/0x40
[650036.618003]  ? handle_pte_fault+0x5d/0x880
[650036.618004]  __handle_mm_fault+0x453/0x6d0
[650036.618007]  handle_mm_fault+0xca/0x2a0
[650036.618008]  __do_page_fault+0x1d0/0x420
[650036.618011]  do_page_fault+0x37/0x12d
[650036.618013]  ? page_fault+0x8/0x30
[650036.618015]  page_fault+0x1e/0x30
[650036.618017] RIP: 0033:0x43833c
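With the message appearing on 60 clusters, it can help to triage traces like the one above mechanically. The following minimal sketch assumes the kernel log is available as `dmesg`-style lines with bracketed timestamps; other formats, such as journal output, would need the patterns adjusted. It groups each `INFO: task ... blocked for more than N seconds` report with its call-trace frames and treats `? `-prefixed frames as unreliable. Reports can then be checked for the `migration_entry_wait_on_locked` frame named in the linked RHEL issue. `parse_hung_task_reports` and `has_frame` are hypothetical helper names.

```python
import re

# Header emitted by the kernel hung-task detector, e.g.
# "INFO: task ose_diag:4092402 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame, e.g. "[650036.617994]  migration_entry_wait_on_locked+0x1e0/0x280".
# A "? " before the symbol marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_task_reports(lines):
    """Group dmesg-style lines into hung-task reports with their call-trace frames."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],  # (function, reliable) tuples in trace order
            }
            reports.append(current)
            continue
        # Frames are attached to the most recent hung-task header seen.
        frame = FRAME_RE.search(line)
        if current is not None and frame:
            current["frames"].append((frame["func"], frame["unreliable"] is None))
    return reports


def has_frame(report, func):
    """True if func appears as a reliable (non-"?") frame in the report."""
    return (func, True) in report["frames"]
```

For example, `[r["pid"] for r in parse_hung_task_reports(lines) if has_frame(r, "migration_entry_wait_on_locked")]` lists the blocked PIDs that match this signature.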

Version-Release number of selected component (if applicable):

4.12.50

How reproducible:

Sporadically, since the upgrade to 4.12.50

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Probes fail with timeouts and pods get stuck in Terminating.
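The trace above shows `ose_diag` in state `D` (uninterruptible sleep). A task in an uninterruptible wait generally does not act on pending signals, including `SIGKILL`, until the wait completes. That would be consistent with pods lingering in Terminating, though the link is an assumption, not something this report establishes. The following minimal sketch lists such tasks on a node by reading `/proc/<pid>/stat`. `stat_state` and `d_state_tasks` are hypothetical names. Only process entries are scanned, not the individual threads under `/proc/<pid>/task`.

```python
import os


def stat_state(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace (see proc(5)).
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1 : close_paren]
    state = stat_line[close_paren + 1 :].split()[0]  # field 3 follows ") "
    return comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = stat_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((int(entry), comm))
    return found
```

A `D` task seen once is not unusual during heavy IO. Tasks that stay in `D` across repeated samples are the ones that match the hung-task reports.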

Expected results:

Additional info:

Issue Links:
- clones: OCPBUGS-30096 [4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating (Closed)
- is blocked by: COS-2705 Impact assesment for OCPBUGS-30096: [4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating (Closed)
- links to: RHEL 8.8/8.6(EUS): hung_task_timeout_secs at migration_entry_wait_on_locked

Assignee: Michael Nguyen
Reporter: Matt Robson
Need Info From: None
Contributors: None
QA Contact: Michael Nguyen
Doc Contact: None
Votes: 0
Watchers: 7
Created: 2024/03/05 7:41 PM
Updated: 2025/08/02 3:28 PM
