OpenShift Bugs: OCPBUGS-30096

[4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating

      Description of problem:

      Just upgraded to 4.12.50 and now seeing this sporadically across 60 clusters.

      kernel: 4.18.0-372.89.1.el8_6.x86_64

      Load averages are very high and I/O wait is ~15%.
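
      For anyone wanting to confirm the I/O wait figure on an affected node, the following is a minimal sketch (not part of the original report) that computes the iowait share from two samples of the aggregate `cpu` line in `/proc/stat`. The function names `iowait_percent` and `sample_cpu_line` are hypothetical helpers introduced here for illustration.

```python
def sample_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line.strip()
    raise RuntimeError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines."""
    def fields(line):
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

      On a node, one would take `sample_cpu_line()` twice a few seconds apart and pass both lines to `iowait_percent`; the sampling interval is a judgment call.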

      [650036.617508] INFO: task ose_diag:4092402 blocked for more than 120 seconds.
      [650036.617655]       Tainted: G               X --------- -  - 4.18.0-372.89.1.el8_6.x86_64 #1
      [650036.617827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [650036.617979] task:ose_diag state:D stack:    0 pid:4092402 ppid:4092368 flags:0x10000184
      [650036.617982] Call Trace:
      [650036.617986]  __schedule+0x2d1/0x860
      [650036.617991]  schedule+0x55/0xf0
      [650036.617993]  io_schedule+0x12/0x40
      [650036.617994]  migration_entry_wait_on_locked+0x1e0/0x280
      [650036.617997]  ? filemap_fdatawait_keep_errors+0x50/0x50
      [650036.617999]  do_swap_page+0x5b0/0x710
      [650036.618002]  ? pmd_devmap_trans_unstable+0x2e/0x40
      [650036.618003]  ? handle_pte_fault+0x5d/0x880
      [650036.618004]  __handle_mm_fault+0x453/0x6d0
      [650036.618007]  handle_mm_fault+0xca/0x2a0
      [650036.618008]  __do_page_fault+0x1d0/0x420
      [650036.618011]  do_page_fault+0x37/0x12d
      [650036.618013]  ? page_fault+0x8/0x30
      [650036.618015]  page_fault+0x1e/0x30
      [650036.618017] RIP: 0033:0x43833c     
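
      With 60 clusters affected, it may help to tally which tasks are being reported as hung. The sketch below (an illustration, not something from the original report) extracts the task name, PID, and timeout from `INFO: task ... blocked for more than N seconds.` lines in the format shown above. `hung_tasks` and `count_by_name` are hypothetical helper names, and the input is assumed to be saved kernel log text such as `dmesg` output.

```python
import re
from collections import Counter

# Matches lines like:
# [650036.617508] INFO: task ose_diag:4092402 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) for each hung-task report."""
    found = []
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            found.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return found


def count_by_name(log_text):
    """Count hung-task reports per task name."""
    return Counter(comm for comm, _pid, _secs in hung_tasks(log_text))
```

      Running `count_by_name` over logs collected from several nodes would show whether the reports cluster on a few task names or are spread across many.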

      Version-Release number of selected component (if applicable):

      4.12.50    

      How reproducible:

      Since the upgrade    

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      Probes fail with timeouts and pods get stuck in the Terminating state.

      Expected results:

          

      Additional info:

          

            mnguyen@redhat.com Michael Nguyen
            rhn-support-mrobson Matt Robson
            Huijing Hei Huijing Hei
            Votes: 7
            Watchers: 50