OpenShift Bugs / OCPBUGS-30546

SNO (RT kernel): DU application's internal watchdog crashes the app: thread blocked 4 ms in pthread_mutex_lock


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • Affects Version: 4.12.z
    • Component: Containers
    • Severity: Moderate
      Description of problem:

      The L2 application crashes itself voluntarily via its internal in-app watchdog when a thread does not respond within 4 ms.
      In this specific case, the app crashes while two threads are contending on the same mutex in pthread_mutex_unlock/pthread_mutex_lock.
      
      Here is a quick look at the 0030-L2-BackTrace-XXX-YYY.txt file, showing the gdb backtrace with focus on threads 13 and 14:
      
      
      Thread 14 (Thread 0x7fb922ffd700 (LWP 16321)):
      #0  0x00007fba1aa845ea in __lll_unlock_wake () from /lib64/libpthread.so.0
      #1  0x00007fba1aa80f9e in _L_unlock_738 () from /lib64/libpthread.so.0
      #2  0x00007fba1aa80f10 in pthread_mutex_unlock () from /lib64/libpthread.so.0
      #3  0x00000000012b8b5c in rgSCHCmnCnsldtSfAlloc (cell=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_sch_cmn.c:21039
      #4  0x000000000137c5cd in rgSchTomTtiCnsldtSfAlloc (cell=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_sch_tom.c:15816
      #5  0x00000000012525f1 in rgBbuPoolingSchExecHdlr (pulse=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_bbu_pooling.c:1950
      #6  0x000000000165c962 in bbupool_rt_thread(void*) ()
      #7  0x00007fba1aa7dea5 in start_thread () from /lib64/libpthread.so.0
      #8  0x00007fba1963db0d in gnu_dev_makedev () from /lib64/libc.so.6
      #9  0x0000000000000000 in ?? ()
      
      
      Thread 13 (Thread 0x7fb921ffb700 (LWP 16323)):
      #0  0x00007fba1aa8454d in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007fba1aa7fe9b in _L_lock_883 () from /lib64/libpthread.so.0
      #2  0x00007fba1aa7fd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x0000000001393f60 in rgSCHUtlAcquireUeLstAndHqPLstLock (wasLockObtained=<optimized out>) at ../ltemac/rg_sch_utl.c:4367
      #4  rgSCHUtlFillRgInfUeInfo (sf=0x1ec68d8 <g_ueLstAndHqPLstLock>, cell=0x80, dlDrxInactvTmrLst=0x0, dlInActvLst=0x7fba1aa8454d <__lll_lock_wait+29>, ulInActvLst=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_sch_utl.c:8126
      #5  0x00000000012b8b5c in rgSCHCmnCnsldtSfAlloc (cell=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_sch_cmn.c:21039
      #6  0x000000000137c5cd in rgSchTomTtiCnsldtSfAlloc (cell=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_sch_tom.c:15816
      #7  0x00000000012525f1 in rgBbuPoolingSchExecHdlr (pulse=0x1ec68d8 <g_ueLstAndHqPLstLock>) at ../ltemac/rg_bbu_pooling.c:1950
      #8  0x000000000165c962 in bbupool_rt_thread(void*) ()
      #9  0x00007fba1aa7dea5 in start_thread () from /lib64/libpthread.so.0
      #10 0x00007fba1963db0d in clone () from /lib64/libc.so.6
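      
      The pattern above, one thread in __lll_unlock_wake while another waits in __lll_lock_wait on the same lock (g_ueLstAndHqPLstLock), is ordinary mutex contention rather than a kernel fault; the crash comes from the application's own 4 ms watchdog. As an illustration only (this is not the DU application's code; names such as g_lock and worker are hypothetical), here is a minimal sketch of the same pattern: two threads contend on one global pthread mutex, and each aborts if pthread_mutex_lock() blocks longer than 4 ms.
      
      /* Minimal sketch (illustration only): two threads contend on one
       * global mutex; each measures how long pthread_mutex_lock() blocks
       * and aborts past a 4 ms budget, mimicking the in-app watchdog.
       * Build: gcc -O2 -pthread watchdog_sketch.c */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      
      static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for g_ueLstAndHqPLstLock */
      #define WATCHDOG_BUDGET_NS (4 * 1000 * 1000LL)             /* 4 ms, as in the report */
      
      static long long now_ns(void)
      {
          struct timespec ts;
          clock_gettime(CLOCK_MONOTONIC, &ts);
          return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
      }
      
      static void *worker(void *arg)
      {
          (void)arg;
          for (;;) {
              long long t0 = now_ns();
              pthread_mutex_lock(&g_lock);      /* thread 13's position: __lll_lock_wait */
              long long waited = now_ns() - t0;
              if (waited > WATCHDOG_BUDGET_NS) {
                  fprintf(stderr, "watchdog: blocked %lld ns in pthread_mutex_lock\n", waited);
                  abort();                      /* voluntary crash, as the in-app watchdog does */
              }
              /* ... critical section (per-TTI scheduler work in the real app) ... */
              pthread_mutex_unlock(&g_lock);    /* thread 14's position: __lll_unlock_wake */
          }
          return NULL;
      }
      
      int main(void)
      {
          pthread_t t1, t2;
          pthread_create(&t1, NULL, worker, NULL);
          pthread_create(&t2, NULL, worker, NULL);
          pthread_join(t1, NULL);
          pthread_join(t2, NULL);
          return 0;
      }
      
      On a non-RT host this sketch will rarely trip the 4 ms budget; on an RT kernel with SCHED_FIFO threads competing for the same CPU, a preempted lock holder can easily stall a waiter past it.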
      
      
      
      

      Version-Release number of selected component (if applicable):

      OCP 4.12.30
      DU application built using base container:
      ubi7-minimal-7.9-839, rhel-ubi7-minimal-aws:v2.9, PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"

      How reproducible:

      The issue reproduces frequently; we do not yet have the exact frequency of the crashes.

      Steps to Reproduce:

      Launch the application and wait.

      Actual results:

      The application crashes intermittently.

      Expected results:

      The application does not crash.

      Additional info:

      We will capture the frequency of the crashes.
      We have asked the customer to rebuild the application using UBI8; they are investigating whether this is feasible.
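      
      For verifying such a rebuild: UBI7 ships glibc 2.17 while UBI8 ships glibc 2.28, whose low-level lock paths differ from the RHEL 7-era _L_unlock_*/_L_lock_* frames visible in the backtrace. A minimal, hypothetical check that the rebuilt binary actually runs against the newer glibc (equivalently, rpm -q glibc inside the container):
      
      /* Hypothetical verification snippet: print the glibc version the
       * process is running against, e.g. "2.17" on UBI7, "2.28" on UBI8. */
      #include <gnu/libc-version.h>
      #include <stdio.h>
      
      int main(void)
      {
          printf("glibc %s\n", gnu_get_libc_version());
          return 0;
      }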

              Assignee: Tom Sweeney (tsweeney@redhat.com)
              Reporter: Johann Peyrard (rhn-support-jpeyrard)
              David Darrah (Inactive)