-
Sub-task
-
Resolution: Unresolved
-
Undefined
-
None
-
None
-
None
-
0
-
False
-
-
False
-
openvswitch3.4-3.4.3-84.el9fdp
-
rhel-9
-
rhel-net-ovs-dpdk
-
-
-
ssg_networking
Problem Description: Clearly explain the issue.
There is an ABBA deadlock between time_init() and seq_wait() in OVS:
- Thread 1:
- poll_block()
- time_poll()
- time_init()
- pthread_once() <-- lock A
- do_time_init()
- seq_create()
- pthread_mutex_lock(seq_mutex) <-- lock B
- Thread 2:
- seq_wait(different seqno)
- pthread_mutex_lock(seq_mutex) <-- lock B
- poll_immediate_wake()
- poll_timer_wait()
- time_msec()
- time_init()
- pthread_once() <-- lock A
I believe, this is the same deadlock upstream Intel CI saw a few years ago.
Impact Assessment: Describe the severity and impact (e.g., network down,availability of a workaround, etc.).
This deadlock should normally never happen in real ovs-vswitchd, since time module will be initialized long before other threads are created. But it happens sometimes in the test-barrier unit test application when running on a high-core count system with high testsuite parallelism, e.g. make check TESTSUITEFLAGS='-j40'.
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
It's a race, hard to reproduce consistently.
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
run make check TESTSUITEFLAGS='-j40' on a high-core count system.
Expected Behavior: Describe what should happen under normal circumstances.
No deadlock.
Observed Behavior: Explain what actually happens.
The testsuite hangs as test-barrier application is stuck.
Full stacktrace:
Using host libthread_db library "/lib64/libthread_db.so.1". futex_wait (private=0, expected=2, futex_word=0x680940 <seq_mutex>) at ../sysdeps/nptl/futex-internal.h:146 146 int err = lll_futex_timed_wait (futex_word, expected, NULL, private); Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.192-5.el9.x86_64 libatomic-11.5.0-5.el9_5.x86_64 libbpf-1.5.0-1.el9.x86_64 libcap-2.48-9.el9_2.x86_64 libgcc-11.5.0-5.el9_5.x86_64 libgcrypt-1.10.0-11.el9.x86_64 libgpg-error-1.42-5.el9.x86_64 libzstd-1.5.5-1.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 numactl-libs-2.0.19-1.el9.x86_64 python3-libs-3.9.21-2.el9.x86_64 systemd-libs-252-51.el9.x86_64 unbound-libs-1.16.2-17.el9.x86_64 xz-libs-5.2.5-8.el9_0.x86_64 (gdb) info threads Id Target Id Frame * 1 Thread 0x7f6272216ac0 (LWP 309453) "ovstest" futex_wait (private=0, expected=2, futex_word=0x680940 <seq_mutex>) at ../sysdeps/nptl/futex-internal.h:146 2 Thread 0x7f627192e640 (LWP 309480) "ovs-barrier2" futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 3 Thread 0x7f627112d640 (LWP 309481) "ovs-barrier0" futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 4 Thread 0x7f627092c640 (LWP 309482) "ovs-barrier1" futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 5 Thread 0x7f6267fff640 (LWP 309484) "ovs-barrier4" futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 (gdb) bt #0 futex_wait (private=0, expected=2, futex_word=0x680940 <seq_mutex>) at ../sysdeps/nptl/futex-internal.h:146 #1 __GI___lll_lock_wait (futex=futex@entry=0x680940 <seq_mutex>, private=0) at lowlevellock.c:50 #2 0x00007f627208d4d2 in lll_mutex_lock_optimized (mutex=0x680940 <seq_mutex>) at pthread_mutex_lock.c:49 #3 ___pthread_mutex_lock (mutex=mutex@entry=0x680940 <seq_mutex>) at pthread_mutex_lock.c:94 #4 0x00000000004c5df9 in ovs_mutex_lock_at (l_=l_@entry=0x680940 <seq_mutex>, where=where@entry=0x5d576d "lib/seq.c:84") at lib/ovs-thread.c:77 #5 0x00000000004d6a7c in seq_create () at lib/seq.c:84 #6 0x00000000004e20a1 in do_init_time () at lib/timeval.c:132 #7 0x00007f627208f2f8 in __pthread_once_slow (once_control=0x693c64 <once>, init_routine=0x4e2090 <do_init_time>) at pthread_once.c:117 #8 0x00007f627208f3e5 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>) at pthread_once.c:144 #9 0x00000000004e24c3 in time_init () at lib/timeval.c:145 #10 time_poll (pollfds=pollfds@entry=0x1b6f2cb0, n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807, elapsed=elapsed@entry=0x7ffd7a355e5c) at lib/timeval.c:296 #11 0x00000000004cdbf8 in poll_block () at ./include/openvswitch/hmap.h:264 #12 0x00000000004c6c56 in ovs_barrier_block (barrier=<optimized out>) at lib/ovs-thread.c:386 #13 0x00000000004148bb in basic_block_check (expected=<optimized out>, n=<optimized out>, aux=<optimized out>) at tests/test-barrier.c:71 #14 test_barrier_basic (n_threads=<optimized out>) at tests/test-barrier.c:113 #15 test_barrier (argv=<optimized out>, argc=<optimized out>) at tests/test-barrier.c:260 #16 ovstest_wrapper_test_barrier__ (ctx=<optimized out>) at tests/test-barrier.c:264 #17 0x000000000043fb7e in ovs_cmdl_run_command__ (ctx=ctx@entry=0x7ffd7a355f50, commands=<optimized out>, read_only=read_only@entry=false) at lib/command-line.c:247 #18 0x00000000004401c7 in ovs_cmdl_run_command (ctx=ctx@entry=0x7ffd7a355f50, commands=<optimized out>) at lib/command-line.c:278 #19 0x0000000000407d83 in main (argc=2, argv=0x7ffd7a3560a8) at tests/ovstest.c:133 (gdb) thread 2 [Switching to thread 2 (Thread 0x7f627192e640 (LWP 309480))] #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 146 int err = lll_futex_timed_wait (futex_word, expected, NULL, private); (gdb) bt #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 #1 futex_wait_simple (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:177 #2 __pthread_once_slow (once_control=0x693c64 <once>, init_routine=0x4e2090 <do_init_time>) at pthread_once.c:106 #3 0x00007f627208f3e5 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>) at pthread_once.c:144 #4 0x00000000004e21a3 in time_init () at lib/timeval.c:145 #5 time_timespec__ (c=0x693d20 <monotonic_clock>, ts=0x7f627192d9f0) at lib/timeval.c:153 #6 0x00000000004e22d1 in time_msec__ (c=0x693d20 <monotonic_clock>) at lib/timeval.c:218 #7 time_msec () at lib/timeval.c:226 #8 0x00000000004cdb09 in poll_timer_wait_at (where=0x5d2a5e "lib/ovs-thread.c:385", msec=0) at lib/poll-loop.c:184 #9 poll_immediate_wake_at (where=where@entry=0x5d2a5e "lib/ovs-thread.c:385") at lib/poll-loop.c:232 #10 0x00000000004d6be6 in seq_wait__ (where=<optimized out>, value=<optimized out>, seq=<optimized out>) at lib/seq.c:179 #11 seq_wait_at (seq_=0x1b6f2360, value=value@entry=1, where=where@entry=0x5d2a5e "lib/ovs-thread.c:385") at lib/seq.c:220 #12 0x00000000004c6c51 in ovs_barrier_block (barrier=<optimized out>) at lib/ovs-thread.c:385 #13 0x00000000004146ff in basic_blocker_main (aux_=0x1b6f2390) at tests/test-barrier.c:47 #14 0x00000000004c5db5 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:429 #15 0x00007f627208a0ea in start_thread (arg=<optimized out>) at pthread_create.c:443 #16 0x00007f627210f150 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) thread 3 [Switching to thread 3 (Thread 0x7f627112d640 (LWP 309481))] #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 146 int err = lll_futex_timed_wait (futex_word, expected, NULL, private); (gdb) bt #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 #1 futex_wait_simple (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:177 #2 __pthread_once_slow (once_control=0x693c64 <once>, init_routine=0x4e2090 <do_init_time>) at pthread_once.c:106 #3 0x00007f627208f3e5 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>) at pthread_once.c:144 #4 0x00000000004e24c3 in time_init () at lib/timeval.c:145 #5 time_poll (pollfds=pollfds@entry=0x7f6268000b60, n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807, elapsed=elapsed@entry=0x7f627112ca2c) at lib/timeval.c:296 #6 0x00000000004cdbf8 in poll_block () at ./include/openvswitch/hmap.h:264 #7 0x00000000004c6c56 in ovs_barrier_block (barrier=<optimized out>) at lib/ovs-thread.c:386 #8 0x00000000004146ff in basic_blocker_main (aux_=0x1b6f239c) at tests/test-barrier.c:47 #9 0x00000000004c5db5 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:429 #10 0x00007f627208a0ea in start_thread (arg=<optimized out>) at pthread_create.c:443 #11 0x00007f627210f150 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) thread 4 [Switching to thread 4 (Thread 0x7f627092c640 (LWP 309482))] #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 146 int err = lll_futex_timed_wait (futex_word, expected, NULL, private); (gdb) bt #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 #1 futex_wait_simple (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:177 #2 __pthread_once_slow (once_control=0x693c64 <once>, init_routine=0x4e2090 <do_init_time>) at pthread_once.c:106 #3 0x00007f627208f3e5 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>) at pthread_once.c:144 #4 0x00000000004e24c3 in time_init () at lib/timeval.c:145 #5 time_poll (pollfds=pollfds@entry=0x7f626c000b60, n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807, elapsed=elapsed@entry=0x7f627092ba2c) at lib/timeval.c:296 #6 0x00000000004cdbf8 in poll_block () at ./include/openvswitch/hmap.h:264 #7 0x00000000004c6c56 in ovs_barrier_block (barrier=<optimized out>) at lib/ovs-thread.c:386 #8 0x00000000004146ff in basic_blocker_main (aux_=0x1b6f23a8) at tests/test-barrier.c:47 #9 0x00000000004c5db5 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:429 #10 0x00007f627208a0ea in start_thread (arg=<optimized out>) at pthread_create.c:443 #11 0x00007f627210f150 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) thread 5 [Switching to thread 5 (Thread 0x7f6267fff640 (LWP 309484))] #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 146 int err = lll_futex_timed_wait (futex_word, expected, NULL, private); (gdb) bt #0 futex_wait (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:146 #1 futex_wait_simple (private=0, expected=1, futex_word=0x693c64 <once>) at ../sysdeps/nptl/futex-internal.h:177 #2 __pthread_once_slow (once_control=0x693c64 <once>, init_routine=0x4e2090 <do_init_time>) at pthread_once.c:106 #3 0x00007f627208f3e5 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>) at pthread_once.c:144 #4 0x00000000004e24c3 in time_init () at lib/timeval.c:145 #5 time_poll (pollfds=pollfds@entry=0x7f6258000b60, n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807, elapsed=elapsed@entry=0x7f6267ffea2c) at lib/timeval.c:296 #6 0x00000000004cdbf8 in poll_block () at ./include/openvswitch/hmap.h:264 #7 0x00000000004c6c56 in ovs_barrier_block (barrier=<optimized out>) at lib/ovs-thread.c:386 #8 0x000000000041470d in basic_blocker_main (aux_=0x1b6f23b4) at tests/test-barrier.c:49 #9 0x00000000004c5db5 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:429 #10 0x00007f627208a0ea in start_thread (arg=<optimized out>) at pthread_create.c:443 #11 0x00007f627210f150 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81