Bug
Resolution: Done-Errata
Undefined
rhel-9.5
gnome-keyring-40.0-4.el9_4
4
rhel-sst-display-desktop-foundation
ssg_display
False
CentOS Stream, Red Hat Enterprise Linux
DESKTOP Cycle #1 10.beta phase, DESKTOP Cycle #3 10.beta phase, DESKTOP Cycle #4 10.beta phase, DESKTOP Cycle #2 10.beta phase
Pass
RegressionOnly
Would it be possible to backport the following gnome-keyring patches to RHEL8 and/or 9?
- https://gitlab.gnome.org/GNOME/gnome-keyring/-/merge_requests/65
- https://gitlab.gnome.org/GNOME/gnome-keyring/-/merge_requests/5
- https://gitlab.gnome.org/GNOME/gnome-keyring/-/merge_requests/11
These prevent a deadlock that causes the SSH agent to spin and new OpenSSH client sessions to hang forever. The bug has been reported repeatedly, both to Red Hat (see RHEL-9302 against RHEL7) and upstream (https://gitlab.gnome.org/GNOME/gnome-keyring/-/issues/25, https://gitlab.gnome.org/GNOME/gnome-keyring/-/issues/70, https://bugzilla.gnome.org/show_bug.cgi?id=794848).
Is this something Red Hat would be interested in doing? If provided, would patches against CentOS Stream 8 and/or 9 be accepted?
I've included a copy of our internal bug ticket below, for reference.
One of our users reported that git pushes to GitHub were hanging on their workstation. A look at the state of the workstation showed that the user's gnome-keyring-daemon (gnome-keyring-0:3.28.2-1.el8.x86_64 as shipped in RHEL8) was consuming 100% of a CPU core on that machine; as gnome-keyring-daemon provides SSH agent services, this causes git push operations to hang as follows:
- The user attempts git push to an ssh destination (the most common choice - few users create the access tokens required to use git push to GitHub over HTTPS);
- The git client launches OpenSSH to talk to the repository server;
- OpenSSH attempts to query the running SSH agent to discover what SSH keys it has available;
- gnome-keyring-daemon's SSH agent responder is stuck and never replies, causing OpenSSH (and git) to hang forever.
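The hang in the last two steps can be checked without involving git at all. The sketch below is a hypothetical helper (the name probe_agent and its return values are not from any existing tool): it sends the agent protocol's "request identities" message (type 11) to the agent socket and waits a bounded time for the "identities answer" (type 12). It assumes a Unix-domain agent socket such as the one named in SSH_AUTH_SOCK; connection errors other than a timeout are left to propagate.

```python
import socket
import struct

# Message numbers from the SSH agent protocol.
SSH_AGENTC_REQUEST_IDENTITIES = 11
SSH_AGENT_IDENTITIES_ANSWER = 12


def _recv_exact(sock, count):
    """Read up to `count` bytes, stopping early only if the peer closes."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            break
        data += chunk
    return data


def probe_agent(sock_path, timeout=2.0):
    """Return 'ok', 'hung' or 'unexpected' for the agent at sock_path."""
    # Each agent message is a 4-byte big-endian length followed by a type byte.
    request = struct.pack(">IB", 1, SSH_AGENTC_REQUEST_IDENTITIES)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sock_path)
            sock.sendall(request)
            header = _recv_exact(sock, 5)
        except socket.timeout:
            # The agent accepted the request but never answered.
            return "hung"
    if len(header) == 5 and header[4] == SSH_AGENT_IDENTITIES_ANSWER:
        return "ok"
    return "unexpected"
```

On an affected workstation, calling probe_agent(os.environ["SSH_AUTH_SOCK"]) would be expected to return "hung" rather than block forever the way the OpenSSH client does.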
A look around other lab machines showed one other runaway gnome-keyring-daemon process belonging to another user, so this isn't an isolated incident.
I listed the threads for a stuck gnome-keyring-daemon with top -H -p PID. I attached to the stuck thread (gdb -p TID run as root) and used the bt command to obtain a backtrace:
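Where top -H is not convenient (for example when checking many lab machines from a script), the per-thread CPU counters can be read from /proc directly. This is a rough sketch under the assumption of a Linux /proc filesystem; the function names are hypothetical. Note that it ranks threads by cumulative CPU ticks since thread start, not by current CPU rate as top does, so a long-spinning thread stands out but a recently started one may not.

```python
import os


def parse_task_stat(stat_line):
    """Return (tid, comm, utime + stime in clock ticks) from one stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the last ')'.
    head, _, tail = stat_line.rpartition(")")
    tid_text, _, comm = head.partition("(")
    fields = tail.split()
    # `fields` starts at stat field 3 (state); utime and stime are fields
    # 14 and 15, i.e. indexes 11 and 12 here.
    utime, stime = int(fields[11]), int(fields[12])
    return int(tid_text), comm, utime + stime


def busiest_threads(pid):
    """List (tid, comm, ticks) for every thread of pid, busiest first."""
    task_dir = "/proc/%d/task" % pid
    threads = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as handle:
                threads.append(parse_task_stat(handle.read()))
        except FileNotFoundError:
            # The thread exited between listdir() and open().
            continue
    return sorted(threads, key=lambda item: item[2], reverse=True)
```

The first entry returned by busiest_threads(PID) gives a TID that can then be passed to gdb -p as described above.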
(gdb) bt
#0 0x00007fcc52ef5138 in g_mutex_unlock () from /lib64/libglib-2.0.so.0
#1 0x00007fcc52eadccd in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#2 0x00007fcc52eadf40 in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#3 0x000055f32b0971d9 in gkd_ssh_agent_process_connect (self=0x55f32c5ad400, cancellable=0x55f32c5c0610, error=error@entry=0x7fcc4d52a4c8) at daemon/ssh-agent/gkd-ssh-agent-process.c:232
#4 0x000055f32b095a78 in on_run (service=<optimized out>, connection=0x55f32c5ef720, source_object=<optimized out>, user_data=<optimized out>) at daemon/ssh-agent/gkd-ssh-agent-service.c:297
#5 0x00007fcc515ff17e in ffi_call_unix64 () from /lib64/libffi.so.6
#6 0x00007fcc515feb2f in ffi_call () from /lib64/libffi.so.6
#7 0x00007fcc5318b386 in g_cclosure_marshal_generic_va () from /lib64/libgobject-2.0.so.0
#8 0x00007fcc5318a616 in _g_closure_invoke_va () from /lib64/libgobject-2.0.so.0
#9 0x00007fcc531a6525 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#10 0x00007fcc531a70e3 in g_signal_emit () from /lib64/libgobject-2.0.so.0
#11 0x00007fcc53462ebc in g_threaded_socket_service_func () from /lib64/libgio-2.0.so.0
#12 0x00007fcc52ed6ef3 in g_thread_pool_thread_proxy () from /lib64/libglib-2.0.so.0
#13 0x00007fcc52ed64ea in g_thread_proxy () from /lib64/libglib-2.0.so.0
#14 0x00007fcc51fd51ca in start_thread () from /lib64/libpthread.so.0
#15 0x00007fcc51c41e73 in clone () from /lib64/libc.so.6
Looking at other threads of the process revealed one that was doing the following:
#0 0x00007fcc51c419bd in syscall () from /lib64/libc.so.6
#1 0x00007fcc52ef487c in g_mutex_lock_slowpath () from /lib64/libglib-2.0.so.0
#2 0x000055f32b096ec7 in on_child_watch (pid=161639, status=256, user_data=<optimized out>) at daemon/ssh-agent/gkd-ssh-agent-process.c:133
#3 0x00007fcc52eaa418 in g_child_watch_dispatch () from /lib64/libglib-2.0.so.0
#4 0x00007fcc52eadaed in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#5 0x00007fcc52eadea8 in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#6 0x00007fcc52eae1d2 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#7 0x000055f32b0719fa in main (argc=<optimized out>, argv=<optimized out>) at daemon/gkd-main.c:1165
gkd_ssh_agent_process_connect() in the first thread is running the GLib main loop while holding self->lock, waiting for on_output_watch() to set self->ready. However, if the SSH agent has already exited, on_child_watch() will be executed on a second thread, which tries to take self->lock - that creates a deadlock. The timeout in gkd_ssh_agent_process_connect() seems to be ineffective because it's also triggered by an event and the entire event-handling flow is stuck due to the deadlock.
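The shape of this deadlock can be reproduced in a few lines. The following is a simplified Python model, not the gnome-keyring code and not the upstream patches: a single event-loop thread dispatches callbacks in order, the "connect" side holds a lock while polling for a wake-up, and the child-watch callback needs that same lock. Because the timeout callback is queued behind the blocked child-watch callback, it never runs. All names are illustrative, and the waits are bounded (demo_limit) only so that the demonstration terminates instead of spinning forever.

```python
import queue
import threading
import time


def run_model(release_lock_while_waiting, demo_limit=0.5):
    """Return 'stuck' or 'woke up' for the modelled connect call."""
    lock = threading.Lock()
    state = {"ready": False, "timed_out": False, "child_exited": False}
    events = queue.Queue()
    stop = threading.Event()

    def on_child_watch():
        # Needs the lock to record that the child exited. The bounded
        # acquire exists only so that this demo can finish.
        if lock.acquire(timeout=demo_limit * 2):
            state["child_exited"] = True
            lock.release()

    def on_timeout():
        state["timed_out"] = True

    def event_loop():
        # One thread dispatches every callback, one after another.
        while not stop.is_set():
            try:
                callback = events.get(timeout=0.01)
            except queue.Empty:
                continue
            callback()

    loop = threading.Thread(target=event_loop, daemon=True)
    loop.start()

    deadline = time.monotonic() + demo_limit
    result = "stuck"
    lock.acquire()
    try:
        events.put(on_child_watch)  # the child exits early
        events.put(on_timeout)      # the connect timeout fires afterwards
        while time.monotonic() < deadline:
            if any(state.values()):
                result = "woke up"
                break
            if release_lock_while_waiting:
                # Let callbacks that need the lock make progress.
                lock.release()
                time.sleep(0.005)
                lock.acquire()
            else:
                time.sleep(0.005)
    finally:
        lock.release()
    stop.set()
    loop.join(timeout=demo_limit * 4)
    return result
```

In this model, run_model(False) reports "stuck" because neither callback can complete while the lock is held, whereas run_model(True) reports "woke up". Dropping the lock while waiting is one generic way to break the cycle; whether the referenced merge requests take this exact approach is not something this sketch asserts.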
In this case, it looks like the SSH agent is exiting early because there are two copies of gnome-keyring-daemon running, they both try to spawn an SSH agent listening on the same socket path /run/user/UIDNUM/keyring/.ssh, and the second SSH agent understandably refuses to start:
$ ssh-agent -D -a /run/user/1000/keyring/.ssh
unix_listener: cannot bind to path /run/user/1000/keyring/.ssh: Address already in use
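The same failure is easy to reproduce with plain Unix-domain sockets, which is all the ssh-agent listener is at this level. The sketch below (function name hypothetical) binds a listener to a path and then tries to bind a second socket to the same path, returning the errno of the second attempt. It assumes a Linux system and a path in a writable directory.

```python
import socket


def second_bind_errno(path):
    """Bind two Unix sockets to `path`; return the second bind's errno."""
    first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        first.bind(path)
        first.listen(1)
        try:
            # Same path as the first listener, as with two agents
            # pointed at the same keyring socket.
            second.bind(path)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()
```

The returned value corresponds to errno.EADDRINUSE, the same condition that ssh-agent reports as "Address already in use".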
- is related to: RHEL-9302 [LLNL 7.8 Bug] gnome-keyring-daemon eating 100% CPU (Planning)
- links to: RHBA-2024:136310 gnome-keyring update