What were you trying to do that didn't work?
The systemctl processes started by realmd function as intended and talk to systemd to enable the sssd service. The systemctl process exits once that operation is complete, but realmd appears to miss the SIGCHLD signal for its exiting systemctl child. This matches the definition of a zombie process: a process that has exited but has not yet been reaped (waited on) by its parent.
Since realmd doesn't realize that the process has exited, it hangs indefinitely. This is very rare. With strace attached to systemd, it occurs in only about 1 in 300 to 400 realm joins. With strace attached to realmd, to watch both realmd and its systemctl child, the customer has not been able to reproduce it in about 1400 attempts. I think this indicates a race condition in realmd in which it misses the SIGCHLD for its systemctl child, and that tracing every syscall of the process slows it down enough that the signal is no longer missed.
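To confirm the zombie state on a hung system, something like the following could be run. It is only a minimal sketch, assuming a Linux /proc filesystem; the function names parse_stat and zombie_children are made up for illustration and are not part of realmd, systemd, or sssd. It lists the children of a given parent PID (for example, the realmd PID) whose state in /proc/<pid>/stat is "Z".

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so take everything between the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 1:].split()
    # After comm, the first field is the state and the second is the parent PID.
    return pid, comm, fields[0], int(fields[1])


def zombie_children(parent_pid, proc_root="/proc"):
    """Return (pid, comm) for every zombie whose parent is parent_pid."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no readable /proc (hypothetical or non-Linux host)
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or record unreadable
        if state == "Z" and ppid == parent_pid:
            found.append((pid, comm))
    return found
```

Calling zombie_children() with the PID of the hung realmd would be expected to return the systemctl child if the analysis above is right; an empty list would suggest the child was reaped and the hang has another cause.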
Please provide the package NVR for which the bug is seen:
- systemd-252-14.el9_2.1.x86_64
- sssd-2.8.2-2.el9.x86_64
- realmd-0.17.1-1.el9.x86_64
How reproducible:
Very difficult; reproducible in only about 1 out of 300-400 attempts.
Steps to reproduce:
- Join a domain with realmd.