Bug
Resolution: Unresolved
Normal
rhel-9.5
Moderate
sst_cs_plumbers
ssg_core_services
5
False
Red Hat Enterprise Linux
The user has a shell script that follows this pattern:
tail -n0 -F /var/log/messages | stdbuf -i0 -o0 grep -o ... | stdbuf -i0 -o0 awk ... | stdbuf -i0 -o0 awk ... | stdbuf -i0 -o0 xargs -n1 /path/to/handler
A sample reproducer is:
term1# xargs -n1 echo
test
test
foo
foo
term2# ps ax | grep Z
PID pts/0 Z+ 0:00 [echo] <defunct>
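We understand the root cause to be that xargs contains this code:
/* Make sure to listen for the kids. */
signal (SIGCHLD, SIG_DFL);
This means xargs does not handle SIGCHLD at the moment a child exits: no handler runs, and the exited child remains a zombie until xargs next waits for its children.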
There is also the code:
/* Before forking, reap any already-exited child. We do this so that
   we don't leave unreaped children around while we build a new
   command line. For example this command will spend most of its
   time waiting for sufficient arguments to launch another command
   line:

   seq 1 1000 | fmt | while read x ; do echo $x; sleep 1 ; done |
   ./xargs -P 200 -n 20 sh -c 'echo "$@"; sleep $((1 + $RANDOM % 5))' sleeper
*/
wait_for_proc (false, 0u);
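In the reproducer, xargs blocks reading standard input after running echo, so this reap is not reached until the next argument arrives.
We understand that changing this might require a significant rework of how xargs works.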
The reproducer represents very uncommon usage: input normally arrives quickly, and a script like the user's would more typically use a shell loop construct instead of xargs.
Still, this might be considered a valid use of xargs, and a process left in the zombie state makes it look as if something has gone wrong.
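The shell loop construct mentioned above could look roughly like the following. This is only a sketch, not a tested replacement for the user's script: `handler` is a hypothetical stand-in for /path/to/handler, and `run_per_line` is a name introduced here for illustration. Unlike xargs -n1, the loop passes each whole input line as a single argument instead of splitting it on whitespace.

```shell
# Hypothetical stand-in for the user's /path/to/handler (an assumption,
# not part of the original script).
handler() {
    printf 'handled: %s\n' "$1"
}

# Read standard input line by line and call the handler once per line.
# When the handler is an external command, the shell waits for it to
# finish before reading the next line, so no exited child is left
# unreaped while the loop sits idle waiting for input.
run_per_line() {
    while IFS= read -r item; do
        handler "$item"
    done
}
```

With these definitions, the final `xargs -n1 /path/to/handler` stage of the pipeline would become `| run_per_line`, with the body of `handler` replaced by a call to the real handler.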