Description
When terminating the Keycloak process (started with the standalone.sh script from a systemd service command), the following line is printed in a loop for a few seconds, until the script is killed with a signal that is not trapped (e.g., SIGKILL). This slows the service's restart and fills the logs:
    *** JBossAS process (533806) received TERM signal ***
    *** JBossAS process (533806) received TERM signal ***
    *** JBossAS process (533806) received TERM signal ***
    *** JBossAS process (533806) received TERM signal ***
This happens only when the LAUNCH_JBOSS_IN_BACKGROUND variable is enabled, because the script then launches the Java process in the background and calls the wait command in a loop until that process ends.
On CentOS Stream 8, bash is the default POSIX-compliant shell used when running commands through sh. In bash's case, the wait builtin does not appear to reset the exit status of the waited-for process after it exits (in other words, the exit information of the zombie process is not consumed). Because of this, the following sequence runs in an endless loop once the script receives the TERM signal:
    while [ "$WAIT_STATUS" -ge 128 ]; do
        wait $JBOSS_PID 2>/dev/null
        WAIT_STATUS=$?
        if [ "$WAIT_STATUS" -gt 128 ]; then
            SIGNAL=`expr $WAIT_STATUS - 128`
            SIGNAL_NAME=`kill -l $SIGNAL`
            echo "*** JBossAS process ($JBOSS_PID) received $SIGNAL_NAME signal ***" >&2
        fi
    done
You can test how the wait command behaves using the following commands:
    sleep 10000000 &
    PID=$!
    kill ${PID}   # the process is killed here, but bash can still retrieve information about it
    wait ${PID}   # will return the 143 error code (128 + 15) (15 = SIGTERM)
    echo $?       # will display 143
    wait ${PID}   # a second call should probably return 127, but will continue returning 143 because the zombie process is not cleared
    echo $?
    wait          # wait without parameters returns 0, but clears the information about the zombie
    wait ${PID}   # will return 127 and print an error, since the zombie has been removed
This issue probably affects other projects as well, such as WildFly Core (where the standalone.sh script appears to originate: https://github.com/wildfly/wildfly-core/blob/main/core-feature-pack/common/src/main/resources/content/bin/standalone.sh#L373) and Infinispan (which has a similarly built server script: https://github.com/infinispan/infinispan/blob/main/server/runtime/src/main/server/bin/server.sh#L32).
One way to fix this issue may be to check whether the process is still running and, if it is not, break out of the loop:
    while [ "$WAIT_STATUS" -ge 128 ]; do
        wait $JBOSS_PID 2>/dev/null
        WAIT_STATUS=$?
        if [ "$WAIT_STATUS" -gt 128 ]; then
            SIGNAL=`expr $WAIT_STATUS - 128`
            SIGNAL_NAME=`kill -l $SIGNAL`
            echo "*** JBossAS process ($JBOSS_PID) received $SIGNAL_NAME signal ***" >&2
            kill -0 $JBOSS_PID 2> /dev/null || break
        fi
    done
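To try the guarded loop without a Keycloak installation, a minimal stand-alone sketch might look like the one below. It is not the actual standalone.sh: a `sleep` job stands in for the Java process, and `CHILD_PID` and `ITERATIONS` are hypothetical names introduced only for this illustration. The status of wait is captured with `||` so the sketch also behaves under `set -e`.

```shell
#!/bin/sh
# Hypothetical stand-in for the server: a long-running background job.
sleep 1000 &
CHILD_PID=$!

# Terminate the child, roughly as a service stop would.
kill -TERM "$CHILD_PID"

WAIT_STATUS=128
ITERATIONS=0   # illustration only: counts how often the loop body runs
while [ "$WAIT_STATUS" -ge 128 ]; do
    # Capture the status of wait without tripping "set -e" if it is active.
    WAIT_STATUS=0
    wait "$CHILD_PID" 2>/dev/null || WAIT_STATUS=$?
    ITERATIONS=$((ITERATIONS + 1))
    if [ "$WAIT_STATUS" -gt 128 ]; then
        SIGNAL=$((WAIT_STATUS - 128))
        SIGNAL_NAME=$(kill -l "$SIGNAL")
        echo "*** process ($CHILD_PID) received $SIGNAL_NAME signal ***" >&2
        # The proposed guard: leave the loop once the child no longer exists.
        kill -0 "$CHILD_PID" 2>/dev/null || break
    fi
done

echo "loop ended after $ITERATIONS iteration(s) with status $WAIT_STATUS"
```

With the guard in place the loop should end after the first pass, regardless of whether a repeated wait would keep returning the same status. Removing the `kill -0` line and running the script with the affected `sh` should reproduce the repeated message described above.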