Issue RHEL-28749 (project: RHEL)

pcsd processes are not terminated with SIGTERM [rhel-9]

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Fix Version/s: rhel-9.5
    • Affects Version/s: rhel-9.3.0
    • Component/s: pcs
    • Fixed in Build: pcs-0.11.7-3.el9
    • Severity: Important
    • ZStream
    • rhel-sst-high-availability
    • ssg_filesystems_storage_and_HA
    • Product: Red Hat Enterprise Linux
    • Approved Blocker
    • Release Note Type: Bug Fix
    • Release note:
      .`pcsd` processes now consistently stop correctly and promptly

      Previously, the method used to create `pcsd` processes sometimes caused a deadlock during process termination. The processes were then terminated only after a `systemd` timeout. This fix changes the process creation method, so stopping the processes no longer causes a deadlock. As a result, `pcsd` consistently stops correctly within a short time.
    • Done

      What were you trying to do that didn't work?

      Running systemctl stop pcsd takes 90 seconds (until the systemd stop timeout expires), and the following messages are logged:

      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: State 'stop-sigterm' timed out. Killing.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4423 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4426 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4440 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4427 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4435 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4468 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4484 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Killing process 4486 (pcsd) with signal SIGKILL.
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Main process exited, code=killed, status=9/KILL
      Feb 27 11:58:24 node1 systemd[1]: pcsd.service: Failed with result 'timeout'.
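      The log excerpt above is what systemd writes when the SIGTERM phase of a stop runs out of time and it falls back to SIGKILL. As a rough aid for spotting the same pattern on other nodes, the shell functions below extract the PIDs that systemd had to SIGKILL and detect the stop-timeout message. They are hypothetical helpers, not part of pcs or systemd, and they assume the exact message wording shown in the excerpt.

```shell
# sigkilled_pids
# Hypothetical helper: reads journal text on stdin and prints, one per line,
# the PIDs from lines of the form
#   "...: pcsd.service: Killing process 4423 (pcsd) with signal SIGKILL."
sigkilled_pids() {
    sed -n 's/.*Killing process \([0-9][0-9]*\) (.*) with signal SIGKILL\..*/\1/p'
}

# stop_timed_out
# Hypothetical helper: succeeds if the journal text on stdin contains the
# systemd message reporting that the SIGTERM stop phase timed out.
stop_timed_out() {
    grep -q "State 'stop-sigterm' timed out"
}

# Example (on an affected node; not run here):
#   journalctl -u pcsd.service -b | sigkilled_pids
#   journalctl -u pcsd.service -b | stop_timed_out && echo "pcsd stop hit the systemd timeout"
```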

      While pcsd.service is being stopped, some pcsd processes are not terminated:

      # ps -ef
        UID       PID  PPID C STIME TTY          TIME CMD
        [...]
        root     1848     1 0 14:43 ?        00:00:50 /usr/bin/python3 -Es /usr/sbin/pcsd
        root     2565  1848 0 14:43 ?        00:00:15 /usr/bin/python3 -Es /usr/sbin/pcsd
        root     2578  1848 0 14:43 ?        00:00:00 /usr/bin/python3 -Es /usr/sbin/pcsd
        root     2580  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        root     2581  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        root     2583  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        root     2584  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        root     2585  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        root     2588  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        root     2590  1848 0 14:43 ?        00:00:00 [pcsd] <defunct>
        [...]
        root     9652  4487 0 17:24 pts/0    00:00:00 systemctl stop pcsd
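      The <defunct> entries are child processes of PID 1848 that have exited but have not been reaped by their parent, which is consistent with the parent being stuck while stopping. A minimal sketch for listing such zombie children of a given parent follows. The function name is hypothetical, and it expects input in the form produced by ps -eo pid=,ppid=,stat=,comm=, where a STAT value starting with Z marks a zombie.

```shell
# zombie_children PARENT_PID
# Hypothetical helper: reads "PID PPID STAT COMMAND" lines on stdin and prints
# the PIDs of children of PARENT_PID whose STAT starts with Z (zombies, which
# ps -ef shows as <defunct>).
zombie_children() {
    awk -v ppid="$1" '$2 == ppid && $3 ~ /^Z/ { print $1 }'
}

# Example (as root, while "systemctl stop pcsd" appears to hang; not run here):
#   ps -eo pid=,ppid=,stat=,comm= |
#       zombie_children "$(systemctl show -p MainPID --value pcsd.service)"
```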

      Please provide the package NVR for which the bug is seen:

      pcs-0.11.4-6.el9 or later
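      To check whether a node runs a pcs build at or above a given NVR, for example the pcs-0.11.7-3.el9 build listed in the issue's fields, a rough shell check is sketched below. It relies on GNU sort -V, which approximates but does not exactly match rpm's version comparison (rpmdev-vercmp from rpmdevtools implements the real algorithm), and it may not order builds from different branches, such as z-stream releases, correctly. The function name is hypothetical.

```shell
# nvr_at_least HAVE WANT
# Hypothetical helper: succeeds if HAVE sorts at or after WANT under GNU
# "sort -V". This only approximates rpm's comparison; use rpmdev-vercmp
# (rpmdevtools) when the answer matters.
nvr_at_least() {
    # Version-sort both strings; HAVE is new enough if WANT comes first.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example (not run here):
#   nvr_at_least "$(rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' pcs)" pcs-0.11.7-3.el9 &&
#       echo "installed pcs is at or above pcs-0.11.7-3.el9"
```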

      How reproducible:

      Randomly; the problem does not occur every time pcsd is stopped.

      Steps to reproduce

      According to the user, the issue can be reproduced with the following command (however, the Red Hat support team did not manage to reproduce it this way with the same pcs version):

      # while true; do date; time systemctl stop pcsd; systemctl start pcsd; echo; sleep 10; done
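      The reproducer above only prints timings through time. A variant that flags slow stops automatically might look like the following sketch. The function check_stop_duration and the 30-second threshold in the example are illustrative choices (the threshold is simply well below the 90-second timeout seen in the logs), not anything defined by pcs or systemd.

```shell
# check_stop_duration THRESHOLD_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, prints its wall-clock duration in whole
# seconds, and returns 1 if it took longer than THRESHOLD_SECONDS, or 2 if
# COMMAND itself exited non-zero.
check_stop_duration() {
    threshold=$1
    shift
    start=$(date +%s)          # GNU date: seconds since the epoch
    rc=0
    "$@" || rc=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$rc" -ne 0 ]; then
        echo "FAILED (exit $rc) after ${elapsed}s: $*"
        return 2
    fi
    if [ "$elapsed" -gt "$threshold" ]; then
        echo "SLOW (${elapsed}s > ${threshold}s): $*"
        return 1
    fi
    echo "ok (${elapsed}s): $*"
}

# Example loop modelled on the reproducer (run as root on a test node;
# not executed here). 30 s is an arbitrary threshold below the 90 s timeout.
#   while true; do
#       date
#       check_stop_duration 30 systemctl stop pcsd
#       systemctl start pcsd
#       echo
#       sleep 10
#   done
```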

      Expected results

      pcsd stops successfully, including all of its child processes.

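      Actual results

      systemctl stop pcsd does not return until the 90-second systemd stop timeout expires; systemd then kills the remaining pcsd processes with SIGKILL, and the unit fails with result 'timeout'.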

      People: Michal Mazourek (mmazoure), Pepa Zimek (rhn-support-pzimek1), Miroslav Lisik, Steven Levine