RHEL-37748 (project: RHEL)

ssh with multiplexing can fail to connect to the remote system when the master connection has just timed out

    • Bug
    • Resolution: Done-Errata
    • Normal
    • rhel-9.5
    • rhel-9.4
    • openssh
    • openssh-8.7p1-41.el9
    • None
    • Moderate
    • 1
    • rhel-sst-security-crypto
    • ssg_security
    • 20
    • 0.5
    • False
    • None
    • No
    • Red Hat Enterprise Linux
    • Crypto24Q2
    • AC: 1) Sanity-only verification that the patch is in the compose and applied

      AC: 2) Manual testing that the ssh connection does not fail with exit code 141 when it is performed via multiplexing
    • Pass
    • None
    • Release Note Not Required
    • All
    • None

      What were you trying to do that didn't work?

      When an ssh connection is made using multiplexing (ControlPath, ControlMaster=auto, ...) and the master connection times out a few milliseconds before mux_client_hello_exchange() is called, ssh is killed by SIGPIPE, which is a bug.
      It is a bug because multiplexing is supposed to fall back to a normal connection, as documented in the ssh_config(5) man page:

           ControlMaster
      
                   [...] These sessions will try to reuse the master instance's network connection rather than initiating new ones, but will fall back to connecting normally if the control socket does not exist, or is not listening.
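The fallback rule quoted above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not OpenSSH's implementation: try_mux_connect() and use_multiplexing() are hypothetical names, and only the connect() step is modelled. It shows that the documented fallback naturally covers a missing control socket (ENOENT) or one with no listener (ECONNREFUSED). In this report the client gets past that step and reaches mux_client_hello_exchange(), so it can only fall back if it survives the hello write.

```c
/*
 * Hypothetical sketch of the ControlMaster=auto fallback decision from
 * ssh_config(5). Not OpenSSH code; names are illustrative only.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Try to reach a mux master on the Unix socket at `path`.
 * Returns a connected fd, or -1 on any failure (e.g. ENOENT when the
 * socket does not exist, ECONNREFUSED when nobody is listening). */
static int try_mux_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* path too long for sun_path */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);        /* length checked above */

    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical decision: 1 = reuse the master, 0 = connect normally. */
static int use_multiplexing(const char *control_path)
{
    int fd = try_mux_connect(control_path);

    if (fd == -1)
        return 0;   /* fall back: no socket, or no listener */
    close(fd);      /* a real client would continue with the hello exchange */
    return 1;
}
```

For example, use_multiplexing() returns 0 for a ControlPath that does not exist, and 1 while some process has the socket bound and in the listen() state.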
      

      ssh dies with SIGPIPE because, when mux_client_hello_exchange() and the underlying mux_client_write_packet() execute, SIGPIPE is not yet ignored, so the write() on line 1513 raises the signal and kills ssh:

      1488 static int
      1489 mux_client_write_packet(int fd, struct sshbuf *m)
      1490 {
       :
      1513                 len = write(fd, ptr + have, need - have);
       :
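The failure mode at line 1513 can be reproduced outside ssh with a socketpair whose peer is already closed. The sketch below is illustrative only, and child_write_to_closed_peer() and shell_exit_status() are hypothetical helpers. A child process with the default SIGPIPE disposition is killed inside write(), and shells such as bash report that as 128 plus the signal number. For SIGPIPE (13 on Linux) this gives the exit code 141 seen in the report.

```c
/*
 * Sketch: write() to a stream socket whose peer has closed raises SIGPIPE;
 * with the default disposition the writing process is killed.
 */
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes to a socket whose peer is already closed.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 if setup failed. */
static int child_write_to_closed_peer(void)
{
    int sv[2];
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    close(sv[1]);                       /* simulate the master going away */

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(SIGPIPE, SIG_DFL);       /* like ssh before SIGPIPE is ignored */
        if (write(sv[0], "hello", 5) == -1)
            _exit(1);                   /* only reached if SIGPIPE did not kill us */
        _exit(0);
    }
    close(sv[0]);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Exit status a shell such as bash reports for a process killed by `sig`. */
static int shell_exit_status(int sig)
{
    return 128 + sig;
}
```

Calling child_write_to_closed_peer() should return SIGPIPE, and shell_exit_status(SIGPIPE) evaluates to 141 on Linux.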
      

      A quick fix is to ignore SIGPIPE during the hello exchange:

      Original code:

      1575 static int
      1576 mux_client_hello_exchange(int fd)
      1577 {
       :
      1589         if (mux_client_write_packet(fd, m) != 0) {
      1590                 debug_f("write packet: %s", strerror(errno));
      1591                 goto out;
      1592         }
       :
      

      Modified code:

      1575 static int      
      1576 mux_client_hello_exchange(int fd)
      1577 {       
       :
      1581         sshsig_t old_sigpipe;
       :
      1590         old_sigpipe = ssh_signal(SIGPIPE, SIG_IGN);
      1591         r = mux_client_write_packet(fd, m);
      1592         ssh_signal(SIGPIPE, old_sigpipe);
      1593         if (r != 0) {
      1594                 debug_f("write packet: %s", strerror(errno));
      1595                 goto out;
      1596         }
       :
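The save/ignore/restore pattern in the modified code can be tried in isolation. In this minimal sketch, plain signal() stands in for OpenSSH's ssh_signal() wrapper, and write_ignoring_sigpipe() is a hypothetical helper. With SIGPIPE ignored for the duration of the write, a closed peer surfaces as write() returning -1 with errno set to EPIPE. The caller can then handle it like any other write error, as the debug_f()/goto out path above does.

```c
/*
 * Sketch of the save / ignore / restore pattern. Assumption: signal()
 * replaces OpenSSH's ssh_signal(); the helper name is hypothetical.
 */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Write with SIGPIPE ignored, then restore the previous disposition.
 * On a closed peer this returns -1 with errno == EPIPE instead of the
 * process being killed. */
static ssize_t write_ignoring_sigpipe(int fd, const void *buf, size_t len)
{
    void (*old_sigpipe)(int);
    ssize_t r;
    int saved_errno;

    old_sigpipe = signal(SIGPIPE, SIG_IGN);
    r = write(fd, buf, len);
    saved_errno = errno;            /* keep write()'s errno for the caller */
    signal(SIGPIPE, old_sigpipe);   /* restore whatever was installed before */
    errno = saved_errno;
    return r;
}
```

Restoring the previous disposition right after the write keeps the change local to the hello exchange, so the rest of the client's signal setup is unaffected.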
      

      EDIT: Upstream fixed this in commit 96faa0de6c673a2ce84736eba37fc9fb723d9e5c.

      Please provide the package NVR for which the bug is seen:

      openssh-clients-8.7p1-38.el9
      openssh-clients-8.0p1-19.el8_9.2

      How reproducible:

      Often, when the master connection closes quickly (short ControlPersist)

      Steps to reproduce

      1. Run the following command, which loops until ssh exits with a non-zero status:
        $ while :; do echo; date +%s.%N; /usr/bin/ssh -o ControlPath=/tmp/%r@%h:%p -o ControlPersist=2 -o ControlMaster=auto localhost hostname || { echo Exited with: $? ; break ; }; sleep 1.6s ; done

        The sleep delay may need adjusting depending on the hardware; the idea is for the new connection to happen just as the master connection times out.

      Expected results

      Connections to the system never fail with exit code 141.

      Actual results

      1716372217.124957684
      vm-ssh9
      
      1716372219.174926179
      muxclient: master hello exchange failed
      vm-ssh9
      
      1716372221.187231796
      muxclient: master hello exchange failed
      vm-ssh9
      
      1716372223.184413163
      muxclient: master hello exchange failed
      vm-ssh9
      
      1716372225.178581753
      Exited with: 141
      

      Alternative reproducer using a SystemTap script

      1. In one terminal, start the following script:
        # stap -g -v -e 'probe process("/usr/bin/ssh").statement("*@mux.c:1513") { raise(%{SIGPIPE%}); exit() }'
      2. In another terminal, run the following loop:
        $ while :; do echo; date +%s.%N; /usr/bin/ssh -o ControlPath=/tmp/%r@%h:%p -o ControlPersist=20 -o ControlMaster=auto localhost hostname || { echo Exited with: $? ; break ; }; sleep 1.6s ; done

        Here ControlPersist can be larger, because the script injects SIGPIPE directly at the write() call; ssh being killed shows that SIGPIPE is not ignored at the time of the call.

              Dmitry Belyavskiy (dbelyavs@redhat.com)
              Renaud Métrich (rhn-support-rmetrich)
              Dmitry Belyavskiy
              Miluse Bezo Konecna
              Votes: 0
              Watchers: 8
