RHEL-36434

"dnf update" redirected to a file exits with 141 (SIGPIPE) when closing the SSL connection fails with EPIPE (BrokenPipeError)

    • Bug
    • Resolution: Duplicate
    • Normal
    • None
    • rhel-9.4
    • dnf
    • None
    • None
    • Important
    • rhel-sst-cs-software-management
    • ssg_core_services
    • None
    • False
    • None
    • Red Hat Enterprise Linux
    • None
    • None
    • None
    • None

      This is a continuation of RHEL-35656 but on the dnf side.

      What were you trying to do that didn't work?

      A customer runs the command dnf -y update >redirect.out; echo $? to perform updates automatically and check the result.
      When the update takes long (i.e. the connection to their Satellite server stays idle for longer than the KeepAlive timeout of 15 seconds), the customer sees the connection to the Satellite server closed on timeout, and this generates a SIGPIPE internally, as seen in the strace excerpt below:

      1229597 14:32:41.789160 write(7</var/log/rhsm/rhsm.log>, "2024-05-13 14:32:41,788 [DEBUG] dnf:1229597:MainThread @connection.py:676 - Closing HTTPS connection <ssl.SSLSocket fd=8, family"..., 222) = 222 <0.000007>
      1229597 14:32:41.790210 write(8<TCP:[10.132.72.128:48858->172.29.73.11:443]>, "\27\3\3\0\23\206\220\273\216\1\344\336\320{A\1\259]\16u'\304\16", 24) = -1 EPIPE (Broken pipe) <0.000012>
      1229597 14:32:41.790267 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=1229597, si_uid=0} ---
      1229597 14:32:41.790284 rt_sigreturn({mask=[]}) = -1 EPIPE (Broken pipe) <0.000005>
       :
      1229597 14:32:44.119261 close(5</var/log/dnf.librepo.log>) = 0 <0.000009>
      1229597 14:32:44.267278 exit_group(141) = ?
      1229597 14:32:44.297930 +++ exited with 141 +++
      

      Above we can see the connection to the Satellite server being closed ("unwrap" called). The write fails with EPIPE because the Satellite server has already closed the connection. This generates a SIGPIPE, and execution continues.
      dnf then logs the update and exits with 141, which doesn't make sense; this is the issue reported here. 141 is 128 + 13, which means "exit due to signal SIGPIPE" (signal 13).
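      As a small aid for reading such exit codes, the 128 + N convention can be decoded as sketched below. This is only an illustration, assuming a Linux system where SIGPIPE is signal 13; the helper name signal_from_exit_status is made up for this example and is not part of dnf.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name encoded in a shell-style exit status, or None.

    Hypothetical helper: shells report "terminated by signal N" as 128 + N.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # 128 + N where N is not a known signal number on this platform
        return None


print(signal_from_exit_status(141))  # SIGPIPE on Linux (signal 13)
print(signal_from_exit_status(0))    # None
```

      Note that $? alone does not tell whether the process was killed by the signal or exited explicitly with status 141; in the strace above, the process calls exit_group(141) itself.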

      I have not been able to reproduce the connection to the Satellite server ending in SIGPIPE, despite spending days on this. But with the customer's help, it is clear that this is what leads to the issue.

      Important details:

      The issue doesn't happen when the command is run without redirection (e.g. dnf -y update; echo $?), even though the SIGPIPE signal is still received, as shown in the strace excerpt below:

      1203775 09:40:09.394516 write(7</var/log/rhsm/rhsm.log>, "2024-05-07 09:40:09,394 [DEBUG] yum:1203775:MainThread @connection.py:672 - Closing HTTPS connection <ssl.SSLSocket fd=8, family"..., 222) = 222 <0.000008>
      1203775 09:40:09.396743 futex(0x7f68e9740fec, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000010>
      1203775 09:40:09.396810 write(8<TCP:[10.132.72.128:51940->172.29.73.11:443]>, "\27\3\3\0\23z\237N\265\276kq\221\252\36\235\314\211v\fb\262\271\v", 24) = -1 EPIPE (Broken pipe) <0.000009>
      1203775 09:40:09.396850 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=1203775, si_uid=0} ---
       :
      1203775 09:40:14.282206 close(5</var/log/dnf.librepo.log>) = 0 <0.000010>
      1203775 09:40:14.373737 exit_group(0)   = ?
      1203775 09:40:14.388680 +++ exited with 0 +++
      

      The reason for the difference was initially unclear.
      However, working with the customer, we found that it is due to the signal handling being modified when stdout is not a tty (which is the case when the command is redirected); see dnf/i18n.py, lines 102-103:

          if not stdout.isatty():
              signal.signal(signal.SIGPIPE, signal.SIG_DFL)
      

      Commenting out both lines makes the dnf -y update >redirected.out; echo $? command return the expected exit code 0.
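      To illustrate in general terms what the SIGPIPE disposition changes, here is a minimal sketch. It is not the dnf code path and not a reproducer for the customer's issue: it assumes a Linux system, uses a local pipe with a closed read end instead of an SSL socket, and the CHILD script and run_child helper are made up for this example.

```python
import signal
import subprocess
import sys

# Hypothetical child script: optionally restore the default SIGPIPE
# disposition (as the dnf/i18n.py lines do), then write to a pipe
# whose read end is already closed, so the write fails with EPIPE.
CHILD = r"""
import os, signal, sys
if sys.argv[1] == "default":
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
r, w = os.pipe()
os.close(r)
try:
    os.write(w, b"x")
except BrokenPipeError:
    pass  # SIGPIPE ignored (Python's default): EPIPE becomes an exception
sys.exit(0)
"""


def run_child(mode):
    """Run the child script in the given mode and return its return code."""
    return subprocess.run([sys.executable, "-c", CHILD, mode]).returncode


ignored = run_child("ignore")   # Python's default: SIGPIPE is ignored
default = run_child("default")  # SIG_DFL: the write raises SIGPIPE

print("SIGPIPE ignored ->", ignored)
print("SIGPIPE default ->", default)
```

      With SIGPIPE ignored, the failed write surfaces as BrokenPipeError and the child exits with 0. With SIG_DFL, the child is terminated by SIGPIPE: subprocess reports this as a negative return code (the negated signal number), which a shell would show as 141. In the strace excerpts above dnf keeps running after the signal and only exits later, so the real code path differs from this sketch.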

      Please provide the package NVR for which bug is seen:

      dnf-4.14.0-8.el9.noarch

      How reproducible:

      Always at the customer site. I was not able to reproduce it internally at all, despite spending several days on this.

              packaging-team-maint
              Renaud Métrich (rhn-support-rmetrich)
              packaging-team-maint
              Software Management QE