-
Bug
-
Resolution: Duplicate
-
Normal
-
None
-
rhel-9.4
-
None
-
None
-
Important
-
rhel-sst-cs-software-management
-
ssg_core_services
-
None
-
False
-
-
None
-
Red Hat Enterprise Linux
-
None
-
None
-
None
-
None
This is a continuation of RHEL-35656 but on the dnf side.
What were you trying to do that didn't work?
A customer runs "dnf -y update >redirect.out; echo $?" to apply updates automatically and check the exit code.
When the update is "long" (i.e. the connection to the customer's Satellite server stays idle for longer than the KeepAlive timeout of 15 seconds), the Satellite server closes the connection on timeout, and a later write on that connection generates a SIGPIPE internally, as seen in the strace excerpt below:
1229597 14:32:41.789160 write(7</var/log/rhsm/rhsm.log>, "2024-05-13 14:32:41,788 [DEBUG] dnf:1229597:MainThread @connection.py:676 - Closing HTTPS connection <ssl.SSLSocket fd=8, family"..., 222) = 222 <0.000007>
1229597 14:32:41.790210 write(8<TCP:[10.132.72.128:48858->172.29.73.11:443]>, "\27\3\3\0\23\206\220\273\216\1\344\336\320{A\1\259]\16u'\304\16", 24) = -1 EPIPE (Broken pipe) <0.000012>
1229597 14:32:41.790267 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=1229597, si_uid=0} ---
1229597 14:32:41.790284 rt_sigreturn({mask=[]}) = -1 EPIPE (Broken pipe) <0.000005>
:
1229597 14:32:44.119261 close(5</var/log/dnf.librepo.log>) = 0 <0.000009>
1229597 14:32:44.267278 exit_group(141) = ?
1229597 14:32:44.297930 +++ exited with 141 +++
In the excerpt above, the connection to the Satellite server is being closed ("unwrap" called); the write fails with EPIPE because the Satellite server has already closed the connection. This generates a SIGPIPE, and execution continues.
dnf logs the update, then exits with 141, which doesn't make sense and is the issue filed here. 141 is 128 + 13, which means "exit due to signal SIGPIPE".
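The 128 + 13 arithmetic can be checked with a small Python sketch. It is not dnf code: the child process and the shell_style_status helper are hypothetical, introduced only to show how a process terminated by SIGPIPE maps to the $? value a POSIX shell reports (141 on Linux, where SIGPIPE is signal 13).

```python
import signal
import subprocess
import sys

# Hypothetical child: restore the default SIGPIPE disposition, then
# deliver SIGPIPE to itself so that the kernel terminates the process.
CHILD = (
    "import os, signal;"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL);"
    "os.kill(os.getpid(), signal.SIGPIPE)"
)


def shell_style_status(returncode: int) -> int:
    """Map a subprocess returncode to the value a POSIX shell puts in $?.

    subprocess reports death by signal N as -N; shells report 128 + N.
    """
    return 128 - returncode if returncode < 0 else returncode


proc = subprocess.run([sys.executable, "-c", CHILD])
status = shell_style_status(proc.returncode)
print("SIGPIPE number:", int(signal.SIGPIPE))
print("subprocess returncode:", proc.returncode)
print("shell-style exit status:", status)
```

Note that in the strace above dnf calls exit_group(141) itself instead of being killed by the kernel, so this only illustrates where the number 141 comes from.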
I have not been able to reproduce the Satellite connection ending in SIGPIPE, despite spending several days on it. However, with the customer's help, it is clear that this is what leads to the issue.
Important details:
The issue doesn't happen when the command is run without redirection (e.g. dnf -y update; echo $?), even though the SIGPIPE signal is still received, as shown in the strace excerpt below:
1203775 09:40:09.394516 write(7</var/log/rhsm/rhsm.log>, "2024-05-07 09:40:09,394 [DEBUG] yum:1203775:MainThread @connection.py:672 - Closing HTTPS connection <ssl.SSLSocket fd=8, family"..., 222) = 222 <0.000008>
1203775 09:40:09.396743 futex(0x7f68e9740fec, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000010>
1203775 09:40:09.396810 write(8<TCP:[10.132.72.128:51940->172.29.73.11:443]>, "\27\3\3\0\23z\237N\265\276kq\221\252\36\235\314\211v\fb\262\271\v", 24) = -1 EPIPE (Broken pipe) <0.000009>
1203775 09:40:09.396850 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=1203775, si_uid=0} ---
:
1203775 09:40:14.282206 close(5</var/log/dnf.librepo.log>) = 0 <0.000010>
1203775 09:40:14.373737 exit_group(0) = ?
1203775 09:40:14.388680 +++ exited with 0 +++
The reason for the difference was initially unclear.
However, together with the customer, we found that it comes from a change in signal handling that is applied when stdout is not a tty (which is the case when the command output is redirected); see dnf/i18n.py, lines 102-103:
    if not stdout.isatty():
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
Commenting out both lines makes the "dnf -y update >redirected.out; echo $?" command return the expected exit code 0.
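The effect of that signal.signal call can be illustrated outside dnf with a minimal Python sketch. It makes several assumptions: a local socketpair with a closed peer stands in for the Satellite connection, and a hypothetical "reset"/"keep" argument stands in for the stdout.isatty() check. The CPython interpreter ignores SIGPIPE at startup, so a write to a closed peer normally surfaces as BrokenPipeError; after the disposition is reset to SIG_DFL, the same write terminates the process.

```python
import signal
import subprocess
import sys

# Hypothetical child program, not dnf code. "reset" mimics the non-tty
# branch of the dnf/i18n.py snippet; "keep" mimics the tty case.
CHILD = """
import signal, socket, sys
if sys.argv[1] == "reset":
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
ours, peer = socket.socketpair()
peer.close()                 # the peer goes away, like an idle connection closed by the server
try:
    ours.send(b"x")          # write to a closed peer: EPIPE plus SIGPIPE
except BrokenPipeError:
    sys.exit(0)              # SIGPIPE ignored: the error is an ordinary exception
sys.exit(1)                  # not expected to be reached
"""


def run_child(mode: str) -> int:
    """Run the child in the given mode and return its subprocess returncode."""
    return subprocess.run([sys.executable, "-c", CHILD, mode]).returncode


ignored = run_child("keep")
default = run_child("reset")
print("SIGPIPE left ignored (interpreter default):", ignored)
print("SIGPIPE reset to SIG_DFL:", default)
```

In this sketch the "reset" child is killed directly by the signal, whereas the strace in this report shows dnf returning from a signal handler and later calling exit_group(141). The sketch therefore shows only why the disposition change matters, not the exact code path inside dnf.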
Please provide the package NVR for which bug is seen:
dnf-4.14.0-8.el9.noarch
How reproducible:
Always on the customer's site; I was not able to reproduce it internally after spending several days on it.
- duplicates
-
RHEL-35656 "dnf update" fails with EPIPE at end of update when RHSM executes "Updating profile information"
- In Progress