-
Bug
-
Resolution: Unresolved
-
Undefined
-
rhel-9.3.0.z, rhel-9.4
-
None
-
None
-
Moderate
-
rhel-sst-csi-client-tools
-
None
-
False
-
-
None
-
Red Hat Enterprise Linux
-
None
-
Requested
-
None
-
None
What were you trying to do that didn't work?
A customer, connected to his own Satellite server, repeatedly sees dnf -y update fail with exit code 141 (EPIPE signal) when updating his systems when a lot of packages are updated, hence a long time is required for the update.
Collecting an strace shows that a connection to Satellite, which was not used for a long time, is reused, which causes the EPIPE to show up:
1203717 08:12:39.766625 read(8<TCP:[CLIENT:PORT->SATSERVER:443]>, "SOME DATA READ"..., 1556) = 1556 <0.000006> : : (nothing related to this connection in-between) : 1203717 08:26:03.162837 getsockname(8<TCP:[CLIENT:LPORT->SATSERVER:443]>, {sa_family=AF_INET, sin_port=htons(LPORT), sin_addr=inet_addr("CLIENT")}, [16]) = 0 <0.000008> 1203717 08:26:03.162904 getpeername(8<TCP:[CLIENT:LPORT->SATSERVER:443]>, 0x7ffc8f9230c0, [16]) = -1 ENOTCONN (Transport endpoint is not connected) <0.000005> 1203717 08:26:03.165239 write(8<TCP:[CLIENT:LPORT->SATSERVER:443]>, "SOME DATA WRITTEN", 24) = -1 EPIPE (Broken pipe) <0.000011>
Here above we can see a check on the connection is made (getpeername()), which reports ENOTCONN because the server must have closed the connection already, but still the write() to close the connection is made, causing EPIPE to pop up.
Digging more into this, through using a systemtap script to collect a coredump when getpeername() failed in error (systemtap script available on demand), I could find out that the connection was getting closed because it was old, but the closing was failing in an untrapped exception:
(gdb) py-bt Traceback (most recent call first): File "/usr/lib64/python3.9/socket.py", line 264, in __repr__ raddr = self.getpeername() <built-in method __format__ of SSLSocket object at remote 0x7fca66ead1c0> File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 672, in close_connection log.debug(f"Closing HTTPS connection {self.__conn.sock}") File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 741, in _create_connection self.close_connection() File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1021, in _make_request conn = self._create_connection(cert_file=cert_file, key_file=key_file) File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1130, in _request result, response = self._make_request( File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1347, in request_put result: Dict[str, Any] = self._request( File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1756, in updateCombinedProfile return self.conn.request_put(method, profile, description=_("Updating profile information")) [...]
Corresponding code is below:
729 def _create_connection(self, cert_file: str = None, key_file: str = None) -> httplib.HTTPSConnection: 730 """ 731 This method tries to return existing connection, when connection exists and limit of connection 732 has not been reached (timeout, max number of requests). When no connection exists, then this 733 method creates new TCP and TLS connection. 734 """ 735 736 if self.__conn is not None: 737 # Check if it is still possible to use existing connection 738 now = time.time() 739 if now - self.__conn.last_request_time > self.__conn.keep_alive_timeout: 740 log.debug(f"Connection timeout {self.__conn.keep_alive_timeout}. Closing connection...") 741 >>>>> self.close_connection() 664 def close_connection(self) -> None: 665 """ 666 Try to close connection to server 667 :return: None 668 """ 669 if self.__conn is not None: 670 # Do proper TLS shutdown handshake (TLS tear down) first 671 if self.__conn.sock is not None: 672 log.debug(f"Closing HTTPS connection {self.__conn.sock}") 673 try: 674 >>>>> self.__conn.sock.unwrap() 675 except ssl.SSLError as err: 676 log.debug(f"Unable to close TLS connection properly: {err}") 677 else: 678 log.debug("TLS connection closed") 679 # Then it is possible to close TCP connection 680 self.__conn.close() 681 self.__conn = None
Here above line 674 returns in an uncaught exception, causing the backtrace to occur.
Below is the RHSM log in debug (not taken during the strace):
2024-05-07 08:43:00,613 [ERROR] yum:1203415:MainThread @cache.py:201 - Error updating system data on the server 2024-05-07 08:43:00,614 [ERROR] yum:1203415:MainThread @cache.py:202 - [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python3.9/site-packages/subscription_manager/cache.py", line 191, in update_check self._sync_with_server(uep, consumer_uuid) File "/usr/lib64/python3.9/site-packages/subscription_manager/cache.py", line 563, in _sync_with_server uep.updateCombinedProfile(consumer_uuid, _combined_profile) File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1756, in updateCombinedProfile Sanitizes one or more provided guest Ids. File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1347, in request_put try: File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1130, in _request self._set_accept_language_in_header() File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1066, in _make_request "headers": dict(response.getheaders()), File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1021, in _make_request log.debug(f"Unknown Keep-Alive argument: {key}") File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 741, in _create_connection # Check if it is still possible to use existing connection File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 674, in close_connection # Do proper TLS shutdown handshake (TLS tear down) first File "/usr/lib64/python3.9/ssl.py", line 1319, in unwrap s = self._sslobj.shutdown() BrokenPipeError: [Errno 32] Broken pipe
I would expect to be able to reproduce myself, but I don't get this Updating profile information step, maybe because I'm directly connected to the CDN and not a Satellite server.
Please provide the package NVR for which bug is seen:
subscription-manager-1.29.38-1.el9_3.x86_64
also seen after updating to RHEL9.4 package
How reproducible:
Always on customer system
- is duplicated by
-
RHEL-36434 "dnf update" redirected to a file fails with EPIPE when SSL connection fails generates BrokenPipeError
- Closed
- links to