Uploaded image for project: 'RHEL'
  1. RHEL
  2. RHEL-35656

"dnf update" fails with EPIPE at end of update when RHSM executes "Updating profile information"

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Undefined Undefined
    • rhel-9.6
    • rhel-9.3.0.z, rhel-9.4
    • subscription-manager
    • None
    • None
    • Moderate
    • rhel-sst-csi-client-tools
    • None
    • False
    • Hide

      None

      Show
      None
    • None
    • Red Hat Enterprise Linux
    • None
    • None

      What were you trying to do that didn't work?

      A customer, connected to his own Satellite server, repeatedly sees dnf -y update fail with exit code 141 (EPIPE signal) when updating his systems when a lot of packages are updated, hence a long time is required for the update.
      Collecting an strace shows that a connection to Satellite, which was not used for a long time, is reused, which causes the EPIPE to show up:

      1203717 08:12:39.766625 read(8<TCP:[CLIENT:PORT->SATSERVER:443]>, "SOME DATA READ"..., 1556) = 1556 <0.000006>
       :
       : (nothing related to this connection in-between)
       :
      1203717 08:26:03.162837 getsockname(8<TCP:[CLIENT:LPORT->SATSERVER:443]>, {sa_family=AF_INET, sin_port=htons(LPORT), sin_addr=inet_addr("CLIENT")}, [16]) = 0 <0.000008>
      1203717 08:26:03.162904 getpeername(8<TCP:[CLIENT:LPORT->SATSERVER:443]>, 0x7ffc8f9230c0, [16]) = -1 ENOTCONN (Transport endpoint is not connected) <0.000005>
      1203717 08:26:03.165239 write(8<TCP:[CLIENT:LPORT->SATSERVER:443]>, "SOME DATA WRITTEN", 24) = -1 EPIPE (Broken pipe) <0.000011>
      

      Here above we can see a check on the connection is made (getpeername()), which reports ENOTCONN because the server must have closed the connection already, but still the write() to close the connection is made, causing EPIPE to pop up.

      Digging more into this, through using a systemtap script to collect a coredump when getpeername() failed in error (systemtap script available on demand), I could find out that the connection was getting closed because it was old, but the closing was failing in an untrapped exception:

      (gdb) py-bt
      Traceback (most recent call first):
        File "/usr/lib64/python3.9/socket.py", line 264, in __repr__
          raddr = self.getpeername()
        <built-in method __format__ of SSLSocket object at remote 0x7fca66ead1c0>
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 672, in close_connection
          log.debug(f"Closing HTTPS connection {self.__conn.sock}")
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 741, in _create_connection
          self.close_connection()
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1021, in _make_request
          conn = self._create_connection(cert_file=cert_file, key_file=key_file)
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1130, in _request
          result, response = self._make_request(
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1347, in request_put
          result: Dict[str, Any] = self._request(
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1756, in updateCombinedProfile
          return self.conn.request_put(method, profile, description=_("Updating profile information"))
      [...]
      

      Corresponding code is below:

       729     def _create_connection(self, cert_file: str = None, key_file: str = None) -> httplib.HTTPSConnection:
       730         """
       731         This method tries to return existing connection, when connection exists and limit of connection
       732         has not been reached (timeout, max number of requests). When no connection exists, then this
       733         method creates new TCP and TLS connection.
       734         """
       735 
       736         if self.__conn is not None:
       737             # Check if it is still possible to use existing connection
       738             now = time.time()
       739             if now - self.__conn.last_request_time > self.__conn.keep_alive_timeout:
       740                 log.debug(f"Connection timeout {self.__conn.keep_alive_timeout}. Closing connection...")
       741 >>>>>           self.close_connection()
       
        664     def close_connection(self) -> None:
       665         """
       666         Try to close connection to server
       667         :return: None
       668         """
       669         if self.__conn is not None:
       670             # Do proper TLS shutdown handshake (TLS tear down) first
       671             if self.__conn.sock is not None:
       672                 log.debug(f"Closing HTTPS connection {self.__conn.sock}")
       673                 try:
       674 >>>>>               self.__conn.sock.unwrap()
       675                 except ssl.SSLError as err:
       676                     log.debug(f"Unable to close TLS connection properly: {err}")
       677                 else:
       678                     log.debug("TLS connection closed")
       679             # Then it is possible to close TCP connection
       680             self.__conn.close()
       681         self.__conn = None
      

      Here above line 674 returns in an uncaught exception, causing the backtrace to occur.
      Below is the RHSM log in debug (not taken during the strace):

      2024-05-07 08:43:00,613 [ERROR] yum:1203415:MainThread @cache.py:201 - Error updating system data on the server
      2024-05-07 08:43:00,614 [ERROR] yum:1203415:MainThread @cache.py:202 - [Errno 32] Broken pipe
      Traceback (most recent call last):
        File "/usr/lib64/python3.9/site-packages/subscription_manager/cache.py", line 191, in update_check
          self._sync_with_server(uep, consumer_uuid)
        File "/usr/lib64/python3.9/site-packages/subscription_manager/cache.py", line 563, in _sync_with_server
          uep.updateCombinedProfile(consumer_uuid, _combined_profile)
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1756, in updateCombinedProfile
          Sanitizes one or more provided guest Ids.
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1347, in request_put
          try:
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1130, in _request
          self._set_accept_language_in_header()
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1066, in _make_request
          "headers": dict(response.getheaders()),
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 1021, in _make_request
          log.debug(f"Unknown Keep-Alive argument: {key}")
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 741, in _create_connection
          # Check if it is still possible to use existing connection
        File "/usr/lib64/python3.9/site-packages/rhsm/connection.py", line 674, in close_connection
          # Do proper TLS shutdown handshake (TLS tear down) first
        File "/usr/lib64/python3.9/ssl.py", line 1319, in unwrap
          s = self._sslobj.shutdown()
      BrokenPipeError: [Errno 32] Broken pipe
      

      I would expect to be able to reproduce myself, but I don't get this Updating profile information step, maybe because I'm directly connected to the CDN and not a Satellite server.

      Please provide the package NVR for which bug is seen:

      subscription-manager-1.29.38-1.el9_3.x86_64
      also seen after updating to RHEL9.4 package

      How reproducible:

      Always on customer system

              jhnidek@redhat.com Jiri Hnidek
              rhn-support-rmetrich Renaud Métrich
              CSI Client Tools Bugs Bot CSI Client Tools Bugs Bot
              Jan Stavel Jan Stavel
              Votes:
              2 Vote for this issue
              Watchers:
              12 Start watching this issue

                Created:
                Updated: