  Satellite / SAT-14690

pulp_streamer runs out of file descriptors when upstream server is unavailable


    • CLOSED
    • 1,300
    • Phoenix
    • Sprint 113, Sprint 114, Sprint 115, Sprint 116
    • Important

      Description of problem:
      pulp_streamer runs out of file descriptors when a package requested by clients is missing locally and the upstream server is unavailable
      
      Version-Release number of selected component (if applicable):
      python-pulp-streamer-2.21.5.3-1.el7sat.noarch
      satellite-6.9.7-1.el7sat.noarch
      
      How reproducible:
      Unknown; suspected to be always reproducible
      
      Steps to Reproduce:
      1. Configure a custom repository that points to another server and uses the On Demand download policy
      2. Synchronise the repository and make it available to many hosts
      3. Shut down the upstream server and have many hosts try to download packages from the custom repository
      
      Actual results:
      pulp_streamer runs out of file descriptors and starts erroring with "Too many open files":
      
      Jan 31 02:30:37 ecpvm003101 pulp_streamer: urllib3.connectionpool:WARNING: Retrying (Retry(total=1, connect=5, read=1, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', error(24, 'Too many open files'))': /rhel8/<repo>/<package>.rpm
      Jan 31 02:30:37 ecpvm003101 pulp_streamer: urllib3.connectionpool:INFO: Starting new HTTP connection (5): reposerver.example.com
      Jan 31 02:30:38 ecpvm003101 pulp_streamer: urllib3.connectionpool:WARNING: Retrying (Retry(total=2, connect=5, read=2, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', error(24, 'Too many open files'))': /rhel8/<repo>/<package>.rpm
      Jan 31 02:30:38 ecpvm003101 pulp_streamer: urllib3.connectionpool:INFO: Starting new HTTP connection (4): reposerver.example.com
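Error 24 in these messages is EMFILE: the kernel refuses to hand out a new descriptor once the process has reached its RLIMIT_NOFILE soft limit. The descriptor numbers 1019u to 1023u in the lsof output below are consistent with a soft limit of 1024 (descriptors 0 to 1023), although the actual limit configured for the pulp_streamer service is not stated in this report. The following stand-alone Python sketch is not pulp_streamer code (the function name `demonstrate_emfile` is hypothetical); it reproduces the same error deterministically on Linux by temporarily lowering the soft limit and opening /dev/null until open() fails:

```python
import errno
import os
import resource


def demonstrate_emfile(headroom=16):
    """Open /dev/null repeatedly under a temporarily lowered RLIMIT_NOFILE
    soft limit until open() fails; return (descriptors_opened, errno)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Descriptors already in use (Linux lists them in /proc/self/fd).
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    in_use = len(os.listdir(fd_dir))
    new_soft = in_use + headroom
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    opened = []
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    try:
        while True:
            try:
                opened.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                # Expected: errno.EMFILE ("Too many open files").
                return len(opened), exc.errno
    finally:
        # Release every descriptor we opened and restore the original limit.
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

A long-running service reaches the same state when descriptors it no longer needs are not closed, so every new connection attempt, inbound or outbound, fails with EMFILE.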
      
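      lsof shows that most file descriptors are held by loopback connections to port 8751 in the CLOSE_WAIT state, i.e. the client side has already closed the connection but pulp_streamer has not closed its end: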
      
      COMMAND     PID     USER   FD      TYPE             DEVICE    SIZE/OFF       NODE NAME
      [...]
      pulp_stre  5273       48 1019u     IPv4          457251852         0t0        TCP 127.0.0.1:8751->127.0.0.1:41998 (CLOSE_WAIT)
      pulp_stre  5273       48 1020u     IPv4          456965314         0t0        TCP 127.0.0.1:8751->127.0.0.1:40808 (CLOSE_WAIT)
      pulp_stre  5273       48 1021u     IPv4          456965280         0t0        TCP 127.0.0.1:8751->127.0.0.1:40800 (CLOSE_WAIT)
      pulp_stre  5273       48 1022u     IPv4          456679856         0t0        TCP 127.0.0.1:8751->127.0.0.1:39244 (CLOSE_WAIT)
      pulp_stre  5273       48 1023u     IPv4          457094161         0t0        TCP 127.0.0.1:8751->127.0.0.1:41836 (CLOSE_WAIT)
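To watch how many sockets are in this state over time without running lsof, a minimal sketch could parse /proc/net/tcp, where the fourth column holds the TCP state as a hex code (08 is CLOSE_WAIT). Note that /proc/net/tcp covers IPv4 sockets of the whole network namespace rather than a single PID, and IPv6 sockets are listed separately in /proc/net/tcp6. The function name `count_states_on_port` is hypothetical:

```python
import os
from collections import Counter

# Hex state codes used in /proc/net/tcp (Linux TCP state numbering).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states_on_port(proc_net_tcp_text, local_port):
    """Count sockets per TCP state whose local port equals local_port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" for the local end of the socket.
        _, port_hex = fields[1].split(":")
        if int(port_hex, 16) == local_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        states = count_states_on_port(f.read(), 8751)
    print("CLOSE_WAIT on :8751 =", states["CLOSE_WAIT"])
```

A count that keeps growing while the upstream server is down, instead of draining back towards zero, points to the same leak that lsof shows.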
      
      foreman-ssl_access_ssl.log shows that HTTP 502 is returned to clients (see bug 1432985):
      
      1.2.3.4 - - [31/Jan/2022:02:29:39 +0000] "GET /streamer/var/lib/pulp/content/units/rpm/5a/a3140f62c74a19c00df434d3b278d57d9cfcf6ad742e60a05fff635e5bbeee/<package>.rpm?policy=eyJleHRlbnNpb25zIjogeyJyZW1vdGVfaXAiOiAiMTAuMTM1LjY2LjEzMyJ9LCAicmVzb3VyY2UiOiAiL3N0cmVhbWVyL3Zhci9saWIvcHVscC9jb250ZW50L3VuaXRzL3JwbS81YS9hMzE0MGY2MmM3NGExOWMwMGRmNDM0ZDNiMjc4ZDU3ZDljZmNmNmFkNzQyZTYwYTA1ZmZmNjM1ZTViYmVlZS9xdWFseXMtY2xvdWQtYWdlbnQueDg2XzY0LnJwbSIsICJleHBpcmF0aW9uIjogMTY0MzU5NjIxM30%3D;signature=oA6NDIG08afL1EfIILL00HEovuTDv4sKqesWGKmSkic5DBPn4HOWlQxPG4qpzsPlPrJKq4Nbn54_mkWTUyKdWaMdxOtnszXeLxnnxXxXYsGemrIfwGUkBFm8SKL_LjvBm-ji5Ln_o66u0I5XYSgkkuJ_857-J89Ol_Ij6T4bpxrvXo6HEFOdkF7UByMcxEhW0ehiItxG0m1uRkXDk2XnVV7zdUXJeWnPuB0edJh2Lw8qGnHukP2qMibfAAAPEhRDVX0SXB3Dw-SiidY73jT2GABOELTS7lyzpQBqno2ixezxrT-FLv4x02i6x7w6pX-K-PjCT8mgq6EiHasUDsDQpg%3D%3D HTTP/1.1" 502 649 "-" "pulp/2.21.5.3"
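To gauge how many client requests are affected, the streamer entries in this log can be tallied by status code. A minimal sketch, assuming the log uses the Apache combined format seen in the line above; the function name `streamer_status_counts` is hypothetical:

```python
import re
from collections import Counter

# Apache combined format: host ident user [time] "METHOD PATH PROTO" status size ...
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)


def streamer_status_counts(lines, prefix="/streamer/"):
    """Tally HTTP status codes for requests whose path starts with prefix."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(4).startswith(prefix):
            counts[int(m.group(5))] += 1
    return counts
```

Any iterable of lines works, including an open file object, so the function can be pointed directly at the access log.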
      
      Expected results:
      pulp_streamer does not run out of file descriptors when the upstream server is unavailable and a large number of hosts request a package that must be fetched from that server.
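Why the number of hosts matters can be estimated with Little's law (L = λW): the average number of requests in progress equals the arrival rate times the average time each request is held. If each request for a missing package keeps its client socket open while pulp_streamer retries the unreachable upstream, held descriptors grow with both the request rate and the total retry time. The sketch below is a back-of-the-envelope model under that assumption, not pulp_streamer's actual retry logic; the attempt count, timeout, delay, request rate, and the `reserved` margin are all hypothetical inputs, and 1024 is only a common default soft limit:

```python
def hold_time_seconds(attempts, per_attempt_timeout, delay_between):
    """Worst-case time one request holds its client socket, assuming every
    upstream attempt runs until its timeout (hypothetical model)."""
    return attempts * per_attempt_timeout + (attempts - 1) * delay_between


def concurrent_sockets(requests_per_second, hold_seconds):
    """Little's law: average requests in progress = arrival rate x hold time."""
    return requests_per_second * hold_seconds


def max_rate_under_limit(fd_limit, hold_seconds, reserved=50):
    """Highest sustained request rate that keeps held sockets below
    fd_limit, leaving `reserved` descriptors for everything else."""
    return (fd_limit - reserved) / hold_seconds


if __name__ == "__main__":
    slow = hold_time_seconds(5, 30.0, 0.0)   # 150 s per request
    fast = hold_time_seconds(2, 5.0, 1.0)    # 11 s per request
    print(concurrent_sockets(10, slow))      # 1500.0, above a 1024 limit
    print(concurrent_sockets(10, fast))      # 110.0
```

With these made-up numbers, 10 requests per second and a 150-second hold already exceed 1024 descriptors, while an 11-second hold keeps the same load near 110. This suggests that bounding how long a request may wait on an unavailable upstream is as relevant to the expected result as the descriptor limit itself.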
      
      Additional info:
      While current versions default to Immediate for custom repositories, many customers who upgraded from earlier versions still have repositories using the earlier On Demand default.

            jira-bugzilla-migration RH Bugzilla Integration
            jira-bugzilla-migration RH Bugzilla Integration
            Vladimír Sedmík
            Votes: 0
            Watchers: 2
