Satellite / SAT-38397

Pulp content app never closes Redis connections


    • Bug Fix
      .Idle Redis connections are now closed after timeout

      Previously, idle Redis connections persisted because the timeout was not set for the Redis service on the {ProjectServer}.
      This led to the Pulp content workers not accepting any new connections as the open file limit was reached.
      Pulp content workers would fail with the `Too many open files` error, resulting in DNF connection timeouts on the client systems.
      With this release, the default timeout for idle Redis connections is set to 60 seconds, after which the idle connections are closed.
      This prevents DNF timeouts on client systems due to Pulp reaching the open file limit.

      Description of problem:

      The Pulp content application makes multiple Redis connections when handling a request. The connections are never disconnected and never released back to the connection pool, so the pool keeps creating new connections to handle subsequent requests.

      In a busy environment, this can cause the Pulp content application to eventually reach its 260K open file limit.
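      To see how close the content workers are to that limit, the per-process file descriptor count and limit can be inspected directly. A minimal sketch, assuming the worker processes match "/usr/bin/pulpcore-content" and the commands are run as root:

      $ for pid in $(pgrep -f "/usr/bin/pulpcore-content"); do
          echo "PID $pid: $(ls /proc/$pid/fd | wc -l) open file descriptors"
          grep "Max open files" /proc/$pid/limits
        done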

       

      How reproducible:

      Easy

       

      Is this issue a regression from an earlier version:

      I think it has been present since the Pulp cache was enabled.

       

      Steps to Reproduce:

      1. Run the following commands to monitor the number of "established" Redis connections.

      # Restart the services to make sure we start from 0 established connections.
      $ systemctl restart redis pulpcore-content
      $ watch 'lsof | grep -P $(pgrep -f "/usr/bin/pulpcore-content" | tr "\n" "|" | sed -E "s/\|$//") | grep ":redis" | grep -i established | wc -l'

      2. Open a new terminal and run the following script to send 200 concurrent content requests to the Pulp content app workers.

      irb
      > require 'restclient'
      > 200.times { Thread.new { begin; RestClient::Resource.new("http://satellite.example.com/pulp/content/ACME/Library/custom/custom/epel8/repodata/repomd.xml").get; rescue StandardError => e; p e.message; end } }
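
      If Ruby and the restclient gem are not available, a rough shell equivalent (using the same example repository URL as above, specific to this reproducer) can generate similar concurrent load:

      $ for i in $(seq 1 200); do
          curl -s -o /dev/null "http://satellite.example.com/pulp/content/ACME/Library/custom/custom/epel8/repodata/repomd.xml" &
        done; wait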

      Actual behavior:
      A few hundred "ESTABLISHED" connections are observed, and they are never closed.

       

      $ lsof | grep -P $(pgrep -f "/usr/bin/pulpcore-content" | tr "\n" "|" | sed -E "s/\|$//") | grep ":redis" | grep -i established | wc -l
      620 

       

       

      Expected behavior:
      All idle connections are closed after they time out.

       

      Business Impact / Additional info:

      The Redis server has a "timeout" option to close idle connections. By default, it is disabled.

      In " /etc/redis.conf"

      # Close the connection after a client is idle for N seconds (0 to disable)
      timeout 0

      Set the "timeout" to a reasonable value (e.g. 60 seconds) and restart the Redis server fixed the issue. All idle connections are closed after timing out.

      To make the setting persistent, add the following to the "/etc/foreman-installer/custom-hiera.yaml" file and run satellite-installer:

      redis::timeout: 60
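
      Putting it together, a minimal sketch of the persistent fix, assuming a standard Satellite installation where the installer manages "/etc/redis.conf":

      # Add the Hiera override (skip if the key is already present in the file).
      $ echo 'redis::timeout: 60' >> /etc/foreman-installer/custom-hiera.yaml
      # Re-run the installer to apply the change to the Redis configuration.
      $ satellite-installer
      # Verify the rendered configuration and the running server.
      $ grep '^timeout' /etc/redis.conf
      $ redis-cli CONFIG GET timeout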
