Satellite / SAT-8907

Intermittent HTTP 502 Bad Gateway caused by connection reset by peer when the Satellite server is under high HTTP load


    • Platform Sprint 23
    • Important

      Description of problem:

      After fixing bug 1976694, I found that clients still get "connection reset by peer" from the Puma server under even higher HTTP load.

      As I understand it, Puma's default backlog is 1024, but the default value of the SOMAXCONN kernel parameter is only 128.

      According to "man listen"
      ---------------------------
      If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.
      ---------------------------

      This means the backlog of 1024 is capped at 128. This can be confirmed by running the following commands:

      # sysctl net.core.somaxconn
        net.core.somaxconn = 128 <========
      # ss -l | grep foreman
        Netid State Recv-Q Send-Q Local Address:Port
        u_str LISTEN 0 128 /run/foreman.sock XXXXXXX * 0 <============ Send-Q (max listen backlog) is 128
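
The truncation rule quoted from the man page can be sketched as a small shell function. This is only an illustration: effective_backlog is a hypothetical helper name, and the numbers are the ones quoted in this report (1024 requested by Puma, 128 as the kernel default).

```shell
# effective_backlog REQUESTED SOMAXCONN
# Prints the backlog the kernel will actually use: the smaller of the
# value passed to listen(2) and net.core.somaxconn.
effective_backlog() {
    requested=$1
    somaxconn=$2
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

# Values from this report: Puma requests 1024, the kernel default is 128.
effective_backlog 1024 128   # prints 128
```

On a live system the second argument could be taken from the kernel, for example: effective_backlog 1024 "$(cat /proc/sys/net/core/somaxconn)".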

      Steps to Reproduce:

      1) Run the following Ruby code in irb on any host to generate 600 simultaneous requests to the Satellite server:

      irb
      require 'rest_client'
      # Start 600 threads; each sends one GET request and prints any error message.
      600.times do
        Thread.new do
          begin
            RestClient::Resource.new("https://example.satellite.com/api/v2/hosts?installed_package_name=kernel&page=1&per_page=20", user: "your admin username", password: "your password", timeout: 3600, open_timeout: 3600, verify_ssl: OpenSSL::SSL::VERIFY_NONE).get
          rescue StandardError => e
            p e.message
          end
        end
      end

      2) On the Satellite server, run the following command to monitor the backlog queue. "Recv-Q" (current listen backlog) will increase up to 129.

      # watch 'ss -l | grep foreman'
        Netid State Recv-Q Send-Q Local Address:Port
        u_str LISTEN 129 128 /run/foreman.sock XXXXXXX * 0
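
As a rough aid for spotting this condition without watching the output by eye, the following sketch filters ss output for listening sockets whose Recv-Q exceeds Send-Q. It assumes the column layout shown above (Netid, State, Recv-Q, Send-Q, Local Address); full_listen_queues is a hypothetical helper name.

```shell
# full_listen_queues: reads `ss -l` output on stdin and prints the local
# address of every listening socket whose Recv-Q (current backlog) is
# greater than its Send-Q (maximum backlog).
full_listen_queues() {
    awk '$2 == "LISTEN" && ($3 + 0) > ($4 + 0) { print $5 }'
}

# Sample line taken from the output above.
printf '%s\n' 'u_str LISTEN 129 128 /run/foreman.sock XXXXXXX * 0' | full_listen_queues
```

On the Satellite server it could be used as: ss -l | full_listen_queues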

      3) Open another window and run the following command to monitor queue overflows and packet drops. The overflow and dropped-SYN counters will increase once "Recv-Q" reaches 129.

      watch 'netstat -s | grep -i LISTEN'
      16712 times the listen queue of a socket overflowed
      17602 SYNs to LISTEN sockets dropped
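
To record these two counters before and after a test run, a minimal sketch like the one below could pull them out of the netstat output. It assumes the message wording shown above; listen_queue_counters is a hypothetical helper name.

```shell
# listen_queue_counters: reads `netstat -s` output on stdin and prints
# "<overflows> <dropped SYNs>" on one line (0 for a counter that is absent).
listen_queue_counters() {
    awk '
        /times the listen queue of a socket overflowed/ { overflowed = $1 }
        /SYNs to LISTEN sockets dropped/                { dropped = $1 }
        END { print overflowed + 0, dropped + 0 }
    '
}

# Sample lines taken from the output above.
printf '%s\n' \
    '16712 times the listen queue of a socket overflowed' \
    '17602 SYNs to LISTEN sockets dropped' | listen_queue_counters
```

On the Satellite server it could be used as: netstat -s | listen_queue_counters. Running it once before and once after the load test shows how much each counter grew.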

      Actual results:
      Clients receive many "Connection reset by peer - SSL_connect" errors

      Expected results:
      No "Connection reset by peer - SSL_connect" errors

      Additional info:
      I was able to avoid this issue completely by increasing the SOMAXCONN kernel parameter.

      Temporarily increase somaxconn
      -------------------------------------
      sysctl -w net.core.somaxconn=1024
      systemctl daemon-reload
      systemctl restart foreman.socket foreman
      -------------------------------------

      # ss -l | grep foreman
        u_str LISTEN 0 1024 /run/foreman.sock XXXXXXX * 0 <================== Send-Q (max listen backlog) is now 1024
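
The sysctl -w change above does not survive a reboot. One possible way to make it persistent is a sysctl.d drop-in, sketched below. This is an assumption for illustration and not part of the reported fix: write_somaxconn_dropin is a hypothetical helper, and the file name 99-somaxconn.conf is arbitrary.

```shell
# write_somaxconn_dropin DIR VALUE
# Writes a sysctl drop-in file in DIR that sets net.core.somaxconn to VALUE.
# The file name 99-somaxconn.conf is an arbitrary choice for this sketch.
write_somaxconn_dropin() {
    dir=$1
    value=$2
    printf 'net.core.somaxconn = %s\n' "$value" > "$dir/99-somaxconn.conf"
}

# Typical use as root (shown as comments, not executed here):
#   write_somaxconn_dropin /etc/sysctl.d 1024
#   sysctl --system
#   systemctl restart foreman.socket foreman
```

The helper takes the directory as a parameter so that it can be tried against a scratch directory before touching /etc/sysctl.d.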

            jira-bugzilla-migration RH Bugzilla Integration
            ehelms@redhat.com Eric Helms
            Gaurav Talreja Gaurav Talreja