SAT-41274

Sending OpenSCAP reports concurrently requires too much memory on Capsule/Satellite


    • Bug
    • Resolution: Unresolved
    • Undefined
    • SCAP Plugin
    • False
    • sat-endeavour

      Description of problem:

      When a Capsule (or Satellite) processes an OpenSCAP report from a client, it forks a new `smart-proxy` process to handle it (https://github.com/theforeman/smart_proxy_openscap/blob/master/lib/smart_proxy_openscap/helpers.rb#L8). This approach does not scale when tens or hundreds of client systems upload their reports at the same time, which is a typical setup for a Satellite. In that case, many tens of forked `smart-proxy` processes run at once, each consuming 300M+ of memory (often 0.5G-3G), so total memory consumption reaches a few tens of GBs.
       
      The spawning approach simply does not scale with respect to consumed memory, especially in the recommended scenario "run the reports on all clients at one time" (there is no splay among clients).
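
To make the scaling concrete, here is a rough back-of-the-envelope sketch. It assumes one forked process per concurrent report and a fixed per-process footprint; the function name `estimate_mem_mib` and the sample inputs are illustrative, with 300 MiB taken as the low end quoted above.

```shell
# Rough estimate of total memory when every concurrent report gets its own process.
# Assumption: memory per process is constant; real processes vary (300M up to 3G).
estimate_mem_mib() {
    clients=$1        # number of reports arriving at the same time
    per_proc_mib=$2   # memory of one forked smart-proxy process, in MiB
    echo $(( clients * per_proc_mib ))
}

estimate_mem_mib 50 300     # prints 15000 (MiB), roughly 14.6 GiB
estimate_mem_mib 50 1024    # prints 51200 (MiB), i.e. 50 GiB
```

Even at the low end of the per-process range, 50 simultaneous reports already land in the tens of GBs.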

      How reproducible:
      100%
       

      Is this issue a regression from an earlier version:
      YES, when moving from 6.15 on RHEL8 to 6.16 on RHEL9
       

      Steps to Reproduce:

      1. Set up some clients to send OpenSCAP reports concurrently - the more the better. A minimal reproducer just mimics sending the same report many times concurrently. For this:

      • set up one client
      • edit its `/usr/bin/foreman_scap_client`, adding "cp $(results_bzip_path) /tmp" at line 272, so that a copy of the report is kept
      • you can then mimic sending the SCAP results by uploading the saved "results.xml.bz2" directly

      2. Either let cron run "foreman_scap_client" on many hosts at the same time, OR mimic it using the minimal reproducer:

      capsule=your.capsule.fqdn  # or Satellite
      
      # send the same report 50 times in parallel
      for i in $(seq 1 50); do
          curl --silent --show-error --cacert /etc/rhsm/ca/katello-server-ca.pem --cert /etc/pki/consumer/cert.pem --key /etc/pki/consumer/key.pem --header Content-Type:text/xml --header Content-Encoding:x-bzip2 --max-time 60 --data-binary @results.xml.bz2 "https://${capsule}:9090/compliance/arf/1" &
      done
      wait  # block until all background uploads have finished
      

      3. Meanwhile, monitor `smart-proxy` processes on the Capsule/Satellite:

      while true; do
        date
        ps aux | grep -v grep | grep smart-proxy
        free
        sleep 2
      done
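
The loop above prints one line per process. To track the total at a glance, a small helper can sum the resident memory of all matching processes. This is only a sketch: the function name `sum_smart_proxy_rss` is made up here, and it assumes the forked processes still show `smart-proxy` in their command line, as the `grep` above does.

```shell
# Sum the resident set size (RSS) of all processes whose command line mentions smart-proxy.
# Reads "rss_in_KiB command..." lines on stdin, so it can be fed directly from ps.
sum_smart_proxy_rss() {
    # skip the awk process itself, whose own command line contains the pattern
    awk '/smart-proxy/ && !/awk/ { n++; kib += $1 }
         END { printf "%d processes, %d MiB RSS\n", n, kib / 1024 }'
}

# rss= and args= print the columns without a header line
ps -eo rss=,args= | sum_smart_proxy_rss
```

Running the last line inside the monitoring loop, in place of the `ps aux | grep` line, gives one summary line per iteration.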
      

      Actual behavior:
      Many `smart-proxy` processes are spawned. Memory usage grows linearly with the number of client reports sent concurrently.

      In an extreme case, OOM killer kills some of the `smart-proxy` processes (though masked under "diagnostic_con*" procname, weirdly).

      Expected behavior:
      Limit the number of forked processes / the amount of consumed memory.

      Business Impact / Additional info:
      High memory requirements, leading even to OOM.

              Unassigned
              rhn-support-pmoravec Pavel Moravec