Satellite / SAT-13864

MQTT ReX mode makes it too easy to DDoS Satellite

      Description of problem:
      With Satellite in MQTT ReX mode and 3k clients, I cannot run ReX on all of them in one go as I could before.

      Version-Release number of selected component (if applicable):
      satellite-6.12.0-4.el8sat.noarch

      How reproducible:
      always

      Steps to Reproduce:
      1. Configure Satellite with `satellite-installer --foreman-proxy-plugin-remote-execution-script-mode=pull-mqtt` (in my case I also used "--tuning=medium --foreman-foreman-service-puma-threads-min=16 --foreman-foreman-service-puma-threads-max=16 --foreman-foreman-service-puma-workers=8 --foreman-db-pool=128 --foreman-proxy-plugin-remote-execution-script-install-key=true")
      2. Run a "Command" ReX job on 3000 hosts with:
      subscription-manager refresh
      yum -y install insights-client
      insights-client --register

      Actual results:
      proxy.log contains many SSL errors:

      2022-11-02T09:04:25 [E] <OpenSSL::SSL::SSLError> SSL_accept SYSCALL returned=5 errno=0 state=TLSv1.3 early data
      /usr/share/ruby/webrick/server.rb:299:in `accept'
      /usr/share/ruby/webrick/server.rb:299:in `block (2 levels) in start_thread'
      /usr/share/ruby/webrick/utils.rb:263:in `timeout'
      /usr/share/ruby/webrick/server.rb:297:in `block in start_thread'
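To see how tightly these errors cluster in time, the log can be tallied per second. Below is a minimal sketch that assumes every error entry starts with the same timestamp and `[E] <OpenSSL::SSL::SSLError>` prefix as the excerpt above. The function name `ssl_errors_per_second` is made up for illustration and is not part of any Satellite tooling.

```python
import re
from collections import Counter

# Matches the prefix seen in the excerpt above, e.g.
# "2022-11-02T09:04:25 [E] <OpenSSL::SSL::SSLError> SSL_accept ..."
# Assumption: all SSL error entries in proxy.log follow this shape.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) \[E\] <OpenSSL::SSL::SSLError>"
)


def ssl_errors_per_second(lines):
    """Return a Counter mapping each timestamp to its number of SSLError entries."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # stack-trace lines and other entries are skipped
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle for proxy.log as `lines` and printing the sorted items shows whether the failures arrive in one burst at job start or are spread over the whole run.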

      The client wasn't able to retrieve the job:

      Nov 02 09:04:25 f09-h26-b02-5039ms-container100.red.ddns.perf.redhat.com yggdrasild[647]: [yggdrasild] 2022/11/02 09:04:25 cannot get detached message content: cannot download from URL: Get "https://f09-h26-b01-5039ms.rdu2.scalelab.redhat.com:9090/ssh/jobs/88e7dae2-3998-4366-b9e5-aa4216e0030f": net/http: TLS handshake timeout

      => The ReX job got stuck after 879 tasks succeeded, with 2121 still pending.

      Expected results:
      We need to either document the maximum number of hosts one Capsule with MQTT can handle, or provide a way to limit concurrency when running the job.
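To illustrate what "limit concurrency" could mean here, the following is a generic sketch of bounded dispatch: at most `max_concurrent` hosts are being processed at any moment. It is not how Satellite, the Capsule, or yggdrasil schedule jobs. `run_with_limit` and the `dispatch` callable are hypothetical names used only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(hosts, dispatch, max_concurrent):
    """Call dispatch(host) for every host, with at most max_concurrent in flight.

    `dispatch` is a hypothetical stand-in for "notify one host and wait until
    it has fetched its job"; the worker pool size is what caps the number of
    simultaneous requests. Results are returned in the order of `hosts`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(dispatch, hosts))
```

With something like this in place, 3000 hosts would contact the Capsule in waves of at most `max_concurrent` rather than all at once, at the cost of a longer total run time.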

      FYI, the default tuning is currently documented as able to handle 5k hosts.

      Additional info:
      Thanks aruzicka for investigating this!

      When I switched to SSH mode with `satellite-installer --foreman-proxy-plugin-remote-execution-script-mode=ssh` and re-ran the job (after cancelling the sub-tasks from the previous one), 2991 succeeded and only 9 failed.

      I do not mark this as a regression, because IIRC MQTT is optional.

              jira-bugzilla-migration RH Bugzilla Integration
              Peter Ondrejka