Bug
Resolution: Done-Errata
Major
6.12.0
Description of problem:
With Satellite configured for MQTT (pull-mqtt) remote execution (ReX) mode and 3,000 clients, I cannot run a ReX job on all of them in one go as I could before.
Version-Release number of selected component (if applicable):
satellite-6.12.0-4.el8sat.noarch
How reproducible:
always
Steps to Reproduce:
1. Configure Satellite with `satellite-installer --foreman-proxy-plugin-remote-execution-script-mode=pull-mqtt` (in my case I also used `--tuning=medium --foreman-foreman-service-puma-threads-min=16 --foreman-foreman-service-puma-threads-max=16 --foreman-foreman-service-puma-workers=8 --foreman-db-pool=128 --foreman-proxy-plugin-remote-execution-script-install-key=true`)
2. Run a "Command" ReX job on 3000 hosts with:
subscription-manager refresh
yum -y install insights-client
insights-client --register
Actual results:
The proxy log (proxy.log) contains many SSL errors:
2022-11-02T09:04:25 [E] <OpenSSL::SSL::SSLError> SSL_accept SYSCALL returned=5 errno=0 state=TLSv1.3 early data
/usr/share/ruby/webrick/server.rb:299:in `accept'
/usr/share/ruby/webrick/server.rb:299:in `block (2 levels) in start_thread'
/usr/share/ruby/webrick/utils.rb:263:in `timeout'
/usr/share/ruby/webrick/server.rb:297:in `block in start_thread'
The client wasn't able to retrieve the job:
Nov 02 09:04:25 f09-h26-b02-5039ms-container100.red.ddns.perf.redhat.com yggdrasild[647]: [yggdrasild] 2022/11/02 09:04:25 cannot get detached message content: cannot download from URL: Get "https://f09-h26-b01-5039ms.rdu2.scalelab.redhat.com:9090/ssh/jobs/88e7dae2-3998-4366-b9e5-aa4216e0030f": net/http: TLS handshake timeout
=> The ReX job got stuck after 879 tasks succeeded, with 2121 still pending.
Expected results:
We need either to document the maximum number of hosts a single Capsule in MQTT mode can handle, or to provide a way to limit concurrency when running a job.
FYI, the default tuning profile is currently documented as able to handle 5k hosts.
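To illustrate what "limiting concurrency" could mean here, below is a minimal Python sketch that caps how many hosts fetch their job at the same time using a semaphore. It is not Satellite or smart_proxy_remote_execution_ssh code: `run_limited` and `fetch_job` are hypothetical names, and the short sleep only stands in for a client's HTTPS download of its job from the Capsule on port 9090.

```python
import asyncio


async def run_limited(hosts, limit):
    """Let every host 'fetch' its job, with at most `limit` fetches in flight.

    Returns (results in host order, peak number of concurrent fetches).
    """
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fetch_job(host):
        # Hypothetical stand-in for one client downloading its job;
        # it only records how many fetches are running at once.
        nonlocal in_flight, peak
        async with sem:  # waits here while `limit` fetches are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # simulated download time
            in_flight -= 1
            return host

    # gather() returns results in the same order as `hosts`.
    results = await asyncio.gather(*(fetch_job(h) for h in hosts))
    return results, peak


if __name__ == "__main__":
    hosts = [f"host{i:04d}" for i in range(3000)]
    done, peak = asyncio.run(run_limited(hosts, limit=100))
    print(len(done), peak)
```

With a cap like this, the 3000 downloads still all happen, but the proxy never sees more than `limit` TLS handshakes at once, instead of all clients connecting as soon as they are notified.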
Additional info:
Thanks aruzicka for investigating this!
When I switched to SSH mode with `satellite-installer --foreman-proxy-plugin-remote-execution-script-mode=ssh` and re-ran the job (after cancelling the sub-tasks of the previous one), 2991 tasks succeeded and only 9 failed.
I am not marking this as a regression because, IIRC, MQTT mode is optional.