Sub-task
Resolution: Done
Endeavour
Description of problem:
When dynflow-sidekiq service is restarted while a job is run on multiple hosts sequentially, the next host's run in line after the restart fails: "Error loading data from Capsule: RestClient::NotFound - 404 Not Found"

Version-Release number of selected component (if applicable):
Reproduced on Sat Stream 6.15 snap 37.0. I couldn't reproduce on Sat 6.14 so this is a regression.

How reproducible:
Deterministic

Steps to Reproduce:
1. Have a Satellite with two hosts registered
2. Create a Job Template useful for your debugging, I used contents:
   echo $(date) >> /root/test-<%= @host %>; sleep 120; echo slept-$(date) >> /root/test-<%= @host %>
3. Monitor -> Jobs -> Run Job
4. Select that template
5. Set filter to match the two hosts
6. Set concurrency level to one
7. Submit
8. On Satellite:
   # systemctl stop dynflow-sidekiq@*
9. Wait a few seconds...
10. # systemctl start dynflow-sidekiq@orchestrator.service dynflow-sidekiq@worker-1.service dynflow-sidekiq@worker-hosts-queue-1.service
11. Wait...
12. In WebUI, watch Job run details. Run on the first host should end somehow depending on what phase the daemon was killed in - it either fails after ~15 minutes or succeeds, it doesn't matter. Then run on the second host should start.

Actual results:
The run never finishes and its output shows the following error repeated indefinitely: "Error loading data from Capsule: RestClient::NotFound - 404 Not Found". No further hosts will ever run the job.

Expected results:
The run on second host should pass and other potential hosts should get their turn afterwards.
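
For repeated reproduction attempts, the restart in steps 8-10 could be scripted roughly as below. This is only a sketch: the unit names are the ones listed in step 10, while DRY_RUN, WAIT_SECONDS, run and restart_dynflow are hypothetical names introduced here, and the 5 second default merely stands in for "a few seconds". By default the script only prints the commands; set DRY_RUN=0 and run it as root on the Satellite to actually execute them.

```shell
#!/bin/sh
# Sketch of steps 8-10: stop every dynflow-sidekiq instance, wait, start them again.
# DRY_RUN and WAIT_SECONDS are hypothetical knobs, not part of Satellite.
DRY_RUN="${DRY_RUN:-1}"
WAIT_SECONDS="${WAIT_SECONDS:-5}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_dynflow() {
    # The pattern is quoted so that systemctl, not the shell, expands it.
    run systemctl stop 'dynflow-sidekiq@*'
    run sleep "$WAIT_SECONDS"
    run systemctl start \
        dynflow-sidekiq@orchestrator.service \
        dynflow-sidekiq@worker-1.service \
        dynflow-sidekiq@worker-hosts-queue-1.service
}

restart_dynflow
```

The script should be started after step 7, while the run on the first host is still in progress, since the report describes the failure on the next host in line after the restart.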
QE Tracker for https://issues.redhat.com/browse/SAT-21457
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2251014