-
Bug
-
Resolution: Done-Errata
-
Normal
-
6.15.0
-
CLOSED
-
rubygem-foreman_remote_execution-13.2.3, rubygem-foreman-tasks-9.2.2, rubygem-smart_proxy_dynflow-0.9.3
-
1,400
-
Endeavour
-
-
-
Moderate
-
No
Description of problem:
REX jobs fail with 'Proxy task gone missing' if the capsule-to-Satellite callback fails. This bug manifests itself the same way as BZ2270295[1], but the causes differ: the other bug is caused by changes made in smart_proxy_dynflow-0.9.1, whereas this one is caused by changes that went out in foreman-tasks-5.3.0 (around Satellite 6.12).
To tell the two apart: if you see 'Proxy task gone missing' errors and find `[E] <RuntimeError> Failed performing callback to Foreman server` in the proxy logs, you are hitting this bug. Alternatively, if the failed job ran for more than 10 minutes, you are hitting this bug.
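The triage rule above can be sketched as a small Python check. This is only an illustration of the rule as described in this report, not part of any Foreman tooling: the two message strings are quoted from the description, while the function names, the parameters, and the way the job runtime is obtained are hypothetical.

```python
from typing import Iterable, List

# Messages quoted in this bug report; everything else below is illustrative.
CALLBACK_FAILURE = "Failed performing callback to Foreman server"
MISSING_TASK = "Proxy task gone missing"


def callback_failures(log_lines: Iterable[str]) -> List[str]:
    """Return the proxy log lines that record a failed callback."""
    return [line.rstrip("\n") for line in log_lines if CALLBACK_FAILURE in line]


def likely_this_bug(job_error: str,
                    log_lines: Iterable[str],
                    runtime_minutes: float) -> bool:
    """Apply the triage rule from the description (hypothetical helper).

    The job must have failed with 'Proxy task gone missing', and either the
    proxy log shows a failed callback or the job ran for more than 10 minutes.
    """
    if MISSING_TASK not in job_error:
        return False
    return bool(callback_failures(log_lines)) or runtime_minutes > 10
```

To use it, pass the job's error text, the lines of the capsule's proxy log, and the job runtime in minutes. The log is commonly found at `/var/log/foreman-proxy/proxy.log`, but that path is an assumption and may differ on a given installation.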
Version-Release number of selected component (if applicable):
satellite-6.15.0
satellite-6.14.*
How reproducible:
The conditions that trigger the bug are difficult to reproduce.
Steps to Reproduce:
1. Run a job
2. Ensure that the callback from capsule to satellite fails
This is rather difficult to do naturally; in development I resorted to modifying the code. @pmendezh hit this "in the wild" during scale tests.
3. Wait until Satellite checks on the capsule
Actual results:
The REX job fails with 'Proxy task gone missing'.
Expected results:
The job succeeds or fails according to whether it succeeded or failed on the capsule.
Additional info:
- is related to: SAT-26242 'Actions::RemoteExecution::RunHostJob' tasks intermittently fail with 'Error: RuntimeError Unexpected event #' (Closed)
- links to: RHBA-2024:140284 Important: Satellite 6.16.0 release