Increased reliability of the cluster: fewer jobs failing because of the platform.

The customer is expecting this feature in order to take out AWX and replace it with AAP.

n/a

The controller isn't hybrid and doesn't run the job.
The job explanation "Task was marked as running but was not present in the job queue, so it has been marked as failed" suggests that logic to detect such cases already exists. It could be extended to put the "interrupted" job back in the queue (this would require a new job status) so that another controller can take it over.
This assumes, of course, that all the necessary information is available in the database and that the execution host accepts a controller other than the original one taking control of an already existing job.
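To make the proposal concrete, here is a minimal, hypothetical sketch of how the existing "marked as running but not in the job queue" check could be extended. It does not use AWX or AAP code. `Job`, `Status.INTERRUPTED`, `reap_or_requeue`, `adopt` and the `live_controllers` set are all illustrative assumptions. It also assumes the database holds enough state for another controller to resume the job, and that the execution node accepts the new controller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Set


class Status(Enum):
    RUNNING = "running"
    FAILED = "failed"
    # Hypothetical new status proposed in this RFE: the job is still alive on the
    # execution node, but its original controller is gone.
    INTERRUPTED = "interrupted"


@dataclass
class Job:
    id: int
    status: Status
    controller: Optional[str]   # hostname of the controlling node (assumed field)
    execution_node: str
    job_explanation: str = ""


def reap_or_requeue(jobs: Iterable[Job], active_job_ids: Set[int],
                    live_controllers: Set[str]) -> List[Job]:
    """Handle jobs marked running but absent from the job queue.

    Today's behaviour (per the job explanation) is to mark such jobs as failed.
    This sketch instead requeues a job as INTERRUPTED when its controller is no
    longer alive, so another controller can adopt it; otherwise it still fails.
    """
    requeued = []
    for job in jobs:
        if job.status is not Status.RUNNING or job.id in active_job_ids:
            continue  # not running, or still tracked in the queue: nothing to do
        if job.controller not in live_controllers:
            # Controller died: release the job instead of failing it.
            job.status = Status.INTERRUPTED
            job.controller = None
            requeued.append(job)
        else:
            job.status = Status.FAILED
            job.job_explanation = ("Task was marked as running but was not present "
                                   "in the job queue, so it has been marked as failed")
    return requeued


def adopt(job: Job, controller: str) -> None:
    """Another controller takes over an interrupted job (assumes the execution
    node accepts a controller other than the original one)."""
    if job.status is not Status.INTERRUPTED:
        raise ValueError(f"job {job.id} is not interrupted")
    job.controller = controller
    job.status = Status.RUNNING
```

With `live_controllers={"ctrl-b"}`, a running job whose controller was `ctrl-a` and is no longer in the queue comes back as `INTERRUPTED` and can then be adopted by `ctrl-b`, while a job whose controller is still alive is failed exactly as today.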

is cloned by

AAPRFE-583 Job can survive the death of the Mesh path used to control

relates to

AAPRFE-186 Better failure handling of running jobs on execution nodes