- Feature Request
- Resolution: Unresolved
- Major
- None
- 2.2
- False
- False
What is the nature and description of the request?
Currently, when a job fails on an execution node, it must be relaunched manually, and the failure may leave a managed node in an inconsistent state. This request is for Controller to make a best effort to prevent managed nodes from being left in an inconsistent state when a running job fails mid-run.
Why does the customer need this? (List the business requirements here)
If a job fails, it can leave the managed node in an undesired or inconsistent state, depending on where in the run it failed.
How would you like to achieve this? (List the functional requirements here)
The customer would like an in-progress job to be picked up by another EE node in the same instance group, if one is available.
Unsure if there are other ways to go about this.
- is related to
  - AAPRFE-583 Job can survive the death of the Mesh path used to control (Backlog)
  - AAPRFE-587 Job can survive the death of the controller under control (Backlog)
  - AAPRFE-256 Look into making the cancellation workflow more robust (Backlog)