Feature Request
Resolution: Unresolved
Major
None
2.2
False
False
Feature Overview
- Start a job.
- Kill a hop node on the path used to control the execution host (or, more generally, interrupt communication along that path).
- The job does not fail; it is re-routed through the Mesh.
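
The scenario above can be illustrated with a minimal sketch. It is not how the Mesh actually computes routes; it only assumes the topology can be modeled as an adjacency map and that a replacement path is found with a breadth-first search that skips unreachable nodes. The function `find_route`, the `MESH` map, and the node names (`controller`, `hop-a`, `hop-b`, `exec-1`) are all hypothetical.

```python
from collections import deque


def find_route(mesh, source, target, dead=frozenset()):
    """Return the shortest hop path from source to target that avoids
    every node in `dead`, or None if no such path exists."""
    if source in dead or target in dead:
        return None
    previous = {source: None}  # node -> node it was reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in mesh.get(node, ()):
            if neighbor not in dead and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical topology: two hop nodes sit between the controller
# and the execution host.
MESH = {
    "controller": ["hop-a", "hop-b"],
    "hop-a": ["controller", "exec-1"],
    "hop-b": ["controller", "exec-1"],
    "exec-1": ["hop-a", "hop-b"],
}

# Path in use when the job starts.
initial = find_route(MESH, "controller", "exec-1")

# hop-a is killed mid-job: the job is not failed, a new path is computed.
rerouted = find_route(MESH, "controller", "exec-1", dead={"hop-a"})

# Every hop node is gone: no path is left, so re-routing cannot help.
isolated = find_route(MESH, "controller", "exec-1", dead={"hop-a", "hop-b"})
```

In this sketch the job only needs to be failed when `find_route` returns `None`, which corresponds to the execution host being unreachable through any remaining path.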
Background, and strategic fit
Increased reliability of the cluster: fewer jobs fail because of the platform.
(Optional) Use Cases
n/a
Assumptions
- Re-routing can be done in the middle of a job
Out of Scope
- Making a job survive the death of the execution host
- Making a job survive the death of the controller (see AAP-5298)
- clones
  - AAPRFE-587 Job can survive the death of the controller under control (Backlog)
- relates to
  - AAPRFE-186 Better failure handling of running jobs on execution nodes (Backlog)