Ansible Automation Platform RFEs / AAPRFE-587

Job can survive the death of the controller under control


    • Feature Request
    • Resolution: Unresolved
    • Major
    • None
    • 2.2
    • controller
    • False
    • False

      Feature Overview

      1. Start a job.
      2. Kill the controller "controlling" the job (not the node executing it).
      3. The job does not fail; the next available controller takes it over.
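
      The three steps above can be read as an acceptance check. The following is a minimal in-memory sketch of that check, not AWX or AAP code: the `Cluster` class, its methods, the node names, and the status strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    controller: str          # control node currently supervising the job
    status: str = "running"


@dataclass
class Cluster:
    """Hypothetical toy model of a cluster of control nodes."""
    controllers: set = field(default_factory=set)
    jobs: dict = field(default_factory=dict)

    def start_job(self, job_id: int) -> Job:
        # Assign the job to an available controller (lowest name, for determinism).
        job = Job(job_id, controller=min(self.controllers))
        self.jobs[job_id] = job
        return job

    def kill_controller(self, name: str) -> None:
        # Requested behaviour: jobs supervised by the dead controller are
        # handed to the next available controller instead of being failed.
        self.controllers.discard(name)
        for job in self.jobs.values():
            if job.controller == name and job.status == "running":
                if self.controllers:
                    job.controller = min(self.controllers)
                else:
                    job.status = "failed"  # no controller left to take over


cluster = Cluster(controllers={"ctrl-1", "ctrl-2"})
job = cluster.start_job(1)                # step 1: start a job
cluster.kill_controller(job.controller)   # step 2: kill its controller
print(job.status, job.controller)         # step 3: prints "running ctrl-2"
```

      In this toy model the job only fails when no controller remains, which is the behaviour the request asks for.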

      Background, and strategic fit

      Increased cluster reliability: fewer jobs fail because of the platform itself.

      The customer expects this feature in order to retire AWX and replace it with AAP.

      (Optional) Use Cases

      n/a

      Assumptions

      • The controller is not a hybrid node and does not run the job itself.
      • The job explanation "Task was marked as running but was not present in the job queue, so it has been marked as failed" suggests that the logic to detect such cases already exists. It could be extended to put the "interrupted" job back in the queue (this would require a different job status) so that another controller can take control of it.
      • This assumes that all necessary information is available in the database, and that the execution host accepts a controller other than the initial one taking control of an existing job.
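
      The assumptions above suggest extending the existing detection of orphaned jobs so that it requeues a job instead of failing it. Below is a minimal sketch of that decision. It assumes a hypothetical `interrupted` status and hypothetical record fields (`controller_node`, `execution_node`), and does not reflect the actual AWX data model or reaper code.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    job_id: int
    status: str            # e.g. "running"
    controller_node: str   # hypothetical field: node supervising the job
    execution_node: str    # hypothetical field: node running the playbook


def reap_orphaned_jobs(jobs, alive_nodes):
    """Handle running jobs whose controller is gone.

    Returns the ids of the jobs that were requeued for takeover.
    """
    requeued = []
    for job in jobs:
        if job.status != "running" or job.controller_node in alive_nodes:
            continue  # not orphaned
        if job.execution_node in alive_nodes:
            # Execution host survived: mark the job so another controller
            # can pick it up ("interrupted" is an assumed new status).
            job.status = "interrupted"
            job.controller_node = ""
            requeued.append(job.job_id)
        else:
            # Execution host is gone too (out of scope): keep today's behaviour.
            job.status = "failed"
    return requeued


jobs = [
    JobRecord(1, "running", "ctrl-1", "exec-1"),  # controller dead, executor alive
    JobRecord(2, "running", "ctrl-1", "ctrl-1"),  # hybrid node died with the job
    JobRecord(3, "running", "ctrl-2", "exec-1"),  # controller still alive
]
print(reap_orphaned_jobs(jobs, alive_nodes={"ctrl-2", "exec-1"}))  # [1]
print([j.status for j in jobs])  # ['interrupted', 'failed', 'running']
```

      Job 2 illustrates the first assumption: when the controller is a hybrid node that also runs the job, nothing is left to take over, so the job still fails.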

      Out of Scope

      • Making a job survive the death of the execution host.

              chadwickferman Chad Ferman
              Votes: 7
              Watchers: 17
