Task
Resolution: Done
Scrum Sprint 24, Scrum Sprint 25, Scrum Sprint 26
We currently have no timeout for the job queue, so a job can get stuck forever (until the pod restarts).
We need to start a timer and cancel the job's context once a hard limit is reached (e.g. 10 minutes). Jobs must be aware of context cancellation: they should check for it between steps and stop immediately when cancellation is requested. Both the AWS and Azure SDKs seem to be context-aware, so any in-flight SDK operation should be cancelled immediately as well.
When a job hits the limit, we should log an error (and report it to Sentry).