-
Feature Request
-
Resolution: Duplicate
-
Major
-
None
-
1.34.0.Final
-
None
-
False
-
None
-
False
The operator could offer basic failure recovery for workflow deployment attempts. The Kubernetes documentation describes a few basic steps that the operator could implement, such as rolling back to a previous version.
See more at:
- https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#operating-on-a-failed-deployment
- https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/
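As a rough illustration of the rollback idea, the sketch below detects a stalled rollout and picks a revision to return to. The `Progressing=False` condition with reason `ProgressDeadlineExceeded` is what the first link above describes for a failed deployment. Everything else is an assumption made for this example: `DeploymentCondition` and `Revision` are simplified local stand-ins rather than the real Kubernetes API types, and `rolloutFailed` and `rollbackTarget` are hypothetical helpers, not existing operator code.

```go
package main

// DeploymentCondition is a simplified, hypothetical stand-in for one
// condition entry in a Deployment's status.
type DeploymentCondition struct {
	Type   string // e.g. "Progressing", "Available"
	Status string // "True", "False", or "Unknown"
	Reason string // e.g. "ProgressDeadlineExceeded"
}

// Revision is a hypothetical record of one rollout attempt.
type Revision struct {
	Number  int
	Healthy bool // whether this revision ever became available
}

// rolloutFailed reports whether the conditions indicate a stalled rollout:
// Progressing=False with reason ProgressDeadlineExceeded.
func rolloutFailed(conds []DeploymentCondition) bool {
	for _, c := range conds {
		if c.Type == "Progressing" && c.Status == "False" &&
			c.Reason == "ProgressDeadlineExceeded" {
			return true
		}
	}
	return false
}

// rollbackTarget returns the highest-numbered healthy revision that is
// older than current. The boolean is false when no such revision exists.
func rollbackTarget(history []Revision, current int) (Revision, bool) {
	var best Revision
	found := false
	for _, r := range history {
		if r.Healthy && r.Number < current && (!found || r.Number > best.Number) {
			best, found = r, true
		}
	}
	return best, found
}
```

With a history of revisions 1 (healthy), 2 (unhealthy), 3 (healthy), and a failed revision 4, `rollbackTarget` would select revision 3. When no healthy revision exists, the second return value is false and the operator would need another strategy.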
To date, the only measure the operator takes (and only in dev mode) is to roll out the most successful replica.
The operator could be more thorough and analyze the pod instances. For example, an image that is not found in the registry could be replaced by one of the defaults.
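The pod analysis described above could look roughly like the following sketch. It inspects container waiting reasons and decides on a recovery action. `ErrImagePull`, `ImagePullBackOff`, and `CrashLoopBackOff` are waiting reasons reported by Kubernetes; the types, the action names, the mapping from reason to action, and the default image value passed by the caller are all assumptions made for illustration, not the operator's actual behavior.

```go
package main

// ContainerState is a simplified, hypothetical stand-in for the waiting
// state that Kubernetes reports for a container in a pod.
type ContainerState struct {
	Name          string
	Image         string
	WaitingReason string // empty when the container is not waiting
}

// RecoveryAction is a hypothetical set of actions the operator might take.
type RecoveryAction string

const (
	ActionNone         RecoveryAction = "none"
	ActionReplaceImage RecoveryAction = "replace-image"
	ActionRollback     RecoveryAction = "rollback"
)

// isImagePullFailure reports whether the reason means the image could not
// be pulled from the registry.
func isImagePullFailure(reason string) bool {
	return reason == "ErrImagePull" || reason == "ImagePullBackOff"
}

// chooseAction picks one action for the pod. An image pull failure takes
// priority (assumed policy); a crash loop suggests a rollback.
func chooseAction(states []ContainerState) RecoveryAction {
	action := ActionNone
	for _, s := range states {
		if isImagePullFailure(s.WaitingReason) {
			return ActionReplaceImage
		}
		if s.WaitingReason == "CrashLoopBackOff" {
			action = ActionRollback
		}
	}
	return action
}

// withDefaultImage returns a copy of states in which every container that
// failed to pull its image uses defaultImage instead. The input slice is
// left unchanged.
func withDefaultImage(states []ContainerState, defaultImage string) []ContainerState {
	out := make([]ContainerState, len(states))
	copy(out, states)
	for i := range out {
		if isImagePullFailure(out[i].WaitingReason) {
			out[i].Image = defaultImage
		}
	}
	return out
}
```

Keeping the decision (`chooseAction`) separate from the mutation (`withDefaultImage`) would let each failure scenario from the support document map to its own action without changing how pods are inspected.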
We should start by listing the potential failure scenarios in a support document and work on the implementation from there.
- depends on
  - KOGITO-8748 Recover from failure algorithm is ignoring "RequeueAfter" (Resolved)
- is incorporated by
  - KOGITO-8794 [KSW-Operator] Handle deployment failures in prod profile (Resolved)
- is triggered by
  - KOGITO-8641 Stabilize Kogito Serverless Operator Dev Profile (Resolved)
  - KOGITO-8784 Review SonataFlow Operator Prod Profile (Resolved)