-
Task
-
Resolution: Done
-
Major
-
According to the upstream Knative documentation, the default progress deadline for a Knative Service is 10 minutes, and OpenShift Serverless appears to use the same default. However, some workloads take longer than 10 minutes to become ready. In such cases, Knative marks the deployment as Failed and scales it down.
In OpenShift AI, as reported in the bug RHOAIENG-7609, this issue was hit when deploying AI models with node autoscaling enabled. Deploying an AI model that required a GPU triggered provisioning of a new node, and it took more than 10 minutes before the node was usable and pods could be scheduled on it. This is just one example; a pod can take a long time to start for other reasons as well.
Because AI models can have very different resource requirements, a single progress deadline cannot suit all of them. Proper support is therefore needed for configuring the progress deadline on a per-service basis.
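A minimal sketch of what per-service configuration could look like, assuming the mechanism is the revision-template annotation `serving.knative.dev/progress-deadline` (verify that your Knative Serving / OpenShift Serverless version reads it). The helper names `duration_seconds` and `knative_service`, the service name, and the image are hypothetical. The script builds the Service manifest as a Python dict and prints it as JSON, which `oc apply -f -` or `kubectl apply -f -` also accepts.

```python
import json

# Assumed annotation key read by Knative Serving on the revision template.
PROGRESS_DEADLINE_ANNOTATION = "serving.knative.dev/progress-deadline"

# 10 minutes: the default progress deadline described in this issue.
DEFAULT_PROGRESS_DEADLINE_S = 600


def duration_seconds(value: str) -> int:
    """Hypothetical helper: parse a single-unit duration such as '600s', '30m' or '1h'."""
    units = {"s": 1, "m": 60, "h": 3600}
    number, unit = value[:-1], value[-1:]
    if unit not in units or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(number) * units[unit]


def knative_service(name: str, image: str, progress_deadline: str) -> dict:
    """Hypothetical helper: build a Knative Service manifest with its own progress deadline."""
    duration_seconds(progress_deadline)  # reject malformed values early
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                # The annotation sits on the revision template, so it applies
                # to revisions created from this Service only.
                "metadata": {
                    "annotations": {PROGRESS_DEADLINE_ANNOTATION: progress_deadline}
                },
                "spec": {"containers": [{"image": image}]},
            }
        },
    }


if __name__ == "__main__":
    # Example: allow 30 minutes for a GPU model whose node may need provisioning.
    manifest = knative_service("gpu-model", "registry.example.com/model:latest", "30m")
    print(json.dumps(manifest, indent=2))
```

Placing the deadline on the revision template rather than changing the cluster-wide default means a slow-starting GPU model can get a longer deadline while other services keep the 10-minute default.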
- is documented by: SRVKS-1255 [DOC] Document Serving deployment configuration options (In Progress)
- is triggering: SRVKS-1254 Test Serving deployment configuration (Closed)