Type: Task
Resolution: Unresolved
Priority: Normal
A user failed to deploy a model in the dev sandbox. The only error visible to the user was through OpenShift events:
Pod: modelmesh-serving-model-server-rht-jramirez-dev-7d56c5f7fc94f97
Namespace: rht-jramirez-dev
May 19, 2023, 3:21 PM (generated from kubelet on ip-10-0-172-120.us-east-2.compute.internal)
Exec lifecycle hook ([/opt/kserve/mmesh/stop.sh wait]) for Container "mm" in Pod "modelmesh-serving-model-server-rht-jramirez-dev-7d56c5f7fc94f97_rht-jramirez-dev(4d7b0ffb-f064-4e00-8bce-eae89e400bec)" failed - error: command '/opt/kserve/mmesh/stop.sh wait' exited with 137: , message: "waiting for litelinks process to exit after server shutdown triggered\n"
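Exit code 137 means the stop.sh hook was killed (SIGKILL), most likely because the pod's termination grace period ran out while the hook was still waiting for the litelinks process to exit. For anyone trying to reproduce, the same information can be pulled with the standard oc commands (namespace, pod and container names are the ones from the event above):

    oc get events -n rht-jramirez-dev --sort-by=.lastTimestamp
    oc describe pod modelmesh-serving-model-server-rht-jramirez-dev-7d56c5f7fc94f97 -n rht-jramirez-dev
    oc logs modelmesh-serving-model-server-rht-jramirez-dev-7d56c5f7fc94f97 -c mm -n rht-jramirez-dev --previous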
Looking at the controllers, the modelmesh-controller shows the following error in its logs:
{"level":"error","ts":1684505461.8559515,"logger":"controller.predictor","msg":"Reconciler error","reconciler group":"serving.kserve.io","reconciler kind":"Predictor","name":"test2","namespace":"isvc_rht-jramirez-dev","error":"failed to SetVModel for InferenceService test2: rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.130.13.77:8033: i/o timeout\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem \t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 \t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"}
The etcd logs are full of lines similar to the following:
2023-04-01 08:39:32.617638 W | etcdserver: read-only range request "key:\"modelmesh-serving/mm_ns/fjuma1-dev/mm/modelmesh-serving/vmodels/\" " with result "range_response_count:0 size:4" took too long (151.354562ms) to execute
My suspicion is that etcd is not scaling well in the sandbox.
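One way to confirm or rule that out would be to query the model-mesh etcd directly and look at DB size and latency; slow read-only range requests are usually a symptom of disk or CPU pressure on the etcd instance rather than of the number of keys. The deployment name and namespace below are guesses and should be adjusted to wherever the etcd used by modelmesh-serving actually runs in the sandbox (ETCDCTL_API=3 assumed; add the appropriate --user/--cert flags if auth or TLS is enabled):

    oc exec -n modelmesh-serving deploy/etcd -- etcdctl endpoint status -w table
    oc exec -n modelmesh-serving deploy/etcd -- etcdctl check perf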