-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
RHOAI_2.5.0_GA
Testable
I'm not sure whether this is truly an issue or not, but a customer has reported that some images were difficult to pull from the mirror image registry. If it's a red herring, I'll close the Jira, but in the meantime it makes sense to capture the info here.
Some background:
The customer wants to experiment with a custom runtime (Triton) in kserve, and an ensemble model. This is being done with RHOAI 2.5, not 2.8, right now.
We have a prototype recipe that works well in a connected environment: https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing
From what was done at the customer site in the Disconnected environment ...
- the Triton image needs to be mirrored (done), and because the runtime refers to it explicitly, the reference can be set in the runtime's YAML (done)
- in spite of that, pulling that image, unlike all the other ones, hits certs-related issues. This is the same registry that's used for everything else, so it would point to Kserve missing some sort of CA bundle that everything else likely has.
- creating an imagestream pointing to it, and referring to the imagestream in the runtime's YAML, eventually led to success.
- the kserve-storage-initializer image caused issues that look somewhat similar:
- it is listed as one of the extra images: https://github.com/red-hat-data-services/rhoai-disconnected-install-helper/blob/main/rhods-2.5.md?plain=1#L12
- it was mirrored by the customer
- it fails to pull from the registry, but it's not obviously a certs problem.
- it pulls fine when using podman, etc.
- So that piece is still unresolved, and the ensemble model cannot be pulled down from Object Storage.
I'm not sure, but I'm tempted to conclude that Kserve defines things differently from the rest of RHOAI.
I remember that initially, none of the RHOAI images were recognized, due to certs issues, and that the customer had to do something to make sure they would work.
Could it be that this method (pull-secret? other?) does not extend fully to kserve and its runtimes?
Then, some useful information from dzonca@redhat.com:
There are configmaps that drive the images used by kserve:
And indeed, in my connected RHOAI env, I get the following:
bash-4.4 ~ $ oc -n redhat-ods-applications get configmap inferenceservice-config -o yaml | grep \"image\"
    "image" : "quay.io/modh/kserve-agent@sha256:fa885ed04ea836d9ec2ae038a5d6721010566a2df6df3c615e4f3cefb14794d9",
    "image" : "quay.io/modh/kserve-agent@sha256:fa885ed04ea836d9ec2ae038a5d6721010566a2df6df3c615e4f3cefb14794d9",
    "image" : "quay.io/modh/kserve-agent@sha256:fa885ed04ea836d9ec2ae038a5d6721010566a2df6df3c615e4f3cefb14794d9",
    "image" : "quay.io/modh/kserve-router@sha256:3ce38cc18a92f35da98d371248a2d9d01b1d8f10ec94a6b94f8ec1922d436028",
    "image" : "quay.io/modh/kserve-storage-initializer@sha256:ae57d82e1fd85135dd257cf6f4c9f5dfe7ac92cd6d6bb634f79999bafbd602ac",
So, as step 1, we should check what that content looks like in the customer's environment.
Note that editing the configmap values will get reconciled by the operator, so to test that theory, we'd need to stop it first.
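For the step 1 check, a small helper along these lines might make the comparison quicker. It is only a sketch: the function name check_kserve_images and the registry host mirror.example.com:8443 are placeholders invented for illustration, and it assumes the configmap entries have the same "image" : "..." shape as the connected-environment output above. It reads the configmap YAML on stdin and flags every image reference that does not start with the expected mirror registry host.

```shell
# Hypothetical helper (not part of RHOAI): list the unique image references
# found in the configmap text on stdin and mark whether each one points at
# the expected mirror registry, passed as the first argument.
check_kserve_images() {
  mirror="$1"
  # Keep only the '"image" : "<ref>"' fragments, strip the key and quotes,
  # then de-duplicate (the same image can appear several times).
  grep -o '"image" *: *"[^"]*"' \
    | sed -e 's/^"image" *: *"//' -e 's/"$//' \
    | sort -u \
    | while IFS= read -r ref; do
        case "$ref" in
          "$mirror"/*) echo "OK $ref" ;;
          *)           echo "UNMIRRORED $ref" ;;
        esac
      done
}

# Usage, assuming a logged-in oc session (the registry host is a placeholder):
# oc -n redhat-ods-applications get configmap inferenceservice-config -o yaml \
#   | check_kserve_images mirror.example.com:8443
```

Any line marked UNMIRRORED would suggest that kserve is still being told to pull that image from its original location rather than from the mirror. This only inspects the image references; it does not say anything about CA bundles or pull secrets.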
- is related to
-
RHOAISTRAT-28 Support for product capabilities in a disconnected environment
- In Progress