Type: Feature
Resolution: Unresolved
Feature Request
We want clients to be able to serve IBM geospatial foundation models, e.g., Prithvi, which are a class of Vision Transformers, in RedHat Inference Server.
Compared to LLMs, Vision Transformers take images as input, often with no text, and generate images as output. vLLM requires extensions in both the server and the engine to support these models.
The enablement of the geospatial models in vLLM is already underway by our team at IBM Research.
This feature request covers specific enhancements to RedHat Inference Server to ensure that clients with RHAI can use it to serve the geospatial models.
Note: This request could also be expanded to cover a wider set of Vision Transformer models.
Requirements:
Some items we think this feature could require are:
- A RedHat Inference Server image built with the extra requirements needed to support the geospatial models, e.g., terratorch
- A RedHat Inference Server release built on a vLLM version that includes the required PRs → we target merging all required PRs into vLLM main by end of August
- Documentation describing how to perform inference on geospatial models with RedHat Inference Server
- Testing of RedHat Inference Server releases against key geospatial models (some testing already takes place in vLLM)
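To illustrate the first requirement, a minimal Containerfile sketch of how such an image could be layered on top of an existing vLLM-based server image. The base image name/tag, the pinned dependency set, and the served model identifier are assumptions for illustration, not the actual RedHat Inference Server build:

```dockerfile
# Hypothetical Containerfile: extend a vLLM-based inference server image
# with the extra dependencies needed for geospatial models.
# The base image reference below is an assumption for illustration only.
FROM registry.example.com/rhaiis/vllm:latest

# terratorch pulls in the geospatial input/output processing stack
RUN pip install --no-cache-dir terratorch

# Serve a geospatial model; the model name and flags shown here are
# illustrative — the exact invocation depends on the PRs listed below.
ENTRYPOINT ["vllm", "serve", "ibm-nasa-geospatial/Prithvi-EO-2.0-300M"]
```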
Business Drivers
IBM has produced world-leading GenAI models for geospatial analytics, for example the IBM-NASA Prithvi models and TerraMind. Multiple current and potential IBM customers are hence looking to IBM to support deploying these models for use in their domains. This includes government (e.g., NASA, US Dept of Energy, European Space Agency, UK Government, Government of India, UAE Ministry of Climate Change and Environment), defense (e.g., Lockheed Martin, Defence Research and Development Canada, UK Ministry of Defense), energy (e.g., Shell, Exxon), agriculture (e.g., Bayer, PepsiCo), insurance (e.g., State Farm, SwissRe, TokioMarine), academia (e.g., University of Singapore), and more.
Since the IBM generative AI stack is built on RedHat AI, creating an offering or solution that uses these geospatial models requires that they are supported for inference via RedHat AI. Without this, customers would have to build their own vLLM server images and would lose the support and reliability guarantees that come with a RedHat product. Further, IBM partners would not be able to provide solutions to customers requesting these geospatial models, as the required functionality is not associated with any IBM/RedHat product.
Technical Context
Terratorch is a Python package from IBM that simplifies creating and fine-tuning geospatial models, which often have complex input and output processing steps.
The current PRs supporting geospatial models specifically, and ViTs in general, are either merged or open:
- Add support for embedding models in the v1 engine (https://github.com/vllm-project/vllm/pull/16188)
- Add support for attention-free models (https://github.com/vllm-project/vllm/pull/20811)
- Implement Prithvi in the v1 engine, including tests (https://github.com/vllm-project/vllm/pull/20577)
- Enable the vLLM server pooling endpoint to serve models that do not initialize a tokenizer, enabling Prithvi serving (https://github.com/vllm-project/vllm/pull/21518)
- Enable multimodal output processors in the vLLM engine (https://github.com/vllm-project/vllm/pull/20811, open)
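To make the tokenizer-free serving path concrete, a small sketch of what a client request against the server's pooling endpoint could look like once these PRs land. The multimodal payload shape (base64 image, no text prompt) and the endpoint behaviour for tokenizer-free models are assumptions based on the PRs above, not a finalized API:

```python
import base64
import json

def build_pooling_request(model: str, image_bytes: bytes) -> dict:
    """Build a request body for a vLLM-style /pooling endpoint.

    NOTE: the payload shape here is an assumption for illustration;
    the exact schema for image-only input is defined by the PRs above.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        # Image passed as base64 with no text prompt, since models like
        # Prithvi do not initialize a tokenizer.
        "input": {"image": encoded},
    }

if __name__ == "__main__":
    body = build_pooling_request(
        "ibm-nasa-geospatial/Prithvi-EO-2.0-300M", b"\x00\x01"
    )
    print(json.dumps(body, indent=2))
    # A client would then POST this body to the running server, e.g.:
    #   requests.post("http://localhost:8000/pooling", json=body)
```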
In the next few weeks we will add serving of image-in (no text) / image-out inference.