Type: Feature Request
Resolution: Unresolved
Priority: Blocker
Fix Version/s: None
Affects Version/s: None
Labels:
- Noobaa_NC

Blocked:
False
Blocked Reason:

None
Ready:
False

Target Version:

odf-4.22

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Problem description

AI RAG use cases require semantic search, which is pushing data lake owners to vectorize their data and index it in vector databases.

A Vector API adds value to storage by automating and integrating the vector database as a cloud-like data service, which simplifies the implementation of RAG.

Solution

Plugins

Embedding a single vector database technology would be difficult because the space is still changing rapidly. There are many alternatives, at different stages of maturity and with different characteristics:

Attribute	IBM Davinci DB	Enterprise LanceDB	Opensource LanceDB	Opensearch JVector
Integrated as	Service	Service	Shared library (Rust)	Java library
Protocol	gRPC	REST	Dynamic linking	JNI
Memory requirement	25-50 GB (bulk requires more)	25-50 GB	25-50 GB per instance	??
Metadata schema	Columns (flat)	??	??	??
Status	Development	GA	Very active project: 165 contributors, several commits per day	Active project: 27 contributors, several commits per week

This led us to design a plugin system that connects the frontend S3 Vectors API to various backend implementations. Giving users more freedom of choice increases complexity, but the choice can be restricted per product, and it is a good strategy while we do not have a clear view of which technology will win.
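As a rough illustration of the plugin idea above, the sketch below defines one backend interface and a registry keyed by plugin name. It is written in Python for brevity only and is not the noobaa implementation; `VectorBackend`, `InMemoryBackend`, `PLUGINS`, and `make_backend` are hypothetical names, and the in-memory backend stands in for a real database plugin.

```python
from abc import ABC, abstractmethod


class VectorBackend(ABC):
    """Hypothetical interface that every backend plugin would implement."""

    @abstractmethod
    def put_vectors(self, index: str, vectors: dict) -> None:
        """Store vectors, given as a mapping of key -> list of floats."""

    @abstractmethod
    def query(self, index: str, vector: list, top_k: int) -> list:
        """Return the keys of the top_k nearest stored vectors."""


class InMemoryBackend(VectorBackend):
    """Toy backend for illustration: brute-force squared Euclidean distance."""

    def __init__(self) -> None:
        self._indexes = {}

    def put_vectors(self, index, vectors):
        self._indexes.setdefault(index, {}).update(vectors)

    def query(self, index, vector, top_k):
        stored = self._indexes.get(index, {})

        def dist(key):
            return sum((a - b) ** 2 for a, b in zip(stored[key], vector))

        return sorted(stored, key=dist)[:top_k]


# Registry keyed by the plugin name taken from the system config.
# The name "in-memory" is made up for this sketch.
PLUGINS = {"in-memory": InMemoryBackend}


def make_backend(name: str) -> VectorBackend:
    """Instantiate the backend selected by name, or fail loudly."""
    try:
        return PLUGINS[name]()
    except KeyError:
        raise ValueError(f"unknown vector plugin: {name}") from None
```

With this shape, the frontend only ever calls `make_backend(...)` and the interface methods, so a LanceDB or Davinci DB plugin could be swapped in without touching the API layer.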

We plan to use the existing Scale CES-S3 service and extend it to support S3-Vectors API, and not create a completely separate service, unless there is a strong reason to consider having a new Scale CES service. To use the existing Scale CES-S3 service, we might need more configuration options to control what users can do with the new APIs.

Timeline terminology used in the rest of this RFE:

TP = tech preview
1. Based on pre-release noobaa 4.21+
2. Main goal is to announce the feature and support demos and customer POCs

P0 = GA (Oct 2026)
1. Based on ODF 4.22 - GA (July 2026)
P1 = next release(s)
1. TBD

ODF RFE questionnaire (https://issues.redhat.com/browse/ODFRFE)

Outline the proposed title of this feature request. Answer: Support S3 Vectors API
What is the nature and problem description of the request? Answer: See the problem description and solution above, and the requirements list that follows the questions.
Why does the customer need this? (List the business requirements here) Answer:
1. AI RAG use cases require semantic search, which is pushing data lake owners to vectorize their data and index it in vector databases.
2. A Vector API adds value to storage by automating and integrating the vector database as a cloud-like data service, which simplifies the implementation of RAG.
Are there any Documentation Requirements for this request? Answer:
1. (TP) no
2. (P0) yes, a user guide covering the very basics
Is the request coming from a specific customer or subset of customers (on prem only, cloud only, etc). Please don't mention any customer related sensitive information.
1. In general - IBM Storage enterprise customers
2. Early adopters – Financial services and Healthcare / Life science
3. Both cloud and on-premises.
What is the urgency of the request?
1. (TP) April 26
2. (P0) July 26
3. (P1) TBD
Please provide contact information, in case of follow up questions.
1. guym@ibm.com
2. madhu.punjabi@in.ibm.com

ODF / NooBaa Requirements

S3 Vector APIs
1. (TP) Include the basic APIs - put/get/list/query for bucket, index, and vectors, as defined in the AWS S3 Vectors API docs.
2. Metadata filtering
  1. LanceDB supports a schema with columns
  2. S3 Vectors sends schema-less JSON
  3. Need a mapping between the two to determine what can be supported
  4. NOTE - the LanceDB solution also applies to IBM Davinci DB
3. (P0) Include access control using bucket policy.
4. (P1) Tags for Vector Bucket/Index.
5. (P1) EncryptionConfiguration for Vector Bucket/Index.
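For the metadata filtering item above, one possible way to map schema-less JSON onto flat columns is to flatten nested objects into dotted column names. The sketch below is only an assumption about how such a mapping could look, not a description of what LanceDB or noobaa does; `flatten_metadata` and the dotted naming scheme are hypothetical.

```python
import json

# JSON value types that map directly onto a single flat column.
SCALARS = (str, int, float, bool)


def flatten_metadata(doc: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into {dotted_column_name: value}."""
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and extend the column name.
            columns.update(flatten_metadata(value, prefix=name + "."))
        elif isinstance(value, SCALARS) or value is None:
            columns[name] = value
        else:
            # Lists and other values have no obvious flat column,
            # so this sketch stores them as JSON text.
            columns[name] = json.dumps(value)
    return columns
```

For example, `{"src": {"bucket": "b1"}}` becomes a single column named `src.bucket`. Values stored as JSON text would not be filterable by element, which is one of the gaps the mapping exercise needs to spell out.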
Vector Bucket Management
1. (TP) The system config should specify which database plugin to use.
2. (P0) The account config, set through the CLI, should specify:
  1. The path under which new vector buckets are created
  2. Whether the account can create vector buckets
3. (P0) Create vector bucket with CLI – to allow connecting a vector bucket from data already existing on the filesystem.
4. (P1) Per bucket config should select the plugin to use – to allow mixing different databases on the same system.
5. (P1) Read-only vector bucket mode - for cases where the S3 Vectors API is used only for queries (may be trivial using bucket policy).
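The two per-account settings listed above could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the field names `allow_vector_bucket_creation` and `new_vector_buckets_path`, and the helper `resolve_new_bucket_path`, are hypothetical and do not correspond to existing noobaa CLI options.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AccountVectorConfig:
    """Hypothetical per-account settings for vector buckets."""
    allow_vector_bucket_creation: bool
    new_vector_buckets_path: str


def resolve_new_bucket_path(cfg: AccountVectorConfig, bucket_name: str) -> str:
    """Return the filesystem path for a new vector bucket, or raise."""
    if not cfg.allow_vector_bucket_creation:
        raise PermissionError("account may not create vector buckets")
    # Reject names that would escape the configured base path.
    if "/" in bucket_name or bucket_name in ("", ".", ".."):
        raise ValueError(f"invalid vector bucket name: {bucket_name!r}")
    return str(PurePosixPath(cfg.new_vector_buckets_path) / bucket_name)
```

Keeping the permission check and the path resolution in one place means a create-vector-bucket request is either fully resolved or rejected before any backend plugin is called.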
Vectors Plugin System
1. (TP) Opensource LanceDB plugin - baseline integration, functionally working, runs in CI, and easy to deploy (no license needed).
2. (TP) IBM Davinci DB plugin - the "claim to fame" database, depends on an external deployment, and connects through a gRPC API (involving CTO/Research).
3. (P1) Opensearch JVector plugin – TBD.
4. (P1) Dynamic loading of plugins – to allow adding plugins outside of the noobaa project.
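Dynamic loading of out-of-tree plugins, as in the last item above, usually comes down to resolving a configured name to code at runtime. The sketch below shows the idea in Python using the standard `importlib` module; the `module:attribute` spec format and `load_plugin` are assumptions made for illustration, and a noobaa implementation would use its own runtime's module loading instead.

```python
import importlib


def load_plugin(spec: str):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    # import_module raises ImportError if the module is not installed.
    module = importlib.import_module(module_name)
    # getattr raises AttributeError if the module lacks the attribute.
    return getattr(module, attr)
```

A system or per-bucket config entry could then carry a spec string, and the plugin code would only need to be importable on the node, not bundled with the project.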
Limits
1. Refer to https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-limitations.html
2. (TP) Support 100 vector buckets (AWS limit is 10,000 per account-region)
3. (TP) Support 100 vector indexes (AWS limit is 10,000 per vector bucket)
4. (TP) TBD - Other limits might also be lower than the AWS limits
5. (P0) TBD – Depends on research input and POC feedback
Health checks and metrics
1. (P0) Warn if the vector bucket limit or the vector index limit is exceeded.
2. (P0) Warn if the vector buckets path is not available.
3. (P1) Metrics for S3 Vectors APIs?
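The P0 health checks above can be sketched as a single function that returns a list of warnings. This is an illustration only: `vector_health_warnings` is a hypothetical name, the defaults use the TP limits of 100, and the sketch assumes the index limit applies per vector bucket, which this RFE does not state explicitly.

```python
import os


def vector_health_warnings(buckets_path, index_counts,
                           max_buckets=100, max_indexes_per_bucket=100):
    """Return warning strings; index_counts maps bucket name -> index count."""
    warnings = []
    # Warn when the configured vector buckets path is missing or unmounted.
    if not os.path.isdir(buckets_path):
        warnings.append(f"vector buckets path not available: {buckets_path}")
    # Warn when the number of vector buckets exceeds the limit.
    if len(index_counts) > max_buckets:
        warnings.append(
            f"vector bucket limit exceeded: {len(index_counts)} > {max_buckets}")
    # Warn for each bucket whose index count exceeds the limit
    # (assumption: the limit is per bucket).
    for bucket, count in sorted(index_counts.items()):
        if count > max_indexes_per_bucket:
            warnings.append(
                f"vector index limit exceeded in {bucket}: "
                f"{count} > {max_indexes_per_bucket}")
    return warnings
```

An empty list means healthy; any returned strings could be surfaced through the existing health reporting of the service.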

Outlook of requirements for Scale:

Install, upgrade
Integration - Health, performance monitoring
HA, failover-failback
Deployment of IBM Davinci DB services, allocated resources, etc.
Davinci Plugin for NooBaa -> ? (with assist from CTO / Research)
1. How to specify filesystem/fileset locations?

Questions to Noobaa:

For P0 (GA), the limits for vector buckets and vector indexes are currently specified as TBD, because they can only be determined after Research's input and POC feedback. Can you wait for that input, or do you already have tentative values?
