Discussions about the Java indexer and the Red Hat indexers have made it clear that we need an indexer-side API for providing additional data to indexer processes. The need for this sort of data can't be papered over in the long term. Although we successfully pushed back on Red Hat Product Security's plans to introduce an additional metadata file, the existing ones will need to be used for the foreseeable future. The Java issue had us run some numbers and realize that it would be feasible to scrape Maven repositories to generate jar lookup data. For a single mechanism to be useful in both cases, it must be more complicated than the previous plan for storing indexer data centrally.
Requirements:
- Must support arbitrary data shapes – assume the data cannot be fully normalized
- Must support key lookups
- Must be able to be incrementally updated
Wants:
- Some amount of the data will be repeated (Maven (group, artifact) pairs, Red Hat product names) – being able to intern some values would save space.
- Some sort of query ability (JSONPath?)
Antirequirements:
- Use of PostgreSQL-only features (for SQLite, emulation via user functions may be OK)
- Querying implemented wholly process-side
- Only "snapshot"-style updates to data
This should be done in a few broad steps:
- Design and implement the Indexer AdditionalData database API.
- Design and implement an IndexerUpdater API.
- Implement the plumbing that exposes the AdditionalData API to "scanner" implementations.
- Implement needed IndexerUpdater implementations.
- Update necessary "scanners" to use the new API.
Issue links:
- blocks: CLAIRDEV-102 Clair ships a Maven offline index (Refinement)
- relates to: CLAIRDEV-46 Update Offline-Mode Import/Export (Refinement)
- relates to: CLAIRDEV-93 java: central search can be load-bearing (To Do)