Project Quay · PROJQUAY-10556

Organization Vulnerability Report

    • Type: Feature
    • Priority: Major
    • Resolution: Unresolved
    • Component: quay

      Feature Overview (aka. goal summary)

      Enable Quay to generate organization-level vulnerability reports that aggregate security findings across all repositories within an organization.

      Currently, organizations must manually collect vulnerability data repository-by-repository.  This is inefficient and error-prone for enterprises managing thousands of container images.  This feature addresses critical enterprise security compliance workflows by providing a consolidated view of organizational risk posture.

      The solution will:

      • Provide aggregated vulnerability summaries at the organization level, filterable by severity.
      • Enable detailed CSV/JSON exports containing CVE identifiers, affected images, tags, and package details.
      • Support asynchronous report generation to prevent performance impact on live registry operations.
      • Deliver reports via both REST API and Web UI.

      Customer Impact: Strategic accounts have identified this as a critical gap blocking their security compliance workflows.

      Goals (aka. expected user outcomes)

      The primary goal is to enable enterprise customers to efficiently assess and report on vulnerability posture across their entire organization without manual data scraping.

      This feature empowers users to:

      • View Organization-wide Vulnerability Summary: Quickly identify which repositories contain Critical/High severity vulnerabilities without navigating to each repository individually.
      • Export Detailed Reports: Generate CSV/JSON exports for integration with external security platforms (SIEM, ticketing systems).
      • Prioritize Remediation: Use severity-based filtering to focus resources on the highest-risk images.
      • Automate Security Workflows: Integrate vulnerability data exports into CI/CD pipelines via API.
      • Satisfy Compliance: Generate auditable security reports for SOC 2 and PCI-DSS frameworks.

      Background

      Customer Problem Statement

      Security teams responsible for thousands of container images lack a centralized view of security vulnerabilities.  Currently, they must manually navigate to each repository, view reports per image, and aggregate results in spreadsheets.  This process is time-consuming, error-prone, and unscalable.

      Strategic Customer Requirements

      • Requires programmatic access to export vulnerability data (CVE names, image tags, severity) to feed upstream security scanners.
      • Requires organization-level summaries to identify repositories with Critical/High CVEs for prioritization.

      Competitive Landscape

      • JFrog Xray & Google Artifact Registry already provide organization-level aggregation and export APIs.

      Requirements (aka. acceptance criteria)

      Phase 1: Organization Vulnerability Summary API (MVP)

      • The API provides an organization-level vulnerability summary endpoint: GET /api/v1/organization/{orgname}/vulnerabilities/summary
      • The summary returns aggregated counts by severity level (Critical, High, Medium, Low, Unknown).
      • The summary includes a list of repositories with Critical and/or High severity vulnerabilities.
      • Responses include metadata: "total repository count", "total image count", "last updated timestamp", and "generation time".
      • Summary data is cached with a configurable TTL to balance freshness and performance.
      • Users can trigger on-demand refresh with appropriate rate limiting to prevent abuse.
      • The API supports severity filtering via query parameter (?severity=critical,high).
      • A background worker regenerates summaries periodically for all organizations with recent activity.
      • The feature is permission-gated: requires organization read access (existing org:view permission).
      • Performance target: cached summary queries should return near-instant responses.
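A minimal client-side sketch of the proposed summary endpoint. The endpoint path and the severity query parameter come from the requirements above; the response body shown here is an assumed shape for illustration only, not a finalized API contract.

```python
# Proposed endpoint path (from the requirements above). The sample response
# below is an ASSUMED shape, not a finalized contract; field names follow the
# metadata listed in the requirements ("total repository count", etc.).
SUMMARY_ENDPOINT = "/api/v1/organization/{orgname}/vulnerabilities/summary"

sample_response = {
    "organization": "acme",  # hypothetical org name
    "severity_counts": {"Critical": 12, "High": 87, "Medium": 340, "Low": 95, "Unknown": 3},
    "repositories_with_critical_or_high": ["acme/payments", "acme/auth"],
    "total_repository_count": 512,
    "total_image_count": 10240,
    "last_updated": "2024-01-01T00:00:00Z",
}

def repos_needing_attention(summary: dict) -> list:
    """Return repositories flagged with Critical/High findings, sorted for stable output."""
    return sorted(summary["repositories_with_critical_or_high"])

def count_critical_and_high(summary: dict) -> int:
    """Total findings at Critical or High severity."""
    counts = summary["severity_counts"]
    return counts["Critical"] + counts["High"]

# Severity filtering via query parameter, as specified above:
print(SUMMARY_ENDPOINT.format(orgname="acme") + "?severity=critical,high")
print(count_critical_and_high(sample_response))  # 99
```

A client would issue a GET against this URL with an org-read token; the helper functions show the kind of triage a consumer can do once the aggregated counts are available in one response.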

      Phase 2: Repository-Level Vulnerability Reports

      • The API provides a repository-level vulnerability summary endpoint: GET /api/v1/repository/{namespace}/{repository}/vulnerabilities/summary
      • Repository summaries include per-image vulnerability counts with manifest digests and associated tags.
      • The organization summary endpoint includes repository-level detail when requested via query parameter (?detail=repositories).
      • Repository summaries support the same caching and refresh mechanisms as organization summaries.

      Phase 3: Detailed Vulnerability Export with CSV/JSON

      • The API provides an export request endpoint: POST /api/v1/organization/{orgname}/vulnerabilities/export
      • Export requests accept parameters: format (csv, json), severity filter, repositories filter, include_fixed boolean.
      • Export requests return an export ID and initial status (pending).
      • Export generation is asynchronous via a background worker.
      • Users query export status via: GET /api/v1/organization/{orgname}/vulnerabilities/export/{export_id}
      • Completed exports provide a download URL: GET /api/v1/organization/{orgname}/vulnerabilities/export/{export_id}/download
      • Download URLs use pre-signed URLs (S3-compatible storage) or secure tokens with time-based expiration.
      • Export files are auto-deleted after a defined retention period to manage storage costs and security exposure.
      • CSV exports include required columns: "CVE ID", "Severity", "Repository", "Image Digest", "Image Tags" (e.g., tag1|tag2|tag3), "Package Name", "Current Version", "Fixed Version" (if available), and "Published Date".
      • JSON exports provide a structured format with nested objects for images, vulnerabilities, and packages.
      • Export generation has size limits to prevent resource exhaustion (users must filter to reduce scope for very large organizations).
      • Export generation has rate limits to prevent abuse and manage backend load.
      • Export requests are permission-gated: requires organization admin access (org:admin permission).
      • Performance target: Export generation should complete in a reasonable time for typical enterprise organizations.
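The asynchronous request → poll → download flow above can be sketched with a toy client. The endpoint paths and the status values (pending, processing, completed, failed) come from the requirements; the stub API class and its behavior are assumptions standing in for the real service.

```python
import itertools

class FakeExportAPI:
    """Stand-in for the Quay API, advancing one export through its states.

    Real calls would be:
      POST /api/v1/organization/{orgname}/vulnerabilities/export
      GET  /api/v1/organization/{orgname}/vulnerabilities/export/{export_id}
    """
    def __init__(self):
        # Simulated state progression: pending -> processing -> completed.
        self._states = itertools.chain(["pending", "processing"],
                                       itertools.repeat("completed"))

    def request_export(self, orgname, fmt="csv", severity=None):
        # Parameters mirror the spec (format, severity filter); this stub
        # ignores them and returns a fixed ID with the initial status.
        return {"export_id": "exp-123", "status": "pending"}

    def get_status(self, orgname, export_id):
        return {"export_id": export_id, "status": next(self._states)}

def wait_for_export(api, orgname, export_id, max_polls=10):
    """Poll the status endpoint until the export reaches a terminal state."""
    for _ in range(max_polls):
        status = api.get_status(orgname, export_id)["status"]
        if status in ("completed", "failed"):
            return status
    return "timed_out"

api = FakeExportAPI()
export_id = api.request_export("acme", fmt="csv", severity=["critical", "high"])["export_id"]
print(wait_for_export(api, "acme", export_id))  # completed
```

Once the status is completed, the client would fetch the download URL from the `/download` endpoint; a production client would also back off between polls and honor the rate limits discussed under Open Questions.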

      Phase 4: Web UI Integration

      • The organization dashboard displays a "Vulnerabilities" tab showing the summary view.
      • The vulnerabilities tab shows top repositories by critical CVE count for quick identification.
      • The UI provides filter controls for severity levels and date ranges.
      • The UI includes an "Export" button that opens a modal for export configuration (format, filters).
      • The UI displays export status with progress indication (pending, processing, completed, failed).
      • The UI provides a "Download" button for completed exports.
      • The UI shows export history for recent exports with download access for unexpired exports.
      • Repository detail pages include a "View in Organization Report" link.

      Open Questions

      The following operational parameters require discussion with the engineering team to determine appropriate values based on performance testing, capacity planning, and customer requirements:

      Caching and Performance

      • Q1. Summary Cache TTL
        • What is the appropriate cache duration for organization vulnerability summaries?
        • Considerations: CVE database update frequency, backend load, user expectation for "fresh" data
        • Customer context: T and F expect "current" data but likely review daily or weekly
        • Competitive reference: JFrog Xray and GitLab cache for 12-24 hours
        • Engineering input needed: What cache duration balances load vs. freshness?
      • Q2. Cached Response Time Target
        • What response time target should we set for cached summary queries?
        • Considerations: Redis performance characteristics, API middleware overhead, user experience expectations
        • Technical note: Industry standard is <100ms for "instant" feel, <300ms is acceptable with spinner
        • Engineering input needed: Is sub-100ms achievable with our current Redis infrastructure?
      • Q3. Background Refresh Schedule
        • How frequently should we regenerate organization summaries in the background?
        • Options: Hourly, nightly, twice daily
        • Considerations: Off-peak processing, CVE database update patterns, user workflow (morning reviews)
        • Engineering input needed: What frequency is sustainable for our worker infrastructure?

      Rate Limiting

      • Q4. On-Demand Summary Refresh Rate Limit
        • How often should users be allowed to force-refresh organization summaries?
        • Use case: "Just fixed critical CVEs and rescanned; want updated summary now"
        • Considerations: Summary generation cost, prevention of abuse, legitimate retry scenarios
        • Engineering input needed: What rate limit prevents abuse while supporting emergency refreshes?
      • Q5. Export Request Rate Limit
        • How frequently should organizations be allowed to request exports?
        • Customer context: F may automate exports for upstream integration
        • Considerations: Background worker capacity, typical export completion time, retry scenarios
        • Competitive reference: GitLab allows 1 active export per user at a time
        • Engineering input needed: What rate limit balances automation needs with resource protection?

      Export Size and Scalability

      • Q6. Maximum Export Record Count
        • What is the maximum number of vulnerability records per export?
        • Scale context:
          • Mid-size org: 500 repos × 20 images × 50 CVEs = 500,000 records (unfiltered)
          • With critical/high filtering: ~100,000 records (20% of total)
        • Considerations: Worker memory limits, CSV file usability (Excel), generation timeout risk
        • Customer validation needed: Does this limit cover F's expected critical/high CVE count?
        • Engineering input needed: What record count can workers handle within timeout windows?
      • Q7. Export Generation Timeout
        • What is the maximum time allowed for export generation before timeout/failure?
        • Considerations: Clair API throughput, typical org size, user patience
        • Performance goal: What completion time should we target for a "typical" enterprise org (e.g., 5,000 images)?
        • Engineering input needed: What timeout is reasonable given Clair API capacity?

      Data Retention and Storage

      • Q8. Export File Retention Period
        • How long should completed exports be retained before auto-deletion?
        • Considerations: Storage costs, security exposure (aggregated vuln data), user download patterns
        • Competitive reference: GitLab retains standard exports for a limited time and archived exports for 30 days
        • User workflow: Most users download immediately or within 24 hours
        • Engineering input needed: What retention period balances storage costs with user convenience?
      • Q9. Download URL Expiration
        • How long should pre-signed download URLs remain valid?
        • Considerations: Security (prevent URL sharing), user workflow (may need to re-download)
        • Engineering input needed: What expiration window balances security with usability?

      User-Specific Questions

      • Q10. Automation Frequency
        • How often do users plan to automate vulnerability exports?
        • Options: Hourly, daily, weekly
        • Impact: Determines whether our rate limits accommodate their use case
        • Customer validation needed
      • Q11. Expected Scale
        • What is the user's expected vulnerability count with critical/high filtering?
        • Context: Determines if 100K record limit is sufficient
        • Customer validation needed
      • Q12. Dashboard Requirements
        • How many "top repositories" should the summary show?
        • Current proposal: Top 10 by critical CVE count
        • Customer validation needed
      • Q13. Permission Model Validation
        • Should exports be admin-only, or allow org readers?
        • Current proposal: Summary (org:view), Export (org:admin)
        • Rationale: Exports are data exfiltration risk; summaries support transparency
        • Customer validation needed

      Technical Architecture

      • Q14. Worker Capacity Planning
        • How many concurrent export jobs can our worker infrastructure support?
        • Context: With 100 active organizations and rate limits, estimate maximum queue depth
        • Engineering input needed: Current worker capacity and scaling options
      • Q15. Clair API Rate Limits
        • What are Clair's API throughput limits and how do they affect export generation time?
        • Context: Determines feasibility of performance targets
        • Engineering input needed: Clair API capacity and throttling behavior

      Documentation considerations

      • Document the API endpoints with request/response examples.
      • Provide usage examples for common workflows, e.g., 
        • "Generate a CSV export of Critical vulnerabilities"
        • "Query organization summary via curl"
        • "Filter by severity and repositories"
      • Document performance characteristics and operational limits (cache TTL, rate limits, size limits, etc.) once determined.
      • Explain the caching behavior and refresh mechanisms:
        • Summary cache TTL
        • On-demand refresh rate limits
        • Background refresh schedule
      • Document required permissions:
        • Organization read access for the summary API
        • Organization admin access for export generation
      • Provide CSV format specification with column definitions and example rows.
      • Document export lifecycle:
        • Asynchronous generation process
        • Status polling workflow
        • Auto-deletion policy
      • Include troubleshooting guidance:
        • "Export failed" scenarios (timeout, size limit exceeded)
        • Performance optimization tips (use severity filters)
        • Empty results troubleshooting (no scans completed)

              marckok Marcus Kok
              rhn-coreos-tunwu Tony Wu