-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
quay-v3.17.0
-
None
-
False
-
-
False
-
-
Issue Summary
The organization-level mirror repository discovery feature (PROJQUAY-10040) contains a critical N+1 query performance bug that executes 2000+ database queries when discovering 1000 repositories, causing significant performance degradation and potential database connection pool exhaustion.
Affected Code
File: data/model/org_mirror.py:416-441
Function: sync_discovered_repos()
Version: Quay 3.17+
Technical Details
The current implementation loops through discovered repositories and calls get_or_create_org_mirror_repo() for each repository individually:
def sync_discovered_repos(config: OrgMirrorConfig, discovered_names: List[str]): newly_created = 0 with db_transaction(): for repo_name in discovered_names: # Loop over potentially 1000+ repos _, created = get_or_create_org_mirror_repo(config, repo_name) # Individual query per repo if created: newly_created += 1 return len(discovered_names), newly_created
Each get_or_create_org_mirror_repo() call executes:
- SELECT to check if repository exists (1 query)
- INSERT if repository doesn't exist (1 query)
Query Count: For N discovered repositories = 2N database queries
Impact Analysis
Severity: High
Performance Impact:
- 100 repos = 200 queries
- 1,000 repos = 2,000 queries
- 10,000 repos = 20,000 queries
Consequences:
- Database connection pool exhaustion
- Slow discovery phase (minutes instead of seconds)
- Transaction holds database locks for extended periods
- Worker timeout failures for large organizations
- Inability to mirror large Harbor projects or Quay organizations
Steps to Reproduce
- Configure org-level mirror for a Harbor project with 1000+ repositories
- Enable FEATURE_ORG_MIRROR
- Trigger discovery phase (wait for scheduled sync or use sync-now)
- Monitor database query logs:
-- Enable query logging in PostgreSQL ALTER DATABASE quay SET log_statement = 'all'; ALTER DATABASE quay SET log_min_duration_statement = 0;
- Observe 2000+ SELECT/INSERT queries for 1000 repositories
- Check Prometheus metrics: quay_org_mirror_discovery_duration_seconds will show slow performance
Expected Behavior
Discovery should use batch operations:
- 1 SELECT query to fetch existing repositories
- 1 INSERT query to bulk-create new repositories
- Total: 2 queries regardless of repository count
Actual Behavior
Discovery executes individual queries:
- N SELECT queries (one per repository)
- N INSERT queries (one per new repository)
- Total: 2N queries (scales linearly with repository count)
Proposed Fix
Replace loop-based approach with batch operations:
def sync_discovered_repos(config: OrgMirrorConfig, discovered_names: List[str]): """ Optimized batch insert version. Reduces query count from O(2n) to O(2) regardless of repository count. """ if not discovered_names: return 0, 0 with db_transaction(): # BATCH QUERY: Fetch all existing repos in one SELECT existing_names = set( OrgMirrorRepository .select(OrgMirrorRepository.repository_name) .where( (OrgMirrorRepository.org_mirror_config == config) & (OrgMirrorRepository.repository_name.in_(discovered_names)) ) .scalar() ) # Filter out existing repos new_repos = [name for name in discovered_names if name not in existing_names] # BATCH INSERT: Bulk insert all new repos in one INSERT if new_repos: now = datetime.utcnow() OrgMirrorRepository.insert_many([ { 'org_mirror_config': config, 'repository_name': name, 'discovery_date': now, 'sync_status': OrgMirrorRepoStatus.NEVER_RUN, 'creation_date': now, 'sync_retries_remaining': MAX_SYNC_RETRIES, } for name in new_repos ]).execute() return len(discovered_names), len(new_repos)
Performance Improvement:
- Before: O(2n) queries for n repositories
- After: O(2) queries regardless of repository count
- 1000 repos: 2000 queries → 2 queries (1000x improvement)
Testing Requirements
- Unit test with 1000+ mock repositories
- Integration test measuring query count
- Performance benchmark comparing old vs new implementation
- Regression test ensuring existing repositories are not duplicated
Acceptance Criteria
- [ ] sync_discovered_repos() executes maximum 2 database queries regardless of repository count
- [ ] Existing repositories are not duplicated
- [ ] Discovery phase completes in <5 seconds for 1000 repositories
- [ ] All existing unit tests pass
- [ ] New performance tests added
Related Issues
- Parent: PROJQUAY-10040 (Organization-Level Repository Mirroring)
- Discovered during: Security and performance audit (2026-02-26)
References
- Code: data/model/org_mirror.py:416-441
- Worker: workers/repomirrorworker/_init_.py:perform_org_mirror_discovery()
- Migration: data/migrations/versions/a1b2c3d4e5f6_add_org_mirror_tables.py