Bug
Resolution: Unresolved
Major
quay-v3.17.0
Summary
The organization mirror discovery phase has a hard 30-minute claim expiry (MAX_DISCOVERY_DURATION). For large Harbor projects (e.g. 10,000 repositories), discovery requires at least 100 paginated HTTP requests, each subject to a 30-second network timeout. Under load or high latency, total discovery time can exceed 30 minutes, so the claim expires mid-discovery. The next worker run then re-claims the config and restarts discovery from scratch, which can become an infinite restart loop that never completes.
Affected Files
- data/model/org_mirror.py:862 — MAX_DISCOVERY_DURATION = 60 * 30 (30 minutes)
- util/orgmirror/harbor_adapter.py:68 — paginated HTTP loop with per-request timeout=30s
Bug Details
When an org mirror config is claimed for discovery, an expiration timestamp is set:
# data/model/org_mirror.py:862
MAX_DISCOVERY_DURATION = 60 * 30  # 30 minutes
# claim_org_mirror_config()
expiration_date = now + timedelta(seconds=MAX_DISCOVERY_DURATION)
The Harbor adapter fetches repositories page by page:
# util/orgmirror/harbor_adapter.py
params = {"page": page, "page_size": self.page_size}  # default page_size=100
response = self.session.get(url, params=params, timeout=self.timeout)  # default timeout=30s
For a Harbor project with 10,000 repositories:
- Pages required: 10,000 / 100 = 100 paginated HTTP requests
- Worst-case per-request time: 30 seconds (full timeout)
- Worst-case total time: 100 × 30s = 50 minutes
This exceeds MAX_DISCOVERY_DURATION (30 minutes) by 20 minutes.
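The arithmetic above can be reproduced with a short sketch. The function name and its parameters are illustrative and not taken from the Quay code base; only MAX_DISCOVERY_DURATION, the default page size of 100, and the 30-second timeout come from this report. It assumes every page request takes exactly the full timeout.

```python
import math

MAX_DISCOVERY_DURATION = 60 * 30  # seconds, value quoted from data/model/org_mirror.py


def worst_case_discovery_seconds(num_repos, page_size=100, timeout=30):
    """Hypothetical helper: upper bound if every page takes the full timeout."""
    pages = math.ceil(num_repos / page_size)  # a partial last page still costs a request
    return pages * timeout


worst = worst_case_discovery_seconds(10_000)
print(worst / 60)                      # worst-case minutes: 50.0
print(worst - MAX_DISCOVERY_DURATION)  # seconds past the claim window: 1200
```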
When the expiry is detected by the next worker run, expire_org_mirror_config() resets the config to NEVER_RUN and restarts discovery from page 1. If Harbor remains under load, every subsequent discovery attempt also times out, creating an infinite restart loop where discovery never completes.
Impact
- Organization mirror discovery never completes for large Harbor projects under network load
- No repositories are ever queued for sync
- The failure is silent: the config is reset and retried without alerting the operator
- The worker continuously consumes CPU and makes repeated paginated API calls to Harbor without making forward progress
- Affects any Harbor project where: num_repos / page_size × per_request_latency > MAX_DISCOVERY_DURATION
Reproduction Conditions
- Source registry type: Harbor
- Harbor project with a large number of repositories (e.g. ≥10,000)
- Harbor registry experiencing elevated response latency (≥18s average per request for 10,000 repos to hit the 30-minute limit, since 1,800s / 100 requests = 18s; the threshold drops as projects grow, e.g. ≥1.8s for 100,000 repos)
- ORG_MIRROR_INTERVAL is shorter than the actual discovery time
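As a rough guide to when these conditions are met, the average per-request latency at which discovery just reaches the claim window can be estimated as follows. This is a sketch only: break_even_latency is a hypothetical helper, and it assumes requests run strictly one after another with no other overhead.

```python
import math


def break_even_latency(num_repos, page_size=100, max_duration=60 * 30):
    """Hypothetical helper: average seconds per page at which discovery
    takes exactly max_duration seconds."""
    pages = math.ceil(num_repos / page_size)
    return max_duration / pages


print(break_even_latency(10_000))   # 18.0 seconds per request
print(break_even_latency(100_000))  # 1.8 seconds per request
```

Any sustained average above the returned value pushes discovery past MAX_DISCOVERY_DURATION.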
Expected Behavior
Discovery of a large Harbor project should complete successfully regardless of project size, or fail with a clear operator-visible error after a reasonable number of attempts.
Actual Behavior
Discovery exceeds its 30-minute claim window, the claim expires, the config is reset to NEVER_RUN, and the next worker run restarts discovery from scratch. This loop repeats indefinitely under sustained Harbor load.
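The loop can be illustrated with a toy simulation. It is a simplified model rather than the worker's actual logic: it assumes a constant per-page latency, treats claim expiry as ending the attempt, and uses a hypothetical function name. It shows only that, without resumption, each attempt stalls at the same page.

```python
def simulate_attempts(total_pages, seconds_per_page, claim_window, attempts):
    """Return the number of pages fetched in each attempt.

    Every attempt starts again from page 1, mirroring the lack of resumption.
    """
    results = []
    for _ in range(attempts):
        elapsed = 0.0
        pages_done = 0
        while pages_done < total_pages:
            elapsed += seconds_per_page
            if elapsed > claim_window:
                break  # claim expired mid-discovery; progress is discarded
            pages_done += 1
        results.append(pages_done)
    return results


# 100 pages at 20s each against a 1,800s claim window
progress = simulate_attempts(total_pages=100, seconds_per_page=20,
                             claim_window=1800, attempts=3)
print(progress)  # [90, 90, 90]
```

With a 10-second page latency the same call reaches all 100 pages on the first attempt, so the loop depends entirely on sustained latency.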
Additional Context
- The 30-minute MAX_DISCOVERY_DURATION is a hard-coded constant with no configuration override.
- The default HTTP request timeout is 30 seconds per page, also not configurable at the per-adapter level.
- The Harbor adapter has no resumption capability: every discovery run starts from page 1.
- Discovery time scales linearly with project size: a 100,000-repo project requires 1,000 paginated requests, so the average per-request time must stay below 1.8 seconds (1,800s / 1,000 requests) for discovery to finish within the limit.
- Related: PROJQUAY-10877 (Harbor pagination may stop at first page if Link:next header is absent)