AI Platform Core Components / AIPCC-11060

Wheels collector consumes excessive RAM due to in-memory accumulation

    • Type: Bug
    • Resolution: Done

      Problem

      The wheels collection task (backend/collector/tasks/wheels.py) is experiencing two related issues:

      1. High RAM consumption: the collector accumulates all artifacts and drops in memory before committing anything to the database
      2. Late database commits: database writes only happen at the very end of the task, so all partial progress is lost if the collector crashes

      Root Cause

      The WheelsCollector.collect_wheels() method (in backend/collector/core/wheels_collector.py:491-573) accumulates ALL artifacts and drops in memory:

      all_artifacts = []
      all_drops = []
      # ... loops through ALL releases, appending to these lists ...

      For products with hundreds of releases, each with multiple architectures, this creates thousands of objects in memory. Each artifact contains large JSON fields:

      • dependency_graph (full package dependency graph)
      • constraints_file (all packages with versions)
      • build_sequence_summary (build metadata)

      These fields can be 50-200KB+ per artifact, leading to significant memory consumption when processing all releases at once.
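A rough back-of-envelope estimate, using only the illustrative figures from this ticket (hundreds of releases, 4-8 artifacts per release, 50-200KB of JSON per artifact), shows why the all-at-once approach blows past the 512Mi limit on JSON payloads alone:

```python
# Rough peak-memory estimate for the current accumulate-everything approach.
# All figures are illustrative midpoints taken from the ticket text.
releases = 300             # "hundreds of releases"
artifacts_per_release = 6  # "typically 4-8 artifacts per release"
avg_json_kb = 125          # "50-200KB+ per artifact"

peak_mb = releases * artifacts_per_release * avg_json_kb / 1024
print(f"~{peak_mb:.0f} MiB of JSON payloads held in memory")  # ~220 MiB
```

And that counts only the raw JSON fields, before Python object overhead and any copies made during serialization.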

      Impact

      • OOM (Out of Memory) errors in the wheels-collector Job
      • Increased resource limits needed (currently 512Mi/1000m for wheels collectors)
      • Lost work if collector crashes before final commit
      • No partial progress saved during long-running collections

      Proposed Solution

      Implement batch processing by release instead of accumulating everything:

      1. Add a batch_callback parameter to collect_wheels() that commits artifacts/drops after each release
      2. Update sync_wheels_collections_task() to provide a callback that performs bulk writes per-release
      3. This reduces peak memory usage from "all releases" to "single release" (typically 4-8 artifacts per release for different architectures)
      4. Enables incremental commits so partial progress is saved
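A minimal sketch of steps 1 and 2, assuming hypothetical helper and model names; only `WheelsCollector`, `collect_wheels()`, and the proposed `batch_callback` parameter come from the ticket itself. The real implementation would pass ORM objects and a database session rather than the stand-in strings and list used here:

```python
from typing import Callable, Iterable, List, Optional


class WheelsCollector:
    """Sketch of the proposed batch-processing change."""

    def __init__(self, releases: Iterable[str]):
        self.releases = list(releases)

    def _collect_release(self, release: str):
        # Hypothetical stand-in for the real per-release collection logic,
        # which would build artifact/drop ORM objects for each architecture.
        artifacts = [f"{release}-{arch}" for arch in ("x86_64", "aarch64")]
        drops: List[str] = []
        return artifacts, drops

    def collect_wheels(
        self,
        batch_callback: Optional[Callable[[List[str], List[str]], None]] = None,
    ):
        all_artifacts: List[str] = []
        all_drops: List[str] = []
        for release in self.releases:
            artifacts, drops = self._collect_release(release)
            if batch_callback is not None:
                # Hand off one release's worth of objects immediately so
                # they can be committed and garbage-collected; peak memory
                # is now bounded by a single release.
                batch_callback(artifacts, drops)
            else:
                # Legacy behavior: accumulate everything until the end.
                all_artifacts.extend(artifacts)
                all_drops.extend(drops)
        return all_artifacts, all_drops


# Per-release bulk-write callback as sync_wheels_collections_task() might
# provide it; appending to `committed` stands in for a real bulk insert
# plus commit (e.g. session.bulk_save_objects(...); session.commit()).
committed = []


def commit_batch(artifacts: List[str], drops: List[str]) -> None:
    committed.append((len(artifacts), len(drops)))


WheelsCollector(["1.0", "1.1"]).collect_wheels(batch_callback=commit_batch)
print(committed)  # [(2, 0), (2, 0)]
```

Keeping `batch_callback` optional preserves the existing return-everything behavior for any caller not yet migrated, so the collector and task can be updated independently.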

      Files to Modify

      • backend/collector/core/wheels_collector.py: Add batch_callback support
      • backend/collector/tasks/wheels.py: Implement per-release bulk write callback
      • backend/collector/tests/test_tasks.py: Update tests for new batch behavior

      Benefits

      • Reduced peak memory usage (10-50x reduction depending on number of releases)
      • Partial progress saved (database updated after each release)
      • Lower resource limits needed in dispatcher.py
      • More resilient to crashes and timeouts

              dhellman@redhat.com Doug Hellmann
              Votes: 0
              Watchers: 2