AI Platform Core Components / AIPCC-11060

Wheels collector consumes excessive RAM due to in-memory accumulation

    • Type: Bug
    • Resolution: Done

      Problem

      The wheels collection task (backend/collector/tasks/wheels.py) is experiencing two related issues:

      1. High RAM consumption: the collector accumulates all artifacts and drops in memory before committing anything to the database
      2. Late database commits: database writes only happen at the very end of the task, so all partial progress is lost if the collector crashes

      Root Cause

      The WheelsCollector.collect_wheels() method (in backend/collector/core/wheels_collector.py:491-573) accumulates ALL artifacts and drops in memory:

      all_artifacts = []
      all_drops = []
      # ... loops through ALL releases, appending to these lists ...

      For products with hundreds of releases, each with multiple architectures, this creates thousands of objects in memory. Each artifact contains large JSON fields:

      • dependency_graph (full package dependency graph)
      • constraints_file (all packages with versions)
      • build_sequence_summary (build metadata)

      These fields can be 50-200KB+ per artifact, leading to significant memory consumption when processing all releases at once.
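A rough back-of-envelope estimate, using only the illustrative figures from this ticket (hundreds of releases, 4-8 artifacts per release, 50-200KB of JSON per artifact), shows why the all-at-once approach blows past the 512Mi limit on JSON payloads alone:

```python
# Rough peak-memory estimate for the current accumulate-everything approach.
# All figures are illustrative midpoints taken from the ticket text.
releases = 300             # "hundreds of releases"
artifacts_per_release = 6  # "typically 4-8 artifacts per release"
avg_json_kb = 125          # "50-200KB+ per artifact"

peak_mb = releases * artifacts_per_release * avg_json_kb / 1024
print(f"~{peak_mb:.0f} MiB of JSON payloads held in memory")  # ~220 MiB
```

And that counts only the raw JSON fields, before Python object overhead and any copies made during serialization.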

      Impact

      • OOM (Out of Memory) errors in the wheels-collector Job
      • Increased resource limits needed (currently 512Mi/1000m for wheels collectors)
      • Lost work if collector crashes before final commit
      • No partial progress saved during long-running collections

      Proposed Solution

      Implement batch processing by release instead of accumulating everything:

      1. Add a batch_callback parameter to collect_wheels() that commits artifacts/drops after each release
      2. Update sync_wheels_collections_task() to provide a callback that performs bulk writes per-release
      3. This reduces peak memory usage from "all releases" to "single release" (typically 4-8 artifacts per release for different architectures)
      4. Enables incremental commits so partial progress is saved
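A minimal sketch of steps 1 and 2, assuming hypothetical helper and model names; only `WheelsCollector`, `collect_wheels()`, and the proposed `batch_callback` parameter come from the ticket itself. The real implementation would pass ORM objects and a database session rather than the stand-in strings and list used here:

```python
from typing import Callable, Iterable, List, Optional


class WheelsCollector:
    """Sketch of the proposed batch-processing change."""

    def __init__(self, releases: Iterable[str]):
        self.releases = list(releases)

    def _collect_release(self, release: str):
        # Hypothetical stand-in for the real per-release collection logic,
        # which would build artifact/drop ORM objects for each architecture.
        artifacts = [f"{release}-{arch}" for arch in ("x86_64", "aarch64")]
        drops: List[str] = []
        return artifacts, drops

    def collect_wheels(
        self,
        batch_callback: Optional[Callable[[List[str], List[str]], None]] = None,
    ):
        all_artifacts: List[str] = []
        all_drops: List[str] = []
        for release in self.releases:
            artifacts, drops = self._collect_release(release)
            if batch_callback is not None:
                # Hand off one release's worth of objects immediately so
                # they can be committed and garbage-collected; peak memory
                # is now bounded by a single release.
                batch_callback(artifacts, drops)
            else:
                # Legacy behavior: accumulate everything until the end.
                all_artifacts.extend(artifacts)
                all_drops.extend(drops)
        return all_artifacts, all_drops


# Per-release bulk-write callback as sync_wheels_collections_task() might
# provide it; appending to `committed` stands in for a real bulk insert
# plus commit (e.g. session.bulk_save_objects(...); session.commit()).
committed = []


def commit_batch(artifacts: List[str], drops: List[str]) -> None:
    committed.append((len(artifacts), len(drops)))


WheelsCollector(["1.0", "1.1"]).collect_wheels(batch_callback=commit_batch)
print(committed)  # [(2, 0), (2, 0)]
```

Keeping `batch_callback` optional preserves the existing return-everything behavior for any caller not yet migrated, so the collector and task can be updated independently.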

      Files to Modify

      • backend/collector/core/wheels_collector.py: Add batch_callback support
      • backend/collector/tasks/wheels.py: Implement per-release bulk write callback
      • backend/collector/tests/test_tasks.py: Update tests for new batch behavior

      Benefits

      • Reduced peak memory usage (10-50x reduction depending on number of releases)
      • Partial progress saved (database updated after each release)
      • Lower resource limits needed in dispatcher.py
      • More resilient to crashes and timeouts

              dhellman@redhat.com Doug Hellmann
              Votes: 0
              Watchers: 2