Satellite / SAT-39226

High memory usage of postgres processes on scaled Capsule


    • Type: Bug
    • Resolution: Unresolved

      Description of problem:
      On a scaled Capsule with many big repos synced (in fact, many pulp published artifacts), a Capsule sync can trigger a sudden peak in memory consumption that can even end in an OOM killer intervention.

      It happens when multiple big repos are synced and (automatically) published at the same time (katello orders repo syncs by repo size), each publish triggering a PublishedArtifact INSERT of all the repo's artifacts in one bulk statement. That concurrently bumps the memory usage of multiple postgres client processes (not of the pulp workers).
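      For illustration, a minimal Django-style sketch of the mechanism (the `artifacts` variable is a hypothetical stand-in for the list of unsaved PublishedArtifact instances built for one repo; this is a simplified sketch, not the actual pulpcore call site):

          from pulpcore.plugin.models import PublishedArtifact

          # Without batch_size, Django's bulk_create issues ONE INSERT statement
          # carrying all rows (>40k for a repo like RHEL8 AppStream), which a
          # single postgres backend process has to receive and parse at once:
          PublishedArtifact.objects.bulk_create(artifacts)

          # With batch_size, the same rows go in as INSERTs of 2000 rows each,
          # so every statement stays small and postgres memory stays steady:
          PublishedArtifact.objects.bulk_create(artifacts, batch_size=2000)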

      How reproducible:
      100% (the more you scale your environment, the more visible the effect)
       

      Is this issue a regression from an earlier version:
      no
       

      Steps to Reproduce:

      1. Sync the RHEL8 AppStream repo (or any other repo with >40k packages), even with the On Demand policy.

      2. Have 100 CV/LE combinations (e.g. 20 CVs with 5 LEs each) containing that repo.

      3. Assign all the LEs to a Capsule and sync the Capsule.

      4. Add 1-2 more identical CVs and sync the Capsule, such that more new repos are synced there than there are workers on the Capsule (e.g. with 8 workers and 5 LEs, create 2 CVs to ensure 10 repos (> 8 workers) are synced). Simply put, we need to make all workers busy with equivalent tasks at the same time.

      5. During the Capsule sync, monitor `ps aux; free` output every second on the Capsule (a helper sketch follows below).
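      A minimal monitoring sketch in plain Python wrapping those same commands (the `--sort=-rss` flag and the `head` cutoff are convenience choices for spotting the fattest processes, not part of the original report):

          import subprocess
          import time

          # Print the processes with the largest resident memory, plus overall
          # memory usage, once per second; stop with Ctrl-C.
          while True:
              subprocess.run(
                  ["sh", "-c", "ps aux --sort=-rss | head -n 15; free -m"],
                  check=False,
              )
              time.sleep(1)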

      Actual behavior:
      5. At some point (when the sync itself has finished and the repo is being internally published):

      • `free` shows a memory usage bump (e.g. by 2 GB)
      • `postgres` client processes bump in memory usage (by 200-250 MB each)
      • the `postgres` processes are in the middle of an INSERT, like:
      postgres 1399718  3.0  2.3 877808 774176 ?       Rs   08:05   0:11 postgres: pulp pulpcore ::1(54602) INSERT
      

      Expected behavior:
      5. No such bump in memory consumption; `postgres` processes consume a steady amount of memory.

      Business Impact / Additional info:
      This leads to OOM and/or postgres rejecting pulp workers with out-of-memory errors.

      The patch is trivial:

      https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/synchronizing.py#L211 needs to call `bulk_create` with `batch_size=2000`, like https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/tasks/publishing.py#L217 does.
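      A sketch of what the change amounts to (assuming the synchronizing.py call site mirrors the publishing.py one; `published_artifacts` is a hypothetical variable name and the exact surrounding code may differ):

          # Before: one INSERT carrying every row at once.
          PublishedArtifact.objects.bulk_create(published_artifacts)

          # After: the same rows, inserted 2000 at a time, matching publishing.py.
          PublishedArtifact.objects.bulk_create(published_artifacts, batch_size=2000)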

              Assignee: Unassigned
              Reporter: rhn-support-pmoravec (Pavel Moravec)