Project: Satellite
Issue: SAT-19978

The "hammer export" command's use of single-threaded compression causes a performance bottleneck.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Component: Pulp
    • Severity: Moderate

      +++ This bug was initially created as a clone of Bug #2188504 +++

      Description of problem:

      We identified a severe bottleneck in the way hammer exports work and traced it to code in upstream Pulp.

      Line 406 of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/export.py

      "with tarfile.open(tarfile_fp, "w|gz", fileobj=split_process.stdin)"

      The tarfile Python module creates tar files using the gzip Python module.

      Data compression for the gzip Python module is provided by the zlib Python module.

      The zlib Python module calls the zlib library.

      If defaults are used the whole way through this series of events, the result is a single-threaded Pulp process doing compression of a tarball containing a massive content library. This bottleneck can make large hammer exports take several days.

      Modifying the line so that tarfile.open does NOT use compression (changing "w|gz" to "w") dramatically speeds up the hammer export. In our testing it reduced the time from days to just hours. The drawback is a significantly larger file, but the trade-off is worthwhile given we have tight timeframes and plentiful disk capacity.
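      A minimal sketch of that trade-off, using the same tarfile modes as the Pulp code (the file names, sizes, and helper name below are illustrative, not from the actual export code):

```python
import os
import tarfile
import tempfile

def make_export(src_path: str, dest: str, compress: bool) -> int:
    """Tar src_path into dest; return the archive size in bytes."""
    # "w|gz" streams through Python's gzip/zlib in a single thread;
    # "w" writes a plain tar and avoids the compression bottleneck.
    mode = "w|gz" if compress else "w"
    with tarfile.open(dest, mode) as tar:
        tar.add(src_path, arcname=os.path.basename(src_path))
    return os.path.getsize(dest)

# demo on a small, highly compressible file
with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data.bin")
    with open(data, "wb") as f:
        f.write(b"content-library " * 100_000)  # ~1.6 MB
    gz_size = make_export(data, os.path.join(tmp, "export.tar.gz"), compress=True)
    raw_size = make_export(data, os.path.join(tmp, "export.tar"), compress=False)
```

      The compressed archive is smaller but far slower to produce on large, already-compressed RPM content.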

      Can this bottleneck be addressed with multi-threaded gzip compression?

      and/or

      Can a hammer command line option for no compression be implemented?
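      One possible shape for multi-threaded compression, sketched below, is to pipe an uncompressed tar stream through the external pigz tool, which parallelises gzip across all cores and produces output readable by ordinary gunzip. This is an illustration only, not the Pulp implementation; the function name and error handling are assumptions:

```python
import shutil
import subprocess
import tarfile

def export_with_pigz(src_dir: str, dest: str) -> None:
    """Write src_dir as a tar, compressed in parallel by pigz, to dest."""
    if shutil.which("pigz") is None:
        raise RuntimeError("pigz is not installed")
    with open(dest, "wb") as out:
        # pigz reads the tar stream on stdin and compresses across all cores
        proc = subprocess.Popen(["pigz", "-c"], stdin=subprocess.PIPE, stdout=out)
        try:
            # "w|" streams a plain (uncompressed) tar into pigz's stdin
            with tarfile.open(fileobj=proc.stdin, mode="w|") as tar:
                tar.add(src_dir, arcname=".")
        finally:
            proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("pigz exited with an error")
```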

      Version-Release number of selected component (if applicable): 6.10 onwards (Pulp 3)

      How reproducible:
      Run a hammer export and monitor Pulp processes. One process will run at 100% CPU. Modify the abovementioned Python script to NOT use gzip compression, and an uncompressed tarball will be created instead, much more quickly and with multiple Pulp processes.

      Steps to Reproduce:
      1. Run a "hammer export". Monitor the Pulp process CPU usage and time taken to complete export.
      2. Change the abovementioned Python code in Pulp.
      3. Run a "hammer export" again and note performance improvement.

      Actual results:
      A single-threaded Pulp process compresses the tarball and becomes the bottleneck.

      Expected results:
      Multi-threaded gzip compression that can take full advantage of the CPU and I/O of the Satellite server without being severely bottlenecked.

      Additional info:

      IO wait is very low when using single-threaded compression, indicating that single-threaded CPU performance is the limiting factor. When not using compression (removing the bottleneck), iowait increases.

      This issue is causing significant delays for a couple of customers. Corresponding support tickets will be submitted soon.

      — Additional comment from on 2023-05-25T14:25:50Z

      The default compression level of tarfile with gzip compression is level 9, the highest and most computationally intensive level. Compression would possibly be more viable at a lower level: based on various benchmarks, level 3 ought to be about 4x faster than level 9 with a compression ratio only 15-20% worse.
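      A rough illustration of the level trade-off using the gzip module directly (the payload is synthetic and highly repetitive; real repository content compresses very differently):

```python
import gzip
import time

# synthetic, highly repetitive payload (~4.6 MB)
payload = b"example repodata chunk " * 200_000

def bench(level: int):
    """Compress the payload once at the given level; return (seconds, size)."""
    start = time.perf_counter()
    out = gzip.compress(payload, compresslevel=level)
    return time.perf_counter() - start, len(out)

t9, s9 = bench(9)
t1, s1 = bench(1)
print(f"level 9: {t9:.3f}s, {s9} bytes")
print(f"level 1: {t1:.3f}s, {s1} bytes")
# lower levels are typically several times faster, at some cost in ratio
```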

      — Additional comment from on 2023-05-25T14:54:28Z

      Brendan, don't forget to attach the customer cases.

      If you have a reproducer (or a customer willing to experiment), what is the impact of adding "compresslevel=1" to that line? e.g.

      "with tarfile.open(tarfile_fp, "w|gz", compresslevel=1, fileobj=split_process.stdin)"

      — Additional comment from on 2023-05-25T14:59:25Z

      (don't forget to restart the services, of course).

      Also, how large is the uncompressed export in question, in gigabytes?

      — Additional comment from on 2023-05-30T04:20:27Z

      Hi Daniel,

      We have the ability to test this with a customer so I will try this out and report back. From memory, we were dealing with an export over a terabyte in size, but I will have to confirm that.

      Thanks

      — Additional comment from on 2023-05-30T04:33:17Z

      Brendan, I thought I had posted this but apparently not - that patch actually will not work, because it requires code present in Python 3.12 only. Please don't ask the customer to try it just yet.
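      For what it's worth, on Python versions where tarfile.open does not accept compresslevel in stream mode, a similar effect can be sketched by wrapping the target stream in gzip.GzipFile with an explicit level and handing that to tarfile in plain stream mode ("w|"). The helper below is a hypothetical illustration, not the actual patch:

```python
import gzip
import io
import tarfile

def open_compressed_tar(fileobj, level: int = 1):
    """Return (tar, gz) writing a gzip-compressed tar stream at `level`.

    Close `tar` first, then `gz`, so both footers are flushed in order.
    """
    # gzip.GzipFile has accepted compresslevel since long before 3.12
    gz = gzip.GzipFile(fileobj=fileobj, mode="wb", compresslevel=level)
    tar = tarfile.open(fileobj=gz, mode="w|")  # plain (uncompressed) tar stream
    return tar, gz

# usage: write one member into an in-memory buffer
buf = io.BytesIO()
tar, gz = open_compressed_tar(buf, level=1)
data = b"hello world"
info = tarfile.TarInfo("hello.txt")
info.size = len(data)
tar.addfile(info, io.BytesIO(data))
tar.close()
gz.close()
```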

      — Additional comment from on 2023-06-13T12:05:49Z

      The Pulp upstream bug status is at closed. Updating the external tracker on this bug.

      — Additional comment from on 2023-06-14T14:42:15Z

      Bwood, don't forget to attach the customer cases.

      — Additional comment from on 2023-07-27T14:34:46Z

      Anyone tracking this may also be interested in https://bugzilla.redhat.com/show_bug.cgi?id=2226950

      — Additional comment from on 2023-08-22T10:04:36Z

      @dalley@redhat.com I was trying to verify this issue but the export becomes very slow after 90% is completed. Any reason for this behaviour? I don't think the issue is completely fixed.

      — Additional comment from on 2023-08-22T14:18:24Z

      Are you comparing it against a baseline export time, or just looking at reported progress? What does "very slow" mean in this context?

      IMO the way to test this would be to start with an impacted version, sync and do a complete export, note the time it takes, then upgrade and export the same repo, noting the new time required.

      — Additional comment from on 2023-08-23T10:16:47Z

      Thanks Daniel, I was looking at the export progress, but comparing against the export time of the impacted version, the export time has improved.
      So, I am going to verify the issue.

      — Additional comment from on 2023-08-23T10:21:53Z

      Verified.

      Version Tested: Satellite 6.14.0 Snap 12.0

      Verification Steps:
      1. Enable some large repos like appstream and rhel7server.
      2. Update the download policy to "immediate" and sync the repos.
      3. Perform complete hammer export of the library lifecycle environment.
      4. Observe the time to export the complete lce.

      Result:
      Performance of hammer export has been improved after the fix.

      — Additional comment from on 2023-08-23T13:13:08Z

      Shweta, out of curiosity, how much improvement did you observe?

      — Additional comment from on 2023-09-05T00:51:40Z

      I've unfortunately needed to revise the patch as it was not being reliably loaded 100% of the time. I don't believe there is any need to delay any scheduled releases, just push the BZ off to the next one.
