Satellite / SAT-18409

The "hammer export" command's single-threaded compression causes a performance bottleneck.


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Component: Pulp
    • Severity: Moderate

      Description of problem:

      We identified a severe bottleneck in the way hammer exports work and traced it to code in upstream Pulp.

      Line 406 of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/export.py

      "with tarfile.open(tarfile_fp, "w|gz", fileobj=split_process.stdin)"

      The tarfile Python module creates tar files using the gzip Python module.

      Data compression for the gzip Python module is provided by the zlib Python module.

      The zlib Python module calls the zlib library.

      If defaults are used the whole way through this series of events, the result is a single-threaded Pulp process compressing a tarball that contains a massive content library. This bottleneck can make large hammer exports take several days.

      Modifying the line so that the tarfile.open function does NOT use compression (change "w|gz" to "w") dramatically speeds up the hammer export. In our testing it reduced the time from days to just hours. The drawback is a significantly larger file, but the trade-off is worthwhile given we have tight timeframes and plentiful disk capacity.

      Can this bottleneck be addressed with multi-threaded gzip compression?

      and/or

      Can a hammer command line option for no compression be implemented?
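      On the first question, one possible approach, sketched here as an assumption rather than a proposed patch, is pigz-style chunked compression: compress fixed-size chunks as independent gzip members in a thread pool (zlib releases the GIL during compression, so threads genuinely run in parallel) and concatenate the results, which still forms a valid gzip stream.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def parallel_gzip(data: bytes, chunk_size: int = 1 << 20,
                  workers: int = 4) -> bytes:
    """Compress chunks as independent gzip members, pigz-style."""
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)]
    # zlib releases the GIL while compressing, so a thread pool gives
    # real parallelism even in CPython.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = pool.map(gzip.compress, chunks)
    # Concatenated gzip members are themselves a valid gzip stream.
    return b"".join(members)

payload = b"hammer export payload " * 100_000  # ~2.2 MiB
assert gzip.decompress(parallel_gzip(payload)) == payload
```

      This is conceptually what the external pigz tool does; piping the tar stream through pigz would be an alternative that needs no change to Python's gzip usage.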

      Version-Release number of selected component (if applicable): 6.10 onwards (Pulp 3)

      How reproducible:
      Run a hammer export and monitor the Pulp processes. One process will run at 100% CPU. Modify the abovementioned Python script to NOT use gzip compression, and an uncompressed tarball will be created instead, much more quickly and with multiple Pulp processes active.

      Steps to Reproduce:
      1. Run a "hammer export". Monitor the Pulp process CPU usage and time taken to complete export.
      2. Change the abovementioned Python code in Pulp.
      3. Run a "hammer export" again and note performance improvement.

      Actual results:
      A single-threaded Pulp process compresses the tarball and becomes the bottleneck.

      Expected results:
      Multi-threaded gzip compression that can take full advantage of the CPU and I/O of the Satellite server without being severely bottlenecked.

      Additional info:

      I/O wait is very low when using single-threaded compression, indicating that single-thread CPU is the limiting factor. When compression is disabled (removing the bottleneck), iowait increases.

      This issue is causing significant delays for a couple of customers. Corresponding support tickets will be submitted soon.


            jira-bugzilla-migration RH Bugzilla Integration
            Shweta Singh