Uploaded image for project: 'Satellite'
  1. Satellite
  2. SAT-38301

performance degradation of "dnf reposync" from Satellite 6.15 to 6.16

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Normal Normal
    • None
    • 6.16.5
    • Performance, Pulp
    • False
    • sat-artemis
    • None
    • None
    • None
    • None
    • Yes

      Description of problem:
      There is an evident performance degradation when a client registered to Satellite 6.16 (or 6.17) runs "dnf reposync" or a similar operation that fetches thousands of packages.

      The issue happens when serving locally stored packages (i.e. even with Immediate download policy).

      The regression happens between 6.15 and 6.16, while 6.17 performance is similar to 6.16.

      Dissecting where the delay comes from, I measured response times of both httpd and of `stream_content` method https://github.com/pulp/pulpcore/blob/main/pulpcore/content/handler.py#L279 as described below.

      When I did a dirty copy of the `handler.py` from 6.15 to 6.16 and 6.17, performance got improved a lot, almost to the 6.15 values. This proves the changes in handler.py cause the regression.

      How reproducible:
      100% n proper testing
       

      Is this issue a regression from an earlier version:
      yes, between 6.15 and 6.16
       

      Steps to Reproduce:

      0) use three Satellites, 6.15 (on RHEL8), 6.16 (on RHEL9) and 6.17 (on RHEL9), and three identical clients (RHEL9). Satellites used `large` tuning, otherwise default install.

      1) In `/etc/httpd/conf/httpd.conf`, append to `LogFormat` values `%D`, to print response time in microseconds (and restart httpd service)

      2) In `/usr/lib/python3.11/site-packages/pulpcore/content/handler.py`, apply patch:

      --- /usr/lib/python3.11/site-packages/pulpcore/content/handler.py.orig	2025-09-08 08:48:59.369427549 +0200
      +++ /usr/lib/python3.11/site-packages/pulpcore/content/handler.py.617debugs	2025-09-09 08:31:50.713746381 +0200
      @@ -1,3 +1,4 @@
      +import time
       import asyncio
       import logging
       from multidict import CIMultiDict
      @@ -286,8 +287,12 @@ class Handler:
                   [aiohttp.web.StreamResponse][] or [aiohttp.web.FileResponse][]: The response
                       back to the client.
               """
      +        start_time = time.time()
               path = request.match_info["path"]
      -        return await self._match_and_stream(path, request)
      +        ret = await self._match_and_stream(path, request)
      +        end_time = time.time()
      +        log.info(f"PavelM: stream_content of {path=} in time {end_time-start_time}")
      +        return ret
       
           @staticmethod
           def _base_paths(path):
      

      to measure response time of pulp (and restart pulpcore-content service).

      3) To ensure `redis` does not cache anything, stop the service (it should not cache much packages since cnd reposync fetches whole repo content which is far beyond of redis cache, but dont have our tests biased by redis).

      4) Run either `dnf reposync`, or the same in sequential mode (`echo max_parallel_downloads=1 >> /etc/dnf/dnf.conf`), or iterate `curl` to fetch each and every package of a repo (generate the list of packages from dnf metadata and get SSL certs from redhat.repo). In either case (dnf / curl), results were similar.

      5) Check these three values:

      • sum of pulp response times per changes in handler.py
      • sum of httpd response times per httpd access logs
      • runtime of the client (dnf or bash script running curl commands)

      Actual behavior:
      TL;DR: 6.16 is worse by 20ish percents than 6.15, while 6.17 is yet bit slower by a few percents - depending what metric you focus on. Copying handler.py from pulp in 6.15 to 6.16 and 6.17, most of the difference is gone.

      I run each scenario 5 times and put just median value each time. In each scenario, I calculated sum of resposnse times in seconds. RHEL9 BaseOS repo was used here, while similar outcome was seen for RHEL8 BaseOS repo (and almost surely exist for any other repo).

      Scenario 6.15 pulp 6.15 httpd 6.15 client 6.16 pulp 6.16 httpd 6.16 client 6.17 pulp 6.17 httpd 6.17 client
      default scenario, real values 146.71 562.90 837.00 212.62 641.56 940.00 222.87 673.07 962.00
      default scenario, values relative to 6.15 1.00 1.00 1.00 1.45 1.14 1.12 1.52 1.20 1.15
      handler.py from 6.15, real values 145.08 558.30 833.00 159.28 580.04 838.00 160.29 565.40 846.00
      handler.py from 6.15, relative values 1.00 1.00 1.00 1.10 1.04 1.01 1.10 1.01 1.02

      Expected behavior:
      No such big regression, or some explanation what newly implemented feature in pulp justifies the perf.degradation (as usually new features do).

      Business Impact / Additional info:
      The `dnf reposync` use case is not very common. In the most often seen use cases like "install those five packages to 100 clients", there is no performance regression. Just one possible widely used use case can be harmed: Sync of a Capsule with Immediate download policy (quite common among customers). Though I havent tested this at all, just I speculate here.

              Unassigned Unassigned
              rhn-support-pmoravec Pavel Moravec
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated: