-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
6.16.5
-
False
-
sat-artemis
-
None
-
None
-
None
-
None
-
Yes
Description of problem:
There is an evident performance degradation when a client registered to Satellite 6.16 (or 6.17) runs "dnf reposync" or a similar operation that fetches thousands of packages.
The issue happens when serving locally stored packages (i.e. even with Immediate download policy).
The regression happens between 6.15 and 6.16, while 6.17 performance is similar to 6.16.
Dissecting where the delay comes from, I measured response times of both httpd and of `stream_content` method https://github.com/pulp/pulpcore/blob/main/pulpcore/content/handler.py#L279 as described below.
When I did a dirty copy of the `handler.py` from 6.15 to 6.16 and 6.17, performance got improved a lot, almost to the 6.15 values. This proves the changes in handler.py cause the regression.
How reproducible:
100% n proper testing
Is this issue a regression from an earlier version:
yes, between 6.15 and 6.16
Steps to Reproduce:
0) use three Satellites, 6.15 (on RHEL8), 6.16 (on RHEL9) and 6.17 (on RHEL9), and three identical clients (RHEL9). Satellites used `large` tuning, otherwise default install.
1) In `/etc/httpd/conf/httpd.conf`, append to `LogFormat` values `%D`, to print response time in microseconds (and restart httpd service)
2) In `/usr/lib/python3.11/site-packages/pulpcore/content/handler.py`, apply patch:
--- /usr/lib/python3.11/site-packages/pulpcore/content/handler.py.orig 2025-09-08 08:48:59.369427549 +0200 +++ /usr/lib/python3.11/site-packages/pulpcore/content/handler.py.617debugs 2025-09-09 08:31:50.713746381 +0200 @@ -1,3 +1,4 @@ +import time import asyncio import logging from multidict import CIMultiDict @@ -286,8 +287,12 @@ class Handler: [aiohttp.web.StreamResponse][] or [aiohttp.web.FileResponse][]: The response back to the client. """ + start_time = time.time() path = request.match_info["path"] - return await self._match_and_stream(path, request) + ret = await self._match_and_stream(path, request) + end_time = time.time() + log.info(f"PavelM: stream_content of {path=} in time {end_time-start_time}") + return ret @staticmethod def _base_paths(path):
to measure response time of pulp (and restart pulpcore-content service).
3) To ensure `redis` does not cache anything, stop the service (it should not cache much packages since cnd reposync fetches whole repo content which is far beyond of redis cache, but dont have our tests biased by redis).
4) Run either `dnf reposync`, or the same in sequential mode (`echo max_parallel_downloads=1 >> /etc/dnf/dnf.conf`), or iterate `curl` to fetch each and every package of a repo (generate the list of packages from dnf metadata and get SSL certs from redhat.repo). In either case (dnf / curl), results were similar.
5) Check these three values:
- sum of pulp response times per changes in handler.py
- sum of httpd response times per httpd access logs
- runtime of the client (dnf or bash script running curl commands)
Actual behavior:
TL;DR: 6.16 is worse by 20ish percents than 6.15, while 6.17 is yet bit slower by a few percents - depending what metric you focus on. Copying handler.py from pulp in 6.15 to 6.16 and 6.17, most of the difference is gone.
I run each scenario 5 times and put just median value each time. In each scenario, I calculated sum of resposnse times in seconds. RHEL9 BaseOS repo was used here, while similar outcome was seen for RHEL8 BaseOS repo (and almost surely exist for any other repo).
Scenario | 6.15 pulp | 6.15 httpd | 6.15 client | 6.16 pulp | 6.16 httpd | 6.16 client | 6.17 pulp | 6.17 httpd | 6.17 client |
---|---|---|---|---|---|---|---|---|---|
default scenario, real values | 146.71 | 562.90 | 837.00 | 212.62 | 641.56 | 940.00 | 222.87 | 673.07 | 962.00 |
default scenario, values relative to 6.15 | 1.00 | 1.00 | 1.00 | 1.45 | 1.14 | 1.12 | 1.52 | 1.20 | 1.15 |
handler.py from 6.15, real values | 145.08 | 558.30 | 833.00 | 159.28 | 580.04 | 838.00 | 160.29 | 565.40 | 846.00 |
handler.py from 6.15, relative values | 1.00 | 1.00 | 1.00 | 1.10 | 1.04 | 1.01 | 1.10 | 1.01 | 1.02 |
Expected behavior:
No such big regression, or some explanation what newly implemented feature in pulp justifies the perf.degradation (as usually new features do).
Business Impact / Additional info:
The `dnf reposync` use case is not very common. In the most often seen use cases like "install those five packages to 100 clients", there is no performance regression. Just one possible widely used use case can be harmed: Sync of a Capsule with Immediate download policy (quite common among customers). Though I havent tested this at all, just I speculate here.