Satellite / SAT-21473

Getting versions of an Ansible collection does not scale

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 6.15.0
    • Pulp
    • Important
      Description of problem:
      When one synchronizes the whole of https://galaxy.ansible.com/api/ to Satellite, Satellite (katello and pulp) hits several scalability and performance issues:

      1) The Actions::Katello::Repository::IndexContent dynflow step often fails during repo sync, either with a 502 response code or with a Faraday::ConnectionFailed EOFError (EOFError). Partially this is a consequence of 2), but I guess it is also due to a lack of pagination, since the dynflow step raises this query:

      Nov 23 09:22:22 pmoravec-sat614 pulpcore-api[2078]: pulp [6643347abff54c979e14631fe16d71ed]: - - [23/Nov/2023:08:22:22 +0000] "GET /pulp/api/v3/content/ansible/collection_versions/?limit=2000&offset=0&repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Fansible%2Fansible%2F2b92bb80-5770-4d3a-a02d-61cb30eeb5e2%2Fversions%2F1%2F HTTP/1.1" 200 1246516303 "-" "OpenAPI-Generator/0.16.1/ruby"

      which ran for 8 minutes until pulpcore-api received signal 9. Note the response length: over 1 GB of data. It would be worth paginating this.
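A minimal sketch of what paginating this call could look like on the client side. Here get_page() is a hypothetical stand-in for the HTTP call katello makes to /pulp/api/v3/content/ansible/collection_versions/; the point is only to show fetching the result in bounded chunks instead of one huge limit=2000 request:

```python
# Hedged sketch: offset/limit pagination for a large listing endpoint.
# get_page(limit, offset) is a hypothetical stand-in for the real HTTP call.
def iter_collection_versions(get_page, page_size=200):
    """Yield records one bounded page at a time instead of one giant response."""
    offset = 0
    while True:
        batch = get_page(limit=page_size, offset=offset)
        if not batch:
            return
        yield from batch
        offset += page_size

# Usage with a fake backend holding 450 records:
records = list(range(450))
fake_get_page = lambda limit, offset: records[offset:offset + limit]
print(sum(1 for _ in iter_collection_versions(fake_get_page)))  # 450
```

With a page size of a few hundred, peak memory on both ends stays proportional to one page rather than to the whole repository version.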

      Also, the sidekiq process consumed 8.5 GB of memory during this, which is far too much.

      2) The underlying pulp queries are VERY inefficient; I guess they lack some filters, so a lot of redundant data is processed. E.g.:

      2a) The particular query from katello: it always failed for me /o\ after consuming 7 GB of memory.

      2b) ansible-galaxy raises queries like:

      /api/v3/collections/community/general/versions/?limit=100

      and then ..../versions/8.0.2/ for each individual version. The "get me the list of versions" query is the slow one here; in particular:

      • /api/v3/collections/community/general/versions/?limit=100 :
      • ran 22 s, the pulp gunicorn process consumed 3091340 kB RSS
      • /api/v3/collections/community/general/versions/?limit=1 :
      • 22 s, 3092328 kB RSS
      • /api/v3/collections/community/general/versions/?limit=1&ordering=is_highest :
      • 22 s, 3186600 kB RSS

      Curiously, querying:

      • /api/v3/collections/community/general/versions/?limit=1&is_highest=true :
      • 0.3s, no memory increase spotted!

      Checking where pulp spends most of its time: it is inside queryset = sorted(..).

      /usr/lib/python3.9/site-packages/pulp_ansible/app/galaxy/v3/views.py :

      def list(self, request, *args, **kwargs):
          """
          Returns paginated CollectionVersions list.
          """
          queryset = self.filter_queryset(self.get_queryset())
          queryset = sorted(
              queryset, key=lambda obj: semantic_version.Version(obj.version), reverse=True
          )

      The queryset = self.filter_queryset(self.get_queryset()) line takes 1-2 seconds, while the queryset = sorted(..) takes 20 seconds, even for "limit=1" or "limit=1&ordering=is_highest". Yet a "limit=1&is_highest=true" query going through the same code path is pretty fast.
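The behavior above is consistent with Python-side sorting defeating the limit. A minimal illustrative sketch (not pulp code; a stdlib tuple key stands in for semantic_version.Version) showing that sorted() must materialize and parse every row even when the caller wants only one:

```python
# Hedged sketch: why sorted() over a lazy queryset ignores LIMIT.
fetched = 0

def fake_queryset(n):
    """Simulates a lazily-evaluated queryset of version strings."""
    global fetched
    for i in range(n):
        fetched += 1
        yield f"1.{i}.0"

def version_key(v):
    # stdlib stand-in for semantic_version.Version ordering
    return tuple(int(part) for part in v.split("."))

# The path the view takes: sort everything, then paginate.
rows = sorted(fake_queryset(139), key=version_key, reverse=True)
page = rows[:1]          # limit=1 is applied only after the full sort
print(fetched)           # 139 -- every row was pulled and parsed
```

A filter like is_highest=true, by contrast, can be pushed down into SQL, so the database returns at most one row before any Python-side sorting happens; that would explain why that variant runs in 0.3 s.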

      I understand there is usually no need to sync all collections, and that we should use Requirements to filter the collections of interest; but if that is the expectation, the documentation (https://access.redhat.com/documentation/en-us/red_hat_satellite/6.13/html-single/managing_configurations_using_ansible_integration_in_red_hat_satellite/index#synchronizing-ansible-collections_ansible) should state that only filtered content is supported.

      But I would rather see some improvement: why does fetching 139 entries take 20 seconds and consume 3 GB of memory?

      Version-Release number of selected component (if applicable):
      Sat 6.13 (also in 6.14)

      How reproducible:
      100%

      Steps to Reproduce:
      1. Sync a repo of type ansible with upstream URL https://galaxy.ansible.com/api/ and monitor sidekiq and pulp gunicorn memory usage.
      2. Try to list versions of some collection, like:

      time curl -L -k 'https://localhost/pulp_ansible/galaxy/ORGANIZATION/Library/custom/PRODUCT/REPOSITORY/api/v3/collections/community/general/versions/?limit=100&offset=0' > galaxy_whole.versions.100.json

      and use other URIs from above.

      3. monitor memory usage of pulp's gunicorn processes

      Actual results:
      1. sidekiq consumes 8.5GB memory, pulp consumes 7GB memory, sync often/always fails on indexing katello content.
      2. times and memory usage for given URIs:

      • /api/v3/collections/community/general/versions/?limit=100 :
      • 22 s, 3091340 kB RSS
      • /api/v3/collections/community/general/versions/?limit=1 :
      • 22 s, 3092328 kB RSS
      • /api/v3/collections/community/general/versions/?limit=1&ordering=is_highest :
      • 22 s, 3186600 kB RSS
      • /api/v3/collections/community/general/versions/?limit=1&is_highest=true :
      • 0.3 s, no memory increase spotted!
        (only this last one is acceptable)

      Expected results:
      Several times faster requests, less memory usage.

      Additional info:
      Does the "limit" parameter work correctly for this query? The sub-URIs:

      /versions/?limit=100
      /versions/?limit=200
      /versions/?limit=200&offset=0

      all return:

      {
        "meta": { "count": 139 },
        "links": {
          "first": "/pulp_ansible/galaxy/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/api/v3/plugin/ansible/content/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/collections/index/community/general/versions/?limit=100&offset=0",
          "previous": null,
          "next": "/pulp_ansible/galaxy/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/api/v3/plugin/ansible/content/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/collections/index/community/general/versions/?limit=100&offset=100",
          "last": "/pulp_ansible/galaxy/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/api/v3/plugin/ansible/content/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/collections/index/community/general/versions/?limit=100&offset=39"
        },

      followed by 100 items of data, not the 200 requested by limit=200.
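Since the server appears to cap limit at 100 regardless of the requested value, a robust client should follow links["next"] until it is null rather than assume its limit was honored. A small sketch of that, where fetch() is a hypothetical stand-in for an HTTP GET returning the parsed JSON body:

```python
# Hedged sketch: walk a paginated endpoint via its "next" links.
# PAGES mimics a server that caps limit at 100 for a 139-item collection.
PAGES = {
    "/versions/?limit=200&offset=0": {
        "meta": {"count": 139},
        "links": {"next": "/versions/?limit=100&offset=100"},
        "data": list(range(100)),        # 100 items returned, not 200
    },
    "/versions/?limit=100&offset=100": {
        "meta": {"count": 139},
        "links": {"next": None},
        "data": list(range(100, 139)),
    },
}

def fetch(url):
    """Hypothetical stand-in for an HTTP GET + JSON decode."""
    return PAGES[url]

def all_versions(start_url):
    url, items = start_url, []
    while url:
        page = fetch(url)
        items.extend(page["data"])
        url = page["links"]["next"]      # None terminates the loop
    return items

items = all_versions("/versions/?limit=200&offset=0")
print(len(items))  # 139
```

Following the links also sidesteps the odd "last" offset seen above, since the client never computes offsets itself.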

            jira-bugzilla-migration RH Bugzilla Integration
            Gaurav Talreja