  1. Satellite
  2. SAT-21473

Getting versions of an Ansible collection does not scale


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 6.15.0
    • None
    • Pulp
    • Important
    • No

      Description of problem:
      When one synchronizes the whole of https://galaxy.ansible.com/api/ to Satellite, Satellite (katello and pulp) hits scalability and performance issues:

      1) The Actions::Katello::Repository::IndexContent dynflow step often fails during repo sync, either with a 502 response code or with a Faraday::ConnectionFailed EOFError (EOFError) error. This is partially a consequence of 2), but I guess it is also due to a lack of pagination, since the dynflow step raises this query:

      Nov 23 09:22:22 pmoravec-sat614 pulpcore-api[2078]: pulp [6643347abff54c979e14631fe16d71ed]: - - [23/Nov/2023:08:22:22 +0000] "GET /pulp/api/v3/content/ansible/collection_versions/?limit=2000&offset=0&repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Fansible%2Fansible%2F2b92bb80-5770-4d3a-a02d-61cb30eeb5e2%2Fversions%2F1%2F HTTP/1.1" 200 1246516303 "-" "OpenAPI-Generator/0.16.1/ruby"

      that ran for 8 minutes until pulpcore-api got signal 9. Note the response length: 1246516303 bytes, over 1GB of data. Worth paginating it?

      Also, the sidekiq process consumed 8.5GB of memory during that, which is too much.
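
To illustrate what paginating that query could look like on the client side, here is a minimal sketch. It is not katello's code (katello is Ruby); `iter_pages` and `fetch_page` are hypothetical names, and the response shape with "count" and "results" keys is an assumption about the paginated API.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], Dict], limit: int = 100) -> Iterator[List]:
    """Yield one page of results at a time instead of one giant response.

    fetch_page(limit, offset) is a hypothetical callable wrapping the GET
    request; it is assumed to return a dict with "count" and "results".
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        results = page["results"]
        if results:
            yield results
        offset += len(results)
        # Stop on an empty page or once every reported item has been seen.
        if not results or offset >= page["count"]:
            break


# Stand-in for the real API: 250 fake items served in slices.
def fake_fetch(limit: int, offset: int, _data=list(range(250))) -> Dict:
    return {"count": len(_data), "results": _data[offset:offset + limit]}


sizes = [len(page) for page in iter_pages(fake_fetch, limit=100)]
print(sizes)
```

With a small limit, neither side has to hold the whole result set in memory at once, at the cost of more round trips.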

      2) The underlying pulp queries are VERY inefficient. I guess they lack some filters, so a lot of redundant data is processed. E.g.:

      2a) The particular query from katello above. It always failed for me /o\ after it consumed 7GB of memory.

      2b) ansible-galaxy raises queries like:

      /api/v3/collections/community/general/versions/?limit=100

      and then ..../versions/8.0.2/ for each individual version. The "get me the list of versions" query is the very slow one here, in particular:

      • /api/v3/collections/community/general/versions/?limit=100 :
      • 22s, pulp gunicorn process consumed 3091340kB RSS
      • /api/v3/collections/community/general/versions/?limit=1 :
      • 22s, 3092328kB RSS
      • /api/v3/collections/community/general/versions/?limit=1&ordering=is_highest :
      • 22s, 3186600kB RSS

      Curiously, querying:

      • /api/v3/collections/community/general/versions/?limit=1&is_highest=true :
      • 0.3s, no memory increase spotted!

      Checking where pulp spends most of the time: it is inside queryset = sorted(..).

      /usr/lib/python3.9/site-packages/pulp_ansible/app/galaxy/v3/views.py :

      def list(self, request, *args, **kwargs):
          """
          Returns paginated CollectionVersions list.
          """
          queryset = self.filter_queryset(self.get_queryset())
          queryset = sorted(
              queryset, key=lambda obj: semantic_version.Version(obj.version), reverse=True
          )

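      The "queryset = self.filter_queryset(self.get_queryset())" line takes 1-2 seconds, while queryset = sorted(..) takes 20 seconds, even for "limit=1" or "limit=1&ordering=is_highest". The sort iterates over the whole queryset in Python, presumably before the limit is applied, which would explain why "limit" does not help. The "limit=1&is_highest=true" query follows the same code path yet is pretty fast.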
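
If the cost comes from building full CollectionVersion objects before pagination (an assumption, not confirmed here), one option would be to sort only lightweight (primary key, version) pairs and load full rows just for the requested page. The sketch below is not pulp_ansible's code: `version_key` is a simplified stand-in for semantic_version.Version, and `page_ids` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+.*)?$")


def version_key(version: str):
    """Simplified semver sort key; build metadata is ignored."""
    m = _SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, pre = m.groups()
    if pre is None:
        # A release sorts above any of its own pre-releases.
        pre_key = (1,)
    else:
        # Numeric identifiers sort below alphanumeric ones.
        parts = tuple(
            (0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split(".")
        )
        pre_key = (0, parts)
    return (int(major), int(minor), int(patch), pre_key)


def page_ids(rows: Iterable[Tuple[int, str]], limit: int, offset: int = 0) -> List[int]:
    """Sort (pk, version) pairs newest first; return only the page's pks."""
    ordered = sorted(rows, key=lambda row: version_key(row[1]), reverse=True)
    return [pk for pk, _ in ordered[offset:offset + limit]]


# Made-up sample rows for illustration.
rows = [(1, "8.0.2"), (2, "8.1.0-beta.1"), (3, "7.5.10"), (4, "8.1.0"), (5, "8.0.10")]
newest = page_ids(rows, limit=2)
print(newest)
```

The full objects would then be fetched only for the returned primary keys, so memory use would scale with the page size rather than with the number of versions.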

      I understand there is usually no need to sync all collections and we should use Requirements to filter collections of interest, but then the documentation (https://access.redhat.com/documentation/en-us/red_hat_satellite/6.13/html-single/managing_configurations_using_ansible_integration_in_red_hat_satellite/index#synchronizing-ansible-collections_ansible) should state that we support only filtered content.

      But I would rather see some improvement: why does getting 139 entries take 20 seconds and consume 3GB of memory?

      Version-Release number of selected component (if applicable):
      Sat 6.13 (also in 6.14)

      How reproducible:
      100%

      Steps to Reproduce:
      1. Sync repo of type ansible, with upstream URL https://galaxy.ansible.com/api/ and monitor sidekiq + pulp's gunicorn memory usage.
      2. Try to list versions of some collection, like:

      time curl -L -k 'https://localhost/pulp_ansible/galaxy/ORGANIZATION/Library/custom/PRODUCT/REPOSITORY/api/v3/collections/community/general/versions/?limit=100&offset=0' > galaxy_whole.versions.100.json

      and use other URIs from above.

      3. Monitor memory usage of pulp's gunicorn processes.
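
For the memory monitoring in the steps above, a small helper like the following could read a process's resident set size on Linux. This is only a sketch: `parse_vmrss_kb` and `rss_kb` are hypothetical names, and finding the gunicorn PIDs is left out.

```python
import re
from pathlib import Path


def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from /proc/<pid>/status content."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if m is None:
        raise ValueError("no VmRSS line found")
    return int(m.group(1))


def rss_kb(pid: int) -> int:
    """Read the current RSS of a running process (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


# Made-up status excerpt, using the RSS figure quoted in this report.
sample = "Name:\tgunicorn\nVmPeak:\t 3200000 kB\nVmRSS:\t 3091340 kB\n"
sample_rss = parse_vmrss_kb(sample)
print(sample_rss)
```

Calling `rss_kb(pid)` before and after each curl request would give the per-request growth for a given gunicorn worker.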

      Actual results:
      1. sidekiq consumes 8.5GB memory, pulp consumes 7GB memory, sync often/always fails on indexing katello content.
      2. times and memory usage for given URIs:

      • /api/v3/collections/community/general/versions/?limit=100 :
      • 22s, 3091340kB RSS
      • /api/v3/collections/community/general/versions/?limit=1 :
      • 22s, 3092328kB RSS
      • /api/v3/collections/community/general/versions/?limit=1&ordering=is_highest :
      • 22s, 3186600kB RSS
      • /api/v3/collections/community/general/versions/?limit=1&is_highest=true :
      • 0.3s, no memory increase spotted!
        (only this last one is great)

      Expected results:
      Several times faster requests, less memory usage.

      Additional info:
      Does the "limit" parameter work well for this query? I ask since these sub-URIs:

      /versions/?limit=100
      /versions/?limit=200
      /versions/?limit=200&offset=0

      all return:

      {
        "meta": { "count": 139 },
        "links": {
          "first": "/pulp_ansible/galaxy/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/api/v3/plugin/ansible/content/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/collections/index/community/general/versions/?limit=100&offset=0",
          "previous": null,
          "next": "/pulp_ansible/galaxy/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/api/v3/plugin/ansible/content/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/collections/index/community/general/versions/?limit=100&offset=100",
          "last": "/pulp_ansible/galaxy/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/api/v3/plugin/ansible/content/RedHat/Library/custom/Ansible_Galaxy/galaxy_whole/collections/index/community/general/versions/?limit=100&offset=39"
        },

      followed by 100 items of data, not the 200 that limit=200 asks for.
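
One possible explanation (an assumption, not verified against pulp_ansible) is a server-side cap on "limit", which Django REST framework's LimitOffsetPagination supports through its max_limit attribute. The sketch below models such a cap and a client-side check for it; `effective_limit`, `limit_was_capped` and the default values of 100 are hypothetical.

```python
from typing import Optional


def effective_limit(
    requested: Optional[int], default_limit: int = 100, max_limit: Optional[int] = 100
) -> int:
    """Model of a server that silently caps the requested page size."""
    if requested is None or requested <= 0:
        return default_limit
    if max_limit is not None:
        return min(requested, max_limit)
    return requested


def limit_was_capped(requested: int, count: int, returned: int) -> bool:
    """True when the server returned fewer items than the request allowed."""
    return returned < min(requested, count)


# Figures from the report: limit=200 requested, count=139, 100 items returned.
print(effective_limit(200))
print(limit_was_capped(requested=200, count=139, returned=100))
```

If a cap like this is in place, the "limit=100" in the returned links would simply reflect the capped value rather than the requested one.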

              jira-bugzilla-migration RH Bugzilla Integration
              jira-bugzilla-migration RH Bugzilla Integration
              Gaurav Talreja Gaurav Talreja