Description of problem:
Pulp raised the following error when performing an incremental import for Ansible collections.
Errors:
Traceback:
  File "/usr/lib/python3.11/site-packages/pulpcore/tasking/tasks.py", line 61, in _execute_task
    result = func(*args, **kwargs)
  File "/usr/lib/python3.11/site-packages/pulpcore/app/tasks/importer.py", line 380, in import_repository_version
    for a_result in _import_file(os.path.join(rv_path, filename), res_class, retry=True):
  File "/usr/lib/python3.11/site-packages/pulpcore/app/tasks/importer.py", line 268, in _import_file
    a_result = resource.import_data(data, raise_errors=True)
  File "/usr/lib/python3.11/site-packages/import_export/resources.py", line 813, in import_data
    result = self.import_data_inner(
  File "/usr/lib/python3.11/site-packages/import_export/resources.py", line 882, in import_data_inner
    raise row_result.errors[-1].error
  File "/usr/lib/python3.11/site-packages/import_export/resources.py", line 748, in import_row
    self.save_instance(instance, new, using_transactions, dry_run)
  File "/usr/lib/python3.11/site-packages/import_export/resources.py", line 491, in save_instance
    instance.save()
  File "/usr/lib/python3.11/site-packages/pulpcore/app/models/base.py", line 160, in save
    return super().save(*args, **kwargs)
  File "/usr/lib64/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3.11/site-packages/django_lifecycle/mixins.py", line 169, in save
    save(*args, **kwargs)
  File "/usr/lib/python3.11/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/usr/lib/python3.11/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(
  File "/usr/lib/python3.11/site-packages/django/db/models/base.py", line 1020, in _save_table
    results = self._do_insert(
  File "/usr/lib/python3.11/site-packages/django/db/models/base.py", line 1061, in _do_insert
    return manager._insert(
  File "/usr/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/lib/python3.11/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/usr/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/usr/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/usr/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python3.11/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
Description:
duplicate key value violates unique constraint "unique_is_highest"
DETAIL: Key (collection_id, is_highest)=(019245d8-6ab2-7bfc-9015-fa80c3082eee, t) already exists.
How reproducible:
Tricky
Is this issue a regression from an earlier version:
No
Steps to Reproduce:
To simulate the issue, the Pulp code has to be patched so that the export writes the rows in a specific order.
1. Edit "/usr/lib/python3.11/site-packages/pulp_ansible/app/modelresource.py" and add an "export" method to the "CollectionVersionContentResource" class.
(Why not change "set_up_queryset" instead? See the root cause notes under "Additional info".)
    class CollectionVersionContentResource(BaseContentResource):
        <snip>
        def export(self, queryset, *args, **kwargs):
            if queryset:
                queryset = queryset.order_by("-is_highest")
            return super().export(queryset, *args, **kwargs)
2. Restart pulpcore services
systemctl restart pulpcore*
3. Create an ansible collection repository and sync the following collection and version.
collections:
- name: ansible.posix
version: 1.5.4
4. Create a content view, attach the ansible collection repo and then publish version 1.0
5. Export the content view version 1.0
6. Sync the ansible collection repository again with new versions.
collections:
- name: ansible.posix
version: ">=1.5.4"
7. Publish the content view version 2.0
8. Perform an incremental export for the content view version 2.0
9. Import the content view version 1.0 into another Satellite.
10. Import the content view version 2.0 into the same Satellite.
Actual behavior:
The incremental import of content view version 2.0 fails with:
duplicate key value violates unique constraint "unique_is_highest"
DETAIL: Key (collection_id, is_highest)=(019245d8-6ab2-7bfc-9015-fa80c3082eee, t) already exists.
Expected behavior:
Import successfully.
Additional info:
This is the root cause:
After the first complete import, the "ansible_collectionversion" table in the disconnected Satellite should have the following row:
pulpcore=# select name, version, is_highest from ansible_collectionversion;
 name  | version | is_highest
-------+---------+------------
 posix | 1.5.4   | t
(1 row)
During the incremental import, if the rows in the incremental JSON file are in the following order, version 1.6.0 with is_highest = t is inserted first and causes the error. This is because the existing row (1.5.4, t) in the disconnected Satellite has not yet been updated to is_highest = f.
{
  "namespace": "ansible",
  "name": "posix",
  "version": "1.6.0",
  "is_highest": "1"
}
...
{
  "namespace": "ansible",
  "name": "posix",
  "version": "1.5.4",
  "is_highest": "0"
}
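The ordering problem can be reproduced outside Pulp with a minimal sketch. It uses an in-memory SQLite table in place of "ansible_collectionversion" and assumes that "unique_is_highest" behaves like a partial unique index that allows only one is_highest row per collection. The table layout, the "c1" collection id and the "import_rows" helper are hypothetical and only mimic the update-or-insert behaviour of the import.

```python
import sqlite3

COLLECTION_ID = "c1"  # hypothetical id, stands in for the collection UUID


def make_db():
    """Create a table in the state left by the first (complete) import."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collectionversion ("
        " name TEXT, version TEXT, is_highest INTEGER, collection_id TEXT,"
        " PRIMARY KEY (name, version))"
    )
    # Assumption: at most one row per collection may have is_highest set.
    conn.execute(
        "CREATE UNIQUE INDEX unique_is_highest"
        " ON collectionversion (collection_id, is_highest)"
        " WHERE is_highest = 1"
    )
    conn.execute(
        "INSERT INTO collectionversion VALUES ('posix', '1.5.4', 1, ?)",
        (COLLECTION_ID,),
    )
    return conn


def import_rows(conn, rows):
    """Update a row if it already exists, otherwise insert it, in file order."""
    for name, version, is_highest in rows:
        cur = conn.execute(
            "UPDATE collectionversion SET is_highest = ?"
            " WHERE name = ? AND version = ?",
            (is_highest, name, version),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO collectionversion VALUES (?, ?, ?, ?)",
                (name, version, is_highest, COLLECTION_ID),
            )


# Order from the failing export: the new highest version comes first.
failing_order = [("posix", "1.6.0", 1), ("posix", "1.5.4", 0)]
# Same rows, but the old version is demoted before the new one is inserted.
working_order = [("posix", "1.5.4", 0), ("posix", "1.6.0", 1)]

try:
    import_rows(make_db(), failing_order)
except sqlite3.IntegrityError as exc:
    print("failing order:", exc)

import_rows(make_db(), working_order)
print("working order: imported without error")
```

In this sketch the same two rows import cleanly once the row with is_highest = 0 is processed first, which is what the re-ordering in "export()" is meant to guarantee.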
While looking for a solution, I tried to order the rows by "is_highest" in "set_up_queryset", as shown below, but the ordering is not honoured.
    def set_up_queryset(self):
        """
        :return: CollectionVersion content specific to a specified repo-version.
        """
        return CollectionVersion.objects.filter(pk__in=self.repo_version.content).order_by("is_highest")
This is because the queryset result is later processed in batches, to save memory, while it is written to the JSON file. The batch processing code re-fetches the data by PK, so the ordering is lost. See https://github.com/pulp/pulpcore/blob/main/pulpcore/app/importexport.py#L55-L56
The workaround is to wrap the "export()" method of the model resource to perform the re-ordering.
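Why the ordering from "set_up_queryset" is lost, and why re-ordering inside "export()" helps, can be illustrated with a small sketch that does not need Django. Everything here is a hypothetical stand-in: "TABLE" plays the role of the database table, "fetch_batch" mimics a re-fetch by PK that returns rows in the store's own order, and "export_batch" mimics the model resource's "export()".

```python
from operator import itemgetter

# Hypothetical stand-in for the ansible_collectionversion table, keyed by PK.
TABLE = {
    1: {"version": "1.6.0", "is_highest": True},
    2: {"version": "1.5.4", "is_highest": False},
}


def ordered_pks():
    """Stand-in for set_up_queryset(): PKs ordered by is_highest."""
    return sorted(TABLE, key=lambda pk: TABLE[pk]["is_highest"])


def fetch_batch(pks):
    """Stand-in for the batch re-fetch by PK.

    Rows come back in the store's own order (PK order here),
    not in the order in which the PKs were passed in.
    """
    wanted = set(pks)
    return [TABLE[pk] for pk in sorted(TABLE) if pk in wanted]


def export_batch(rows, reorder=False):
    """Stand-in for export(): optionally re-order the batch before writing."""
    if reorder:
        rows = sorted(rows, key=itemgetter("is_highest"))
    return [(row["version"], row["is_highest"]) for row in rows]


pks = ordered_pks()   # the 1.5.4 row (is_highest False) comes first
batch = fetch_batch(pks)  # ordering by is_highest is lost here
print(export_batch(batch))
print(export_batch(batch, reorder=True))
```

Note that in this sketch the re-ordering applies to each batch separately, because "export_batch" only ever sees the rows of one batch.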