Project: Satellite
SAT-23579: Failed incremental CV import shows error: duplicate key value violates unique constraint "rpm_updatecollectionname_name_update_record_id_6ef33bed_uniq"
SAT-23834: [QE] Failed incremental CV import shows error: duplicate key value violates unique constraint "rpm_updatecollectionname_name_update_record_id_6ef33bed_uniq"

Type: Sub-task
Resolution: Done
Component: Pulp

      +++ This bug was initially created as a clone of Bug #2253381 +++

      Description of problem:

We are running an incremental CV version import with:

      ```
      hammer content-import version --organization-id=1 --path=/var/lib/pulp/imports/2023-1-28T03-49-03-00-00
      ```

...but it produced the following error, repeated for several repositories, at about 61% of task completion.

      Example:

      ```
      duplicate key value violates unique constraint "rpm_updatecollectionname_name_update_record_id_6ef33bed_uniq"
      ```

This error repeats several times for various RHEL 7, 8, and 9 repositories.

      Please note that we can NOT provide a sosreport, logs or tracebacks from this customer, and Satellite is running in an air-gapped disconnected environment.

      Our support case is https://access.redhat.com/support/cases/#/case/03679066/

There have been other support cases from different customers where this exact database key is reported in an error ( rpm_updatecollectionname_name_update_record_id_6ef33bed_uniq ), and in all of them Red Hat support has recommended switching to the Syncable Exports feature instead of actually resolving the issue.

      We are on version 6.11 of Satellite, but it's notable that other support cases indicate that the issue is present on Satellite 6.13 too.

      Other support cases that contain the rpm_updatecollectionname_name_update_record_id_6ef33bed_uniq error:
      https://access.redhat.com/support/cases/#/case/03671664 ( Sat 6.12 )
      https://access.redhat.com/support/cases/#/case/03670822 ( Sat 6.13 )

      Version-Release number of selected component (if applicable):
      6.11
      6.12
      6.13

      How reproducible:
      Create large CV to export/import
      After that succeeds, create incremental export and try to import that version of CV

      Actual results:
      duplicate key value violates unique constraint "rpm_updatecollectionname_name_update_record_id_6ef33bed_uniq"

      Expected results:
      Incremental Content View imports successfully

      Additional info:

— Additional comment on 2023-12-07T01:45:23Z

      Adding this note from internal Slack discussion so it can be referenced later if need be:

      The errors in my traceback say they are for:
      `pulp_rpm/copy.py` line 223
      `pulp_rpm/repository.py` lines 1017, 312
      `pulp_rpm/advisory.py` lines 136, 314, 218, 206, 329

      Then some others in:
      `django/django_lifecycle/mixins.py` line 134
      `django/db/models/base.py` lines 739, 776, 881, 919
      `django/db/models/manager.py` line 85
      `django/db/models/query.py` line 1270
      `django/db/models/sql/compiler.py` line 1416
      `django/db/backends/utils.py` lines 66, 84, 75, 90

— Additional comment on 2023-12-07T04:41:08Z

      So the traceback in comment #1 is very similar to what I got from another case (see below):

      ~~~
      {"traceback"=>" File \"/usr/lib/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py\", line 458, in _perform_task
      result = func(*args, **kwargs)
      File \"/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py\", line 275, in import_repository_version
      new_version.set_content(content)
File \"/usr/lib/python3.9/site-packages/pulpcore/app/models/repository.py\", line 1065, in __exit__
      repository.finalize_new_version(self)
      File \"/usr/lib/python3.9/site-packages/pulp_rpm/app/models/repository.py\", line 316, in finalize_new_version
      resolve_advisories(new_version, previous_version)
      File \"/usr/lib/python3.9/site-packages/pulp_rpm/app/advisory.py\", line 141, in resolve_advisories
      to_add, to_remove, to_exclude = resolve_advisory_conflict(
      File \"/usr/lib/python3.9/site-packages/pulp_rpm/app/advisory.py\", line 319, in resolve_advisory_conflict
      _do_merge()
      File \"/usr/lib/python3.9/site-packages/pulp_rpm/app/advisory.py\", line 223, in _do_merge
      merged_advisory = merge_advisories(previous_advisory, added_advisory)
      File \"/usr/lib/python3.9/site-packages/pulp_rpm/app/advisory.py\", line 411, in merge_advisories
      _copy_update_collections_for(
      File \"/usr/lib/python3.9/site-packages/pulp_rpm/app/advisory.py\", line 334, in _copy_update_collections_for
      collection.save()
      File \"/usr/lib64/python3.9/contextlib.py\", line 79, in inner
      return func(*args, **kwds)
      File \"/usr/lib/python3.9/site-packages/django_lifecycle/mixins.py\", line 169, in save
      save(*args, **kwargs)
      File \"/usr/lib/python3.9/site-packages/django/db/models/base.py\", line 739, in save
      self.save_base(using=using, force_insert=force_insert,
      File \"/usr/lib/python3.9/site-packages/django/db/models/base.py\", line 776, in save_base
      updated = self._save_table(
      File \"/usr/lib/python3.9/site-packages/django/db/models/base.py\", line 881, in _save_table
      results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
      File \"/usr/lib/python3.9/site-packages/django/db/models/base.py\", line 919, in _do_insert
      return manager._insert(
      File \"/usr/lib/python3.9/site-packages/django/db/models/manager.py\", line 85, in manager_method
      return getattr(self.get_queryset(), name)(*args, **kwargs)
      File \"/usr/lib/python3.9/site-packages/django/db/models/query.py\", line 1270, in _insert
      return query.get_compiler(using=using).execute_sql(returning_fields)
      File \"/usr/lib/python3.9/site-packages/django/db/models/sql/compiler.py\", line 1416, in execute_sql
      cursor.execute(sql, params)
      File \"/usr/lib/python3.9/site-packages/django/db/backends/utils.py\", line 66, in execute
      return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
      File \"/usr/lib/python3.9/site-packages/django/db/backends/utils.py\", line 75, in _execute_with_wrappers
      return executor(sql, params, many, context)
      File \"/usr/lib/python3.9/site-packages/django/db/backends/utils.py\", line 84, in _execute
      return self.cursor.execute(sql, params)
File \"/usr/lib/python3.9/site-packages/django/db/utils.py\", line 90, in __exit__
      raise dj_exc_value.with_traceback(traceback) from exc_value
      File \"/usr/lib/python3.9/site-packages/django/db/backends/utils.py\", line 84, in _execute
      return self.cursor.execute(sql, params)
      ", "description"=>"duplicate key value violates unique constraint \"rpm_updatecollection_name_update_record_id_6ef33bed_uniq\"
      DETAIL: Key (name, update_record_id)=(rhel-8-for-x86_64-baseos-rpms__8_0_default_0, 173aa5f7-27b0-4835-b518-6d4035df7cc5) already exists
      ~~~

      The import tried to resolve the advisory conflict and failed to merge the advisory.

I wonder about removing the "collection.pk = None" line and letting Django decide whether to create or update the collection row.
      ~~~
def _copy_update_collections_for(advisory, collections):
    """
    Deep-copy each UpdateCollection in the collections, and assign to its new advisory.
    """
    new_collections = []
    with transaction.atomic():
        for collection in collections:
            uc_packages = list(collection.packages.all())
            collection.pk = None  # <=============================
            collection.update_record = advisory
            collection.save()
            new_packages = []
            for a_package in uc_packages:
                a_package.pk = None
                a_package.update_collection = collection
                new_packages.append(a_package)
            UpdateCollectionPackage.objects.bulk_create(new_packages)
            new_collections.append(collection)
    return new_collections
      ~~~

I still don't understand why it needs to do merging. Why is there no package-list intersection for a Red Hat advisory?

app/advisory.py:
~~~
elif same_dates and same_version and not pkgs_intersection:  # <=================
    _do_merge()
~~~
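For context, the condition above reduces to a set intersection over NEVRA tuples. A minimal standalone sketch (plain tuples, no Django; the example pkglists mirror the debuginfo case discussed later in this bug, and the flag names are illustrative):

```python
# Each advisory's pkglist is a set of NEVRA tuples: (name, epoch, version, release, arch).
previous_pkglist = {
    ("python3-libs", "0", "3.6.8", "51.el8_8.2", "x86_64"),
    ("platform-python", "0", "3.6.8", "51.el8_8.2", "x86_64"),
}
# Same advisory id, but coming from a debuginfo repo: completely different rpms.
added_pkglist = {
    ("python3-debuginfo", "0", "3.6.8", "51.el8_8.2", "x86_64"),
    ("python3-debugsource", "0", "3.6.8", "51.el8_8.2", "x86_64"),
}

pkgs_intersection = previous_pkglist & added_pkglist

# Assume the two advisory versions agree on dates and version for this sketch.
same_dates = same_version = True
merge_path_taken = same_dates and same_version and not pkgs_intersection

print(pkgs_intersection)   # empty set -> disjoint pkglists
print(merge_path_taken)    # True -> _do_merge() would be called
```

So two copies of the same advisory with fully disjoint package lists (e.g. regular vs. debuginfo rpms) land in the `_do_merge()` branch.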

— Additional comment on 2023-12-07T16:38:18Z

      Questions:

• Does this happen every time, to the same repository(ies)? Or is it "occasionally"?
• Is it possible to ask the customer to retry the import with only *one* pulpcore worker active?
  • If they can, and do, and the import works - that will tell us a lot about what the problem might be.
  • (NB: artifact-import happens in one worker, and is very time-intensive. The portion of import that takes advantage of multiple workers is a smaller portion of the total time.)

— Additional comment on 2023-12-07T22:25:05Z

      Hi team,

      Thanks for your help with this customer.

      I am enabling Escalation flag due to the business impact below:

      Business Impact:

      • Australian Signals Directorate (ASD) is a strategic governmental customer with ~$10m AUD booking in Australia
      • This customer is a security-focused organisation, so timely application of patches and security errata for all repositories is essential to them. We need to incrementally export new packages/errata weekly for all repositories in use.
      • Consultant engaged

      Any input to move this forward is highly appreciated.

      Kind regards,
      Jennie

— Additional comment on 2023-12-11T15:18:48Z

      I'm having no luck so far with reproducing this locally, which means I'm having to make "educated guesses" about what's going on and how we might fix it. I'm going to dump some brainstorming here, maybe someone will have an insight!

      @hyu@redhat.com I'm also puzzled about exactly what the data-shape is that's causing us to get into _do_merge(). advisory.get_pkglist() returns a list-of NEVRA-tuples, which we turn into a set in order to do set-operations. pkgs_intersection being None implies no NEVRA overlap between two versions of an advisory with the same "id" (e.g., RHBA-2023:12345 is an ID). This can happen when, for example, one pulls debuginfo repos into "regular" repos - same advisory, regular-repo-pkglist is the non-debuginfo rpms, dbginfo-pkglist is only debuginfo rpms, hence no overlap. But I am puzzled at how/why we're hitting this problem *all the time*, for this customer.

Now, given that we need to merge advisories (for whatever reason), if an advisory is common between multiple repositories (which can absolutely be a thing, esp. given how Satellite CVs work), then I can see how the "same" advisory (where "same" means the content has the same pulp_id) is being handled concurrently in multiple threads, since import spawns one thread per repo-being-imported.

      (speaking of which - @bwood any word on trying with just one pulpcore-worker available?)

One thought I have, for a workaround, is to do a try/except around collection.save(). If we hit the unique constraint here, it means "a collection with this name *already exists* for this advisory". We could "assume" that what we collided with is what we wanted in the first place, and just continue to the next collection.

OR, and perhaps better, we could at the exception-point rename the collection - say, by adding a GUID to the name - to force uniqueness. This would be harmless to the use of the advisory (although it would look a little odd) and get us past the exception.

      I don't particularly like either of these, because they're workarounds for a problem that...shouldn't be happening at all. Without a reproducer, and/or a lot more data from the customer than they're willing/legally allowed to give, "potential workarounds" may be the best option to make the customer whole while we're trying to understand a real root-cause.

— Additional comment on 2023-12-11T19:20:49Z

      As an *example* (please - this is a thought-experiment, not a hot-fix), consider the following change I made in pulp_rpm/3.17:
=====
diff --git a/pulp_rpm/app/advisory.py b/pulp_rpm/app/advisory.py
index 59647755..a011e917 100644
--- a/pulp_rpm/app/advisory.py
+++ b/pulp_rpm/app/advisory.py
@@ -3,6 +3,7 @@ from collections import defaultdict
 from itertools import chain

 import hashlib
+import uuid

 import createrepo_c as cr
 from datetime import datetime
@@ -326,7 +327,12 @@ def _copy_update_collections_for(advisory, collections):
             uc_packages = list(collection.packages.all())
             collection.pk = None
             collection.update_record = advisory
-            collection.save()
+            try:
+                collection.save()
+            except IntegrityError:
+                # Retry with a guaranteed-unique name for the collection this time
+                collection.name = f"{collection.name}_{uuid.uuid4()}"
+                collection.save()
             new_packages = []
             for a_package in uc_packages:
                 a_package.pk = None
=====

      *might* get the customer past the current failure in a way that results in some oddly-named package lists in their advisories, but will otherwise work fine. Thoughts, anyone?

      Meanwhile - I will continue trying to force the error so we can figure out how we're getting here in the first place.

— Additional comment on 2023-12-12T03:40:31Z

Hmm... if it is caused by concurrency, how is it possible that 2 tasks got the same update_record pk that was just created/saved?

      ~~~
try:
    with transaction.atomic():
        merged_advisory.save()  # <============== this should generate a unique update_record pk
except IntegrityError:
    merged_advisory = UpdateRecord.objects.get(digest=merged_digest)  # <================ I don't expect this to run "else" so this shouldn't be the issue. If this will run the "else" then I can see how the issue could have happened
    merged_advisory.touch()
else:
    # For UpdateCollections, make sure we don't re-use the collections for either of the
    # advisories being merged
    _copy_update_collections_for(
        merged_advisory, chain(previous_collections, added_collections)
    )
    for reference in references:
        # copy reference and add relation for advisory
        reference.pk = None
        reference.save()
        merged_advisory.references.add(reference)
        ~~~
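The save()/IntegrityError/get() pattern in question can be sketched outside Django. A minimal stand-in using sqlite3 (the table name, digests, and return labels are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE update_record (digest TEXT PRIMARY KEY)")

def save_or_get(conn, digest):
    """Try to insert; on a unique-constraint hit, fall back to fetching the existing row."""
    try:
        with conn:  # rough analogue of transaction.atomic()
            conn.execute("INSERT INTO update_record (digest) VALUES (?)", (digest,))
        return "saved"    # the "else" branch (copy collections) would run
    except sqlite3.IntegrityError:
        # A record with this digest already exists; reuse it, skipping the copy step.
        conn.execute(
            "SELECT digest FROM update_record WHERE digest = ?", (digest,)
        ).fetchone()
        return "fetched"

first = save_or_get(conn, "abc123")   # "saved"
second = save_or_get(conn, "abc123")  # "fetched"
```

The comment's point holds in this model too: the insert only fails (and the fallback only runs) when an identical digest already exists, so a fresh pk alone should not collide.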

— Additional comment on 2023-12-12T06:46:10Z

It is possible to reproduce the duplicate issue. The duplicate error may happen if the errata has already been merged before (either during a previous repo sync or a previous import) and the current import task then tries to merge it again. See below:

NOTE: I still haven't figured out why there is no pkglist intersection between the old and new errata in the same repo. To help reproduce the merging bug, I just manually updated the collection names of the errata so that they have the same collection name but different pkglists.

This is how I reproduce it:
~~~
# RHSA-2023:5997
>>> old = UpdateRecord.objects.get(pk="2e8980de-d250-49c2-957d-3cce940dfc31")
>>> new = UpdateRecord.objects.get(pk="44645e12-74b3-4954-a8ec-47d41cad209a")
>>> old
<UpdateRecord: pk=2e8980de-d250-49c2-957d-3cce940dfc31>
>>> new
<UpdateRecord: pk=44645e12-74b3-4954-a8ec-47d41cad209a>
>>>
>>> old_coll = old.collections.all()
>>> new_coll = new.collections.all()
>>>
>>> set(old.get_pkglist())
{('python3-test', '0', '3.6.8', '51.el8_8.2', 'x86_64'), ('python3-libs', '0', '3.6.8', '51.el8_8.2', 'x86_64'), ('python3-libs', '0', '3.6.8', '51.el8_8.2', 'i686'), ('platform-python', '0', '3.6.8', '51.el8_8.2', 'x86_64')}
>>> old_pkglist = set(old.get_pkglist())
>>> new_pkglist = set(new.get_pkglist())
>>> new_pkglist
{('python3-debuginfo', '0', '3.6.8', '51.el8_8.2', 'i686'), ('python3-debugsource', '0', '3.6.8', '51.el8_8.2', 'x86_64'), ('python3-debugsource', '0', '3.6.8', '51.el8_8.2', 'i686'), ('python3-debuginfo', '0', '3.6.8', '51.el8_8.2', 'x86_64')}
>>> old_pkglist.intersection(new_pkglist)
set()

>>> import createrepo_c as cr
>>> import hashlib
>>> from itertools import chain
>>> from pulp_rpm.app.advisory import _copy_update_collections_for
>>>
>>> def hash_update_record(update):
...     uinfo = cr.UpdateInfo()
...     uinfo.append(update)
...     return hashlib.sha256(uinfo.xml_dump().encode("utf-8")).hexdigest()
...
>>> names_seen = {"collection": 0}
>>> for collection in chain(old_coll, new_coll):
...     # no-name? When merging, ILLEGAL! Give it a name
...     if not collection.name:
...         collection.name = "collection"
...     if collection.name in names_seen.keys():
...         orig_name = collection.name
...         new_name = f"{orig_name}_{names_seen[orig_name]}"
...         names_seen[orig_name] += 1
...         collection.name = new_name
...     # if we've not seen it before, store in names-seen as name:0
...     else:
...         names_seen[collection.name] = 0
...
>>> merged_advisory_cr = old.to_createrepo_c(collections=chain(old_coll, new_coll))
>>> merged_digest = hash_update_record(merged_advisory_cr)
>>> merged_advisory = old
>>> merged_advisory.pk = None
>>> merged_advisory.pulp_id = None
>>> merged_advisory.digest = merged_digest
>>> merged_advisory.save()
>>> _copy_update_collections_for(merged_advisory, chain(old_coll, new_coll))
[<UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_0>]  <========== First merging result.

### Attempting the second merge ###

>>> old = merged_advisory
>>> new = UpdateRecord.objects.get(pk="2e8980de-d250-49c2-957d-3cce940dfc31")
>>>
>>> old_coll = old.collections.all()
>>> new_coll = new.collections.all()
>>>
>>> names_seen = {"collection": 0}
>>> for collection in chain(old_coll, new_coll):
...     # no-name? When merging, ILLEGAL! Give it a name
...     if not collection.name:
...         collection.name = "collection"
...     if collection.name in names_seen.keys():
...         orig_name = collection.name
...         new_name = f"{orig_name}_{names_seen[orig_name]}"
...         names_seen[orig_name] += 1
...         collection.name = new_name
...     # if we've not seen it before, store in names-seen as name:0
...     else:
...         names_seen[collection.name] = 0
...
>>> merged_advisory_cr = old.to_createrepo_c(collections=chain(old_coll, new_coll))
>>> merged_digest = hash_update_record(merged_advisory_cr)
>>> merged_advisory = old
>>> merged_advisory.pk = None
>>> merged_advisory.pulp_id = None
>>> merged_advisory.digest = merged_digest
>>> merged_advisory.save()
>>> _copy_update_collections_for(merged_advisory, chain(old_coll, new_coll))
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "rpm_updatecollection_name_update_record_id_6ef33bed_uniq"
DETAIL: Key (name, update_record_id)=(rhel-8-for-x86_64-baseos-rpms__8_0_default_0, 46922af9-f588-4891-9853-6abe23aad493) already exists.  <=========================== Reproduced.
~~~

As we can see, it is trying to merge the two "_default_0" collection names:

~~~
>>> old_coll
<QuerySet [<UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_0>]>
>>> new_coll
<QuerySet [<UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_0>]>
~~~
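For reference, the name-uniquification loop used in the session above reduces to the following standalone sketch (plain strings stand in for UpdateCollection objects; the sample names are illustrative). Note that when a persisted "_0" name recurs, the loop stacks a fresh suffix on top rather than picking the next sibling index:

```python
def uniquify(names):
    """Mirror of the names_seen logic in merge_advisories, on bare strings."""
    names_seen = {"collection": 0}
    result = []
    for name in names:
        if not name:  # no-name? When merging, ILLEGAL! Give it a name
            name = "collection"
        if name in names_seen:
            new_name = f"{name}_{names_seen[name]}"
            names_seen[name] += 1
            name = new_name
        else:
            names_seen[name] = 0
        result.append(name)
    return result

# First merge: one plain and one "_0" collection; no clash, nothing is renamed.
first = uniquify(["repo_default", "repo_default_0"])
# Second merge: the persisted "_0" name collides with the incoming "_0",
# so the duplicate gets a stacked "_0_0" suffix.
second = uniquify(["repo_default", "repo_default_0", "repo_default_0"])
print(first)
print(second)
```

The renames also only exist on the in-memory objects until each collection is saved, which seems consistent with the duplicate-key error if the same querysets are re-read from the database before the copy step.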

— Additional comment on 2023-12-15T05:40:00Z

Requested @keiwilli to run the following queries on the customer's disconnected Satellite.

      Query Set 1:
      ---------------
      Query:
su - postgres -c "psql pulpcore -c \"select content_ptr_id,id from rpm_updaterecord where content_ptr_id in (select update_record_id from rpm_updatecollection where name = 'rhel-7-server-extras-rpms__x86_64_0_default_0' order by update_record_id asc limit 1);\""

      Returned:
      content_ptr_id: 08702cd4-23c4-470a-9a7e-64d3e3d06934
      id: RHBA-2018:0434


      Query:
      su - postgres -c "psql pulpcore -c \"select name from rpm_updatecollection where update_record_id = '08702cd4-23c4-470a-9a7e-64d3e3d06934';\""

      Returned:
      collection-0
      rhel-7-server-extras-rpms__x86_64_0_default
      rhel-7-server-extras-rpms__x86_64_0_default_0

      Query:
su - postgres -c "psql pulpcore -c \"select uc.name, ucp.name, ucp.version, ucp.release, ucp.arch, ucp.epoch from rpm_updatecollectionpackage ucp left join rpm_updatecollection uc on ucp.update_collection_id = uc.pulp_id where uc.update_record_id in (select update_record_id from rpm_updatecollection where name = 'rhel-7-server-extras-rpms__x86_64_0_default_0' order by update_record_id asc limit 1) order by uc.name;\""

      Returned:
      name: collection-0
name: oci-umount
      version: 2.3.3
      release: 3.gite3c9055.el7
      arch: x86_64
      epoch: 2
      ---------------


      Query Set 2:
      ---------------
      Query:
      su - postgres -c "psql pulpcore -c \"select content_ptr_id,id from rpm_updaterecord where content_ptr_id in (select update_record_id from rpm_updatecollection where name = 'rhel-7-server-eus-optional-rpms__7_DOT_5__x86_64_0_default_0' order by update_record_id asc limit 1);\""

      Returned:
      content_ptr_id: 977cf13b-c2b9-4b25-8d54-68341020ca64
      id: RHBA-2017:0102


      Query:
      su - postgres -c "psql pulpcore -c \"select name from rpm_updatecollection where update_record_id = '977cf13b-c2b9-4b25-8d54-68341020ca64';\""

      Returned:
      collection-0
      rhel-7-server-eus-optional-rpms__7_DOT_5__x86_64_0_default
      rhel-7-server-eus-optional-rpms__7_DOT_5__x86_64_0_default_0


      Query:
      su - postgres -c "psql pulpcore -c \"select uc.name, ucp.name, ucp.version, ucp.release, ucp.arch, ucp.epoch from rpm_updatecollectionpackage ucp left join rpm_updatecollection uc on ucp.update_collection_id = uc.pulp_id where uc.update_record_id in (select update_record_id from rpm_updatecollection where name = 'rhel-7-server-eus-optional-rpms__7_DOT_5__x86_64_0_default_0' order by update_record_id asc limit 1) order by uc.name;\""

Returned:
      name: collection-0
      name: tuned-utils-systemtap
      version: 2.7.1
      release: 3.el7_3.1
      arch: noarch
      epoch: 0

      name: collection-0
      name: tuned-gtk
      version: 2.7.1
      release: 3.el7_3.1
      arch: noarch
      epoch: 0

      name: collection-0
      name: tuned-profiles-compat
      version: 2.7.1
      release: 3.el7_3.1
      arch: noarch
      epoch: 0

      name: collection-0
      name: tuned-profiles-atomic
      version: 2.7.1
      release: 3.el7_3.1
      arch: noarch
      epoch: 0

      name: collection-0
      name: tuned-profiles-oracle
      version: 2.7.1
      release: 3.el7_3.1
      arch: noarch
      epoch: 0
      ---------------

As we can see above, both update records have 3 collection names. This proves the reproducing scenario in comment #8 is correct: both errata have already been merged once, and attempting to merge them again causes the failure.

Another weird issue I found from the above queries: only "collection-0" returned a pkglist. No pkglist is returned for the "<repo>_default" and "<repo>_default_0" collections.

— Additional comment on 2024-01-16T06:00:13Z

I think I have a clue about why there is an empty pkglist in the collection, causing no pkglist intersection and then triggering the advisory merge.

I can reproduce a collection with missing packages, but am still not able to reproduce the duplicate collection error.

If an rpm exists in multiple errata, then only one rpm among the "update collection packages" will be picked for export.

For example: the oci-umount rpm is associated with multiple errata, as shown on the SOURCE Satellite below:
      ~~~
      # select id, digest,ucp.filename, uc.name from rpm_updaterecord r left join rpm_updatecollection uc on uc.update_record_id = r.content_ptr_id left join rpm_updatecollectionpackage ucp on uc.pulp_id = ucp.update_collection_id where ucp.name = 'oci-umount' and ucp.version = '2.3.3' and uc.name = 'rhel-7-for-power-le-extras-rpms__ppc64le_0_default' order by r.id;
      id | digest | filename | name
      ----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------
      RHBA-2018:0434 | 60dd7201b18e3a27ad706b04a97d9c8edbd88d5301554dd129b6e87af995b99b | oci-umount-2.3.3-3.gite3c9055.el7.ppc64le.rpm | rhel-7-for-power-le-extras-rpms__ppc64le_0_default
      RHEA-2018:1063 | d1fa6f891c2aefb3eb4e443a9254aa96e72a02581f8218b99088766680cf10c6 | oci-umount-2.3.3-3.gite3c9055.el7.ppc64le.rpm | rhel-7-for-power-le-extras-rpms__ppc64le_0_default
      RHEA-2018:3365 | 609a77eec590a3fc55f1d44e36ab25e20d7231c38536d839776dbaba82e9b833 | oci-umount-2.3.3-3.gite3c9055.el7.ppc64le.rpm | rhel-7-for-power-le-extras-rpms__ppc64le_0_default
      ~~~

      But only one update collection package is imported to the DESTINATION Satellite as shown below:
      ~~~
      # select id, digest,ucp.filename, uc.name from rpm_updaterecord r left join rpm_updatecollection uc on uc.update_record_id = r.content_ptr_id left join rpm_updatecollectionpackage ucp on uc.pulp_id = ucp.update_collection_id where ucp.name = 'oci-umount' and ucp.version = '2.3.3' and uc.name = 'rhel-7-for-power-le-extras-rpms__ppc64le_0_default' order by r.id;
      id | digest | filename | name
      ----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------
      RHBA-2018:0434 | 60dd7201b18e3a27ad706b04a97d9c8edbd88d5301554dd129b6e87af995b99b | oci-umount-2.3.3-3.gite3c9055.el7.ppc64le.rpm | rhel-7-for-power-le-extras-rpms__ppc64le_0_default
      ~~~


This can be avoided by removing the following line on the SOURCE Satellite and then doing a new incremental export/import:
~~~
--- a/app/modelresource.py	2023-10-14 05:05:24.000000000 +1000
+++ b/app/modelresource.py	2024-01-16 15:42:43.414010788 +1000
@@ -528,7 +528,6 @@
                 update_record__in=UpdateRecord.objects.filter(pk__in=self.repo_version.content)
             )
         )
-        .distinct("name", "epoch", "version", "release", "arch")
         .order_by("name", "epoch", "version", "release", "arch")
         .select_related("update_collection", "update_collection__update_record")
     )
~~~

— Additional comment on 2024-01-16T06:33:23Z

However, I think the patch in comment #10 might not be sufficient to fix a Destination/Disconnected Satellite that already has the corrupted errata (the merged errata with missing/empty pkglists). I think the import task will still try to merge the errata, because the corrected pkglist in the new export might not match the pkglist of the corrupted errata, causing the advisory merge to trigger again.

So we probably need to fix the merge-advisory bug discovered in comment #8 too.


      Below is the untested patch for the merge advisory that I have so far:
~~~
--- a/app/advisory.py	2023-10-14 05:05:24.000000000 +1000
+++ b/app/advisory.py	2024-01-16 15:37:19.140415753 +1000
@@ -355,6 +355,28 @@
     added_collections = added_advisory.collections.all()
     references = previous_advisory.references.all()

+    def get_new_name(collection, collection_group):
+        name_list = collection.name.split("_")
+        if name_list[-1].isdecimal():
+            name_list.pop(-1)
+            orig_name = "_".join(name_list)
+        else:
+            orig_name = collection.name
+        if orig_name in collection_group.keys():
+            index = max(collection_group[orig_name]) + 1
+        else:
+            index = 0
+        collection_group.setdefault(orig_name, []).append(index)
+        return f"{orig_name}_{index}"
+
+    collection_group = {}
+    for collection in chain(previous_collections, added_collections):
+        name_list = collection.name.split("_")
+        if name_list[-1].isdecimal():
+            index = name_list.pop(-1)
+            orig_name = "_".join(name_list)
+            collection_group.setdefault(orig_name, []).append(int(index))
+
     with transaction.atomic():
         # First thing to do is ensure collection-name-uniqueness
         # in the newly-merged advisory.
@@ -370,22 +392,22 @@

         # dictionary of collection-name:first-unused-suffix pairs
         # if a collection has no name, we assign it the name "collection" and uniquify-it from there
-        names_seen = {"collection": 0}
+        names_seen = ["collection"]
+
+        collections_to_merge = set([])
         for collection in chain(previous_collections, added_collections):
+            if collection.packages.count() == 0:
+                continue
             # no-name? When merging, ILLEGAL! Give it a name
             if not collection.name:
                 collection.name = "collection"
+            if collection.name in names_seen:
+                collection.name = get_new_name(collection, collection_group)
+            names_seen.append(collection.name)
+            collections_to_merge.add(collection)

-            if collection.name in names_seen.keys():
-                orig_name = collection.name
-                new_name = f"{orig_name}_{names_seen[orig_name]}"
-                names_seen[orig_name] += 1
-                collection.name = new_name
-            # if we've not seen it before, store in names-seen as name:0
-            else:
-                names_seen[collection.name] = 0
         merged_advisory_cr = previous_advisory.to_createrepo_c(
-            collections=chain(previous_collections, added_collections)
+            collections=collections_to_merge
         )
         merged_digest = hash_update_record(merged_advisory_cr)
         merged_advisory = previous_advisory
@@ -404,7 +426,7 @@
             # For UpdateCollections, make sure we don't re-use the collections for either of the
             # advisories being merged
             _copy_update_collections_for(
-                merged_advisory, chain(previous_collections, added_collections)
+                merged_advisory, collections_to_merge
             )
             for reference in references:
                 # copy reference and add relation for advisory
~~~
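The suffix handling in that patch can be exercised directly as a standalone sketch (bare strings instead of collection objects; the sample names are made up). It shows a duplicate "_0" name being renamed to the next sibling index "_1" instead of a stacked "_0_0":

```python
def base_name(name):
    """Strip a trailing decimal suffix: 'repo_default_0' -> 'repo_default'."""
    parts = name.split("_")
    if parts[-1].isdecimal():
        parts.pop(-1)
    return "_".join(parts)

def get_new_name(name, collection_group):
    """Pick the next free index for this collection's base name."""
    orig_name = base_name(name)
    index = max(collection_group[orig_name]) + 1 if orig_name in collection_group else 0
    collection_group.setdefault(orig_name, []).append(index)
    return f"{orig_name}_{index}"

# Pre-seed the group with suffixes already persisted on existing collections,
# mirroring the pre-loop in the patch.
names = ["repo_default", "repo_default_0", "repo_default_0"]
collection_group = {}
for name in names:
    parts = name.split("_")
    if parts[-1].isdecimal():
        index = parts.pop(-1)
        collection_group.setdefault("_".join(parts), []).append(int(index))

names_seen = ["collection"]
renamed = []
for name in names:
    if name in names_seen:
        name = get_new_name(name, collection_group)
    names_seen.append(name)
    renamed.append(name)

print(renamed)  # the duplicate "_0" becomes "repo_default_1", not "repo_default_0_0"
```

This matches the console result below, where the third merge produces "..._default_1" and "..._default_2" rather than ever-deepening "_0_0"-style names.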

      I did a simple test in console. Result as below:
      ~~~
      >>> from pulp_rpm.app.advisory import merge_advisories
      >>> from pulp_rpm.app.models import UpdateRecord
      >>> old = UpdateRecord.objects.get(pk="2e8980de-d250-49c2-957d-3cce940dfc31")
      >>> new = UpdateRecord.objects.get(pk="44645e12-74b3-4954-a8ec-47d41cad209a")
      >>> merged = merge_advisories(old, new)
      >>> merged.collections.all()
      <QuerySet [<UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_0>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default>]>

      #### Try a second merge ####
      >>> old = merged
      >>> new = UpdateRecord.objects.get(pk="2e8980de-d250-49c2-957d-3cce940dfc31")
      >>> merged = merge_advisories(old, new)
      >>> merged.collections.all()
      <QuerySet [<UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_0>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_1>]>
      >>> for coll in merged.collections.all():
      ...     print(coll.packages.all())
      ...
      <QuerySet [<UpdateCollectionPackage: python3-debuginfo>, <UpdateCollectionPackage: python3-debugsource>, <UpdateCollectionPackage: python3-debuginfo>, <UpdateCollectionPackage: python3-debugsource>]>
      <QuerySet [<UpdateCollectionPackage: python3-libs>, <UpdateCollectionPackage: python3-test>, <UpdateCollectionPackage: platform-python>, <UpdateCollectionPackage: python3-libs>]>
      <QuerySet [<UpdateCollectionPackage: python3-libs>, <UpdateCollectionPackage: python3-test>, <UpdateCollectionPackage: platform-python>, <UpdateCollectionPackage: python3-libs>]>
      #### Try to merge the third time ####
      >>> old = merged
      >>> new = UpdateRecord.objects.get(pk="44645e12-74b3-4954-a8ec-47d41cad209a")
      >>>
      >>> merged = merge_advisories(old, new)
      >>> merged.collections.all()
      <QuerySet [<UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_2>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_0>, <UpdateCollection: rhel-8-for-x86_64-baseos-rpms__8_0_default_1>]>
      >>> for coll in merged.collections.all():
      ...     print(coll.packages.all())
      ...
      <QuerySet [<UpdateCollectionPackage: python3-debuginfo>, <UpdateCollectionPackage: python3-debugsource>, <UpdateCollectionPackage: python3-debuginfo>, <UpdateCollectionPackage: python3-debugsource>]>
      <QuerySet [<UpdateCollectionPackage: python3-libs>, <UpdateCollectionPackage: python3-test>, <UpdateCollectionPackage: platform-python>, <UpdateCollectionPackage: python3-libs>]>
      <QuerySet [<UpdateCollectionPackage: python3-debuginfo>, <UpdateCollectionPackage: python3-debugsource>, <UpdateCollectionPackage: python3-debuginfo>, <UpdateCollectionPackage: python3-debugsource>]>
      <QuerySet [<UpdateCollectionPackage: python3-libs>, <UpdateCollectionPackage: python3-test>, <UpdateCollectionPackage: platform-python>, <UpdateCollectionPackage: python3-libs>]>
      ~~~

      — Additional comment from on 2024-01-25T12:51:11Z

      Target Milestone is set to Unspecified, since this bug has Target Milestone set to 6.15.0 and approved release flag is not sat-6.15.0+

      Once the bug has been granted the sat-6.15.0+ flag, the Target Milestone can be set to the desired value.

      — Additional comment from on 2024-01-26T20:15:13Z

      Created attachment 2010808
      rpm/3.17 patch for 3380

      I cherry-picked https://github.com/pulp/pulp_rpm/pull/3389 into the rpm/3.17 branch and created the attached patch. This should apply cleanly to a reproducer that has rpm/3.17 installed.

      This change passes all of Pulp's tests. We're holding off on merging/backporting the change until it has had a chance to run on, and demonstrably fix, the reproducer machine.

      Once we're all sure we're happy with it, this fix will be merged and backported to currently-supported branches of pulp_rpm, and releases cut.

      — Additional comment from on 2024-02-02T22:08:32Z

      Created attachment 2014703
      rpm/3.17 patch for 2821

      This is a backport for the fix for https://github.com/pulp/pulp_rpm/issues/2821. I've applied this on top of/in addition to the patch for 3380, in a core/3.16-rpm/3.17 dev-env, and our pulpimport and advisory tests run green with it in place.

      — Additional comment from on 2024-02-09T16:04:45Z

      The Pulp upstream bug status is at closed. Updating the external tracker on this bug.

      — Additional comment from on 2024-02-09T16:04:48Z

      All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

      — Additional comment from on 2024-02-11T00:46:31Z

      I managed to get backup data from one customer hitting this issue and restored their backup for investigation.

      In their case, they are hitting the conflict in "advisory_id_conflicts["in_added"]" because two errata contents have the same upstream_id (see below). As a result, both of those errata are added to the new version during the import process, which causes the conflict and triggers the merge.

      ~~~
      pulpcore=# select * from core_content where pulp_id in (select content_ptr_id from rpm_updaterecord where id = 'RHBA-2023:5354');
      pulp_id | pulp_created | pulp_last_updated | pulp_type | upstream_id | timestamp_of_interest
      -----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------
      5790910f-5b86-49cb-9d39-bc91db22fee1 | 2024-01-02 15:01:48.22628-05 | 2024-01-02 15:01:48.226292-05 | rpm.advisory | ffc701c8-6e4a-4002-a34f-c8cd5f50d745 | 2024-01-02 15:01:48.226302-05
      bcaca62b-f298-4ab3-b831-9911120fc7a4 | 2023-12-04 16:23:42.919924-05 | 2024-02-10 00:18:08.764203-05 | rpm.advisory | ffc701c8-6e4a-4002-a34f-c8cd5f50d745 | 2024-02-10 00:18:08.764209-05
      e9da294b-d7fa-4c6a-b8ae-532dc5b89597 | 2024-01-02 14:30:20.078201-05 | 2024-01-02 14:30:20.078213-05 | rpm.advisory | 65768731-76e1-4212-8fd3-a298c4d05727 | 2024-02-02 15:50:56.464661-05
      (3 rows)
      ~~~
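The duplicate-upstream_id situation in the query above can be expressed as a small grouping check (a standalone sketch over made-up (pulp_id, upstream_id) tuples, not the actual import code):

```python
from collections import defaultdict

def find_upstream_id_conflicts(rows):
    """Group advisory pulp_ids by upstream_id and return only the
    upstream_ids that map to more than one content row, i.e. the
    duplicates that force a merge during import."""
    groups = defaultdict(list)
    for pulp_id, upstream_id in rows:
        groups[upstream_id].append(pulp_id)
    return {uid: ids for uid, ids in groups.items() if len(ids) > 1}
```

Running this over the three rows above would flag the two "ffc701c8-…" advisories while leaving the "65768731-…" one alone.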

      Why do they have the same upstream_id? As we can see below, one of them has an empty pkglist and the other one was merged with the first.
      ~~~
      pulpcore=# select uc.pulp_created, uc.name, ucp.filename, ucp.sum from rpm_updatecollection uc left join rpm_updatecollectionpackage ucp on ucp.update_collection_id = uc.pulp_id where uc.update_record_id = 'bcaca62b-f298-4ab3-b831-9911120fc7a4';
      pulp_created | name | filename | sum
      --------------------------------------------------------------------------------+----
      2023-12-04 16:24:43.659379-05 | rhel-8-for-x86_64-baseos-rpms__8_0_default | |
      (1 row)

      pulpcore=# select uc.pulp_created, uc.name, ucp.filename, ucp.sum from rpm_updatecollection uc left join rpm_updatecollectionpackage ucp on ucp.update_collection_id = uc.pulp_id where uc.update_record_id = '5790910f-5b86-49cb-9d39-bc91db22fee1';
      pulp_created | name | filename | sum
      ----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------
      2024-01-02 15:01:48.228782-05 | rhel-8-for-x86_64-baseos-rpms__8_0_default | |
      2024-01-02 15:01:48.230796-05 | rhel-8-for-x86_64-baseos-rpms__8_0_default_0 | sos-4.6.0-2.el8.noarch.rpm | 958f91d4dc04f94295f60ce40d2e0633850ac8c98dbf3815b2473a1d446a2759
      2024-01-02 15:01:48.230796-05 | rhel-8-for-x86_64-baseos-rpms__8_0_default_0 | sos-audit-4.6.0-2.el8.noarch.rpm | 6cde9203d87982cddd8de888dd00b915cd7704dae3649f78b7474042122e3512
      (3 rows)
      ~~~

      We can also see that the packages are associated with two errata.
      ~~~
      select ur.id, uc.name from rpm_updatecollectionpackage ucp left join rpm_updatecollection uc on uc.pulp_id = ucp.update_collection_id left join rpm_updaterecord ur on ur.content_ptr_id = uc.update_record_id where filename = 'sos-4.6.0-2.el8.noarch.rpm';
      id | name
      ---------------+---------------------------------------------
      RHBA-2023:7075 | rhel-8-for-x86_64-baseos-rpms__8_0_default
      RHBA-2023:5354 | rhel-8-for-x86_64-baseos-rpms__8_0_default
      RHBA-2023:7075 | rhel-8-for-x86_64-baseos-rpms__8_0_default
      RHBA-2023:5354 | rhel-8-for-x86_64-baseos-rpms__8_0_default_0
      RHBA-2023:7075 | rhel-8-for-x86_64-baseos-rpms__8_0_default
      ~~~

      However, the export data shows only one "sos-4.6.0-2.el8.noarch.rpm" record, associated with "RHBA-2023:7075":
      ~~~

      # grep -B 4 -A 4 "sos-4.6.0-2.el8.noarch.rpm" pulp_rpm.app.modelresource.UpdateCollectionPackageResource.json
        {
        "update_collection": "rhel-8-for-x86_64-baseos-rpms__8_0_default|972675bdf91638a09418054629c5a8daadc18e11d6c805dcdc4e97980d11387d",
        "arch": "noarch",
        "epoch": "0",
        "filename": "sos-4.6.0-2.el8.noarch.rpm",
        "name": "sos",
        "reboot_suggested": "0",
        "relogin_suggested": "0",
        "restart_suggested": "0",
        ...
      # grep -B 17 "972675bdf91638a09418054629c5a8daadc18e11d6c805dcdc4e97980d11387d" pulp_rpm.app.modelresource.UpdateRecordResource.json
        "timestamp_of_interest": "2024-02-02 07:42:14",
        "id": "RHBA-2023:7075",
        "updated_date": "2023-11-14 14:43:11",
        "description": "For detailed information on changes in this release, see the Red Hat Enterprise Linux 8.9 Release Notes linked from the References section.",
        "issued_date": "2023-11-14 08:44:33",
        "fromstr": "release-engineering@redhat.com",
        "status": "final",
        "title": "sos bug fix and enhancement update",
        "summary": "An update for sos is now available for Red Hat Enterprise Linux 8.",
        "version": "5",
        "type": "bugfix",
        "severity": "None",
        "solution": "Before applying this update, make sure all previously released errata\nrelevant to your system have been applied.\n\nFor details on how to apply this update, refer to:\n\nhttps://access.redhat.com/articles/11258",
        "release": "0",
        "rights": "Copyright 2023 Red Hat Inc",
        "reboot_suggested": "0",
        "pushcount": "4",
        "digest": "972675bdf91638a09418054629c5a8daadc18e11d6c805dcdc4e97980d11387d"
        ~~~

      After applying the fix, re-importing the same export data succeeded. The merge result for the affected erratum is:
      ~~~
      pulpcore=# select r.name, added.number as added_version, removed.number as removed_version, added.pulp_created as added_date, removed.pulp_created as removed_date from core_repositorycontent rc left join core_repository r on r.pulp_id = rc.repository_id left join core_repositoryversion added on added.pulp_id = rc.version_added_id left join core_repositoryversion removed on removed.pulp_id = rc.version_removed_id where content_id = 'decc1d58-e478-4fab-81e5-c88384569b8b' order by r.name, added.number;
      name | added_version | removed_version | added_date | removed_date
      -------------------------------------------------------------------------------------------------------------------------------------
      Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8-18176451 | 6 | | 2024-02-10 19:09:11.430332-05 |
      Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8-18198889 | 5 | | 2024-02-10 12:34:22.192214-05 |
      (2 rows)

      pulpcore=# select uc.pulp_created, uc.name, ucp.filename, ucp.sum from rpm_updatecollection uc left join rpm_updatecollectionpackage ucp on ucp.update_collection_id = uc.pulp_id where uc.update_record_id = 'decc1d58-e478-4fab-81e5-c88384569b8b';
      pulp_created | name | filename | sum
      ----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------
      2024-02-10 12:34:51.343453-05 | rhel-8-for-x86_64-baseos-rpms__8_0_default_0 | sos-4.6.0-2.el8.noarch.rpm | 958f91d4dc04f94295f60ce40d2e0633850ac8c98dbf3815b2473a1d446a2759
      2024-02-10 12:34:51.343453-05 | rhel-8-for-x86_64-baseos-rpms__8_0_default_0 | sos-audit-4.6.0-2.el8.noarch.rpm | 6cde9203d87982cddd8de888dd00b915cd7704dae3649f78b7474042122e3512
      (2 rows)
      ~~~

      Based on this data, I think we can be pretty confident that we have found and fixed the root cause.

      @dalley @ggainey Let me know if you are interested in accessing the reproducer to check further.

      — Additional comment from on 2024-02-21T21:15:44Z

      Created attachment 2018063
      RHEL 8 Hotfix RPM for Satellite 6.11.5.6

      INSTALL INSTRUCTIONS (Satellite 6.11.5.6 on RHEL8):

      1. Take a complete backup or snapshot of Satellite 6.11.5.6 server

      2. Obtain the Hotfix RPM from this attachment

      3. # dnf install ./python38-pulp-rpm-3.17.22-1.HOTFIXRHBZ2253381.el8pc.noarch.rpm --disableplugin=foreman-protector

      4. # satellite-maintain service restart

      — Additional comment from on 2024-02-21T21:16:27Z

      RHEL 7 hotfix RPM is coming soon for 6.11.5.6 – needs an install test.

      — Additional comment from on 2024-02-21T21:21:54Z

      Created attachment 2018064
      RHEL 7 Hotfix RPM for Satellite 6.11.5.6

      INSTALL INSTRUCTIONS (Satellite 6.11.5.6 on RHEL7):

      1. Take a complete backup or snapshot of Satellite 6.11.5.6 server

      2. Obtain the Hotfix RPM from this attachment

      3. # yum install ./tfm-pulpcore-python3-pulp-rpm-3.17.22-1.HOTFIXRHBZ2253381.el7pc.noarch.rpm --disableplugin=foreman-protector

      4. # satellite-maintain service restart

      QE Tracker for https://issues.redhat.com/browse/SAT-23579
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2266139

              rhn-support-sganar Shubham Ganar
              satellite-focaccia-bot Focaccia Bot