Loading...

Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 6.15.0
Affects Version/s: 6.12.4
Component/s: Hosts
Labels:
- team-triaged
- triaged

Story Points:
0
Blocked:
False
Bugzilla Bug:
RHBZ: 2203077
Severity:
Moderate
Sprint:
Sprint 123, Sprint 124, Sprint 125, Sprint 126, Sprint 127, Sprint 128, Sprint 129, Sprint 130, Sprint 131, Sprint 132
AssignedTeam:
sat-proton

Release Note Type:
None
Release Note Text:
None
Release Note Status:
None

PX Impact Score:
PX Priority Data:
SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Test Coverage:
None

Market:

Description of problem:
When having a Content Host with many dnf modulestreams installed, Generate Applicability task can be very slow (esp. on scaled Satellites with many CVs with many big repos, i.e. with huge tables like katello_repository_rpms and namely katello_installed_packages).

Postgres logs slow queries in form:

2023-05-08 15:45:00 CEST LOG: duration: 796130.755 ms execute <unnamed>: SELECT "katello_rpms"."id" FROM "katello_rpms" INNER JOIN katello_repository_rpms ON
katello_rpms.id = katello_repository_rpms.rpm_id INNER JOIN katello_installed_packages ON
katello_rpms.name = katello_installed_packages.name AND
katello_rpms.arch = katello_installed_packages.arch AND
katello_rpms.evr > katello_installed_packages.evr AND
katello_installed_packages.id in (SELECT DISTINCT ON (katello_installed_packages.name) katello_installed_packages.id FROM katello_installed_packages INNER JOIN katello_host_installed_packages ON katello_installed_packages.id = katello_host_installed_packages.installed_package_id WHERE katello_host_installed_packages.host_id = 11603 ORDER BY katello_installed_packages.name, katello_installed_packages.evr DESC) LEFT JOIN katello_module_stream_rpms ON
katello_rpms.id = katello_module_stream_rpms.rpm_id INNER JOIN katello_host_installed_packages ON
katello_installed_packages.id = katello_host_installed_packages.installed_package_id WHERE (katello_repository_rpms.repository_id in (21041,21040,21472,21076,34244)) AND (katello_host_installed_packages.host_id = 11603) AND ((katello_module_stream_rpms.module_stream_id IS NULL AND
katello_installed_packages.id NOT IN (SELECT "katello_installed_packages"."id" FROM "katello_installed_packages" WHERE "katello_installed_packages"."nvra" IN (......
.... here is a huge list of RPMs that repeat often...
......) AND "katello_installed_packages"."epoch" IN (.....
..... another huge list of numbers, with huge repetition of '0' or '1'))) OR

(now a similar huge query).

The "nvra IN AND epoch in" query can have 500k characters.

See the duration on a scaled Satellite, which makes timeouts to the calculation itself and also affects QoS of foreman/postgres/tasks, which are busy by that work (from multiple Content Hosts).

One simple improvement:

The query is called from https://github.com/Katello/katello/blob/master/app/services/katello/applicability/applicable_content_helper.rb#L82 :

def fetch_rpm_content_ids
..
where("(katello_module_stream_rpms.module_stream_id IS NULL AND
katello_installed_packages.id NOT IN (:locked_modular_installed_packages)) OR
(katello_module_stream_rpms.module_stream_id IN (:enabled_module_streams)
AND katello_installed_packages.id IN (:locked_modular_installed_packages))",
:enabled_module_streams => enabled_module_stream_ids,
:locked_modular_installed_packages => locked_modular_installed_packages(enabled_module_stream_ids)).pluck(:id).uniq

The huge list with duplicates is populated from locked_modular_installed_packages method, which is:

Installed packages that are locked for the host due to enabled module stream membership
def locked_modular_installed_packages(enabled_module_streams)
rpms_in_enabled_module_streams = ::Katello::Rpm.
joins("INNER JOIN katello_module_stream_rpms ON katello_rpms.id = katello_module_stream_rpms.rpm_id").
where("katello_module_stream_rpms.module_stream_id IN (:enabled_module_streams)",
:enabled_module_streams => enabled_module_streams).select(:nvra, :epoch)

::Katello::InstalledPackage.where(nvra: rpms_in_enabled_module_streams.map(&:nvra),
epoch: rpms_in_enabled_module_streams.map(&:epoch)).select(:id)
end

Here, rpms_in_enabled_module_streams (big in our scenario for a Host with many modulestreams installed) is a list of pairs ('foo-1.2.3', '0'), ('bar-3.2.1', '0'), .. where the packages can easily repeat (e.g. netcf-0.2.8-12.module+el8.1.0+4066+0f1aadab.x86_64 repeated 28 times, '0' epoch repeated 4885times).

And the

nvra: rpms_in_enabled_module_streams.map(&:nvra)

clause passes the non-uniq list to psql.

Adding there ".uniq" twice:

::Katello::InstalledPackage.where(nvra: rpms_in_enabled_module_streams.map(&:nvra).uniq,
epoch: rpms_in_enabled_module_streams.map(&:epoch).uniq).select(:id)

shrinks the query substantially, while it has no impact to semantics of the query. see Additional Details, I smell a bug here.

Repeated tests by adding this improvement shows 1/3 improvement in psql query duration (some small time is additionally spent on the "uniq" call but that is max few percents).

Version-Release number of selected component (if applicable):
Sat6.12 (also 6.11 or 6.13 affected)

How reproducible:
100% on a scaled Satellite

Steps to Reproduce:
1. Have many Content Hosts registered to Satellite, with many installed packages each
2. Have more Content Views with bigger repos published to multiple LEs
3. Have a Content Host (el8/9) with installed multiple modulestreams - the more the better
4. Optionally, modify in /var/lib/pgsql/data/postgresql.conf the "log_min_duration_statement 1000" to a lower value (depends how much you scale)
5. Recalculate applicability of the Host (hammer host errata recalculate --host-id 123)

Actual results:
Depending how much you scale, the "Bulk generate applicability for host" task can last very long (on unscaled Sat with just many modulestreams installed, 6 seconds instead of 0-1, customer has 30 minutes).

postgres logs show big duration of the above queries

Expected results:
Reasonably lower times - e.g. just applying the ".uniq", I got 1/3 lower psql durations and similarly shorter tasks execution.

Additional info:
I smell a bug in the code that can cause wrong applicability calculation. rpms_in_enabled_module_streams is a list of NEVRAs of packages enabled by a module stream. BUT the query:

::Katello::InstalledPackage.where(nvra: rpms_in_enabled_module_streams.map(&:nvra),
epoch: rpms_in_enabled_module_streams.map(&:epoch)).select(:id)

applies the "where the NEVRA matches" wrongly / too vaguely. See example:

rpms_in_enabled_module_streams = (
('foo-1.2.3', '0'),
('bar-3.2.1', '1')
)

and ::Katello::InstalledPackage.select(:nvra, :epoch) = (
..
('foo-1.2.3', '0'),
('foo-1.2.3', '1'),
('bar-3.2.1', '0'),
('bar-3.2.1', '1'),
..
)

Then, the "where" clause will ask for "give me packages with nvra IN ('foo-1.2.3', 'bar-3.2.1') and with epoch IN ('0', '1')", whoch returns all four records above - BUT the intention is to return just the two from rpms_in_enabled_module_streams!

I dont know how theoretical or real counterexample this is, though. BUT if it could be real, it would be very tricky to reveal such a bug.

external trackers

Foreman Issue Tracker 36850

Red Hat Errata Tool 129857

Red Hat Issue Tracker SAT-21377

Red Hat Knowledge Base (Solution) 7033387

Red Hat Product Errata RHSA-2024:2010

1.	[DEV] Slow generate applicability for Hosts with multiple modulestreams installed		Closed		Jeremy Lenz
2.	[QE] Slow generate applicability for Hosts with multiple modulestreams installed		Closed		Vijaykumar Sawant

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Sub-Tasks

Activity

People

Dates