-
Bug
-
Resolution: Won't Do
-
Critical
-
None
-
None
-
None
-
5
-
False
-
-
False
-
subs-swatch-lightning
-
-
-
Swatch Lightning Sprint 1
When we hit the API "/v1/instances/billing_account_ids", the internal SQL query that is being executed looks like:
select distinct htb1_0.product_id,htb1_0.billing_provider,htb1_0.billing_account_id from host_tally_buckets htb1_0 join hosts h1_0 on h1_0.id=htb1_0.host_id where h1_0.org_id=? and htb1_0.billing_provider<>'_ANY' and htb1_0.billing_account_id<>'_ANY' and h1_0.last_seen>=? and htb1_0.product_id='rosa' order by htb1_0.billing_provider,htb1_0.billing_account_id
And the cost of running this query having a data set of around 1M of rows in hosts and host_tally_buckets is:
Unique (cost=11912.15..11926.81 rows=118 width=28) -> Gather Merge (cost=11912.15..11926.22 rows=118 width=28) Workers Planned: 2 -> Unique (cost=10912.13..10912.57 rows=59 width=28) -> Sort (cost=10912.13..10912.28 rows=59 width=28) Sort Key: htb1_0.billing_provider, htb1_0.billing_account_id -> Nested Loop (cost=374.31..10910.39 rows=59 width=28) -> Parallel Bitmap Heap Scan on host_tally_buckets htb1_0 (cost=373.89..8351.96 rows=597 width=44) Recheck Cond: ((product_id)::text = 'rosa'::text) Filter: (((billing_provider)::text <> '_ANY'::text) AND ((billing_account_id)::text <> '_ANY'::text)) -> Bitmap Index Scan on idx_buckets_pid (cost=0.00..373.53 rows=33747 width=0) Index Cond: ((product_id)::text = 'rosa'::text) -> Index Only Scan using idx_hosts_view on hosts h1_0 (cost=0.42..4.29 rows=1 width=16) Index Cond: ((org_id = '6340056'::text) AND (id = htb1_0.host_id) AND (last_seen >= '2024-06-28 00:00:00-04'::timestamp with time zone))
The main issue with the query is that it needs to loop over 33747 records to filter out only 597. It's actually using the index idx_buckets_pid, but this index only has the product_id.
To improve the query, we can modify the existing index idx_buckets_pid to include more fields:
<createIndex indexName="idx_buckets_pid" tableName="host_tally_buckets"> <column name="product_id"/> <column name="billing_provider"/> <column name="billing_account_id"/> <column name="host_id"/> </createIndex>
If we replace it, then the new cost would look like as:
Unique (cost=1000.86..5577.85 rows=83 width=28) -> Gather Merge (cost=1000.86..5577.44 rows=83 width=28) Workers Planned: 1 -> Unique (cost=0.85..4568.09 rows=83 width=28) -> Nested Loop (cost=0.85..4567.68 rows=83 width=28) -> Parallel Index Only Scan using idx_buckets_pid on host_tally_buckets htb1_0 (cost=0.42..959.30 rows=842 width=44) Index Cond: (product_id = 'rosa'::text) Filter: (((billing_provider)::text <> '_ANY'::text) AND ((billing_account_id)::text <> '_ANY'::text)) -> Index Only Scan using idx_hosts_view on hosts h1_0 (cost=0.42..4.29 rows=1 width=16) Index Cond: ((org_id = '6340056'::text) AND (id = htb1_0.host_id) AND (last_seen >= '2024-06-28 00:00:00-04'::timestamp with time zone))
As we can see, the index is now being used to filter also the billing_provider, and billing_account_id, so the cost is much more efficient.
Acceptance Criteria
- Migration script with the required SQL index change.
- No performance regression is expected.
- How long it takes for migration script with large data set
- Engage performance team (will not be a blocker, not an assessment) on regression testing
- relates to
-
SWATCH-3960 Not seeing the log traces for swatch-api-nginx-proxy in Splunk
-
- Backlog
-
-
SWATCH-3988 Investigate how to improve the performance performance of the "/v1/instances/billing_account_ids" API
-
- Review
-