Loading...

XML

Word

Printable

Type: Bug
Resolution: Won't Do
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: swatch-api
Labels:
None

Story Points:
5
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Ready:
False
AssignedTeam:
subs-swatch-lightning
Intelligence Requested:
Market:

Sprint:
Swatch Lightning Sprint 1

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

When we hit the API "/v1/instances/billing_account_ids", the internal SQL query that is being executed looks like:

select distinct htb1_0.product_id,htb1_0.billing_provider,htb1_0.billing_account_id 
from host_tally_buckets htb1_0 
join hosts h1_0 on h1_0.id=htb1_0.host_id 
where h1_0.org_id=? and htb1_0.billing_provider<>'_ANY' and htb1_0.billing_account_id<>'_ANY' and h1_0.last_seen>=? and htb1_0.product_id='rosa' 
order by htb1_0.billing_provider,htb1_0.billing_account_id

And the cost of running this query having a data set of around 1M of rows in hosts and host_tally_buckets is:

Unique  (cost=11912.15..11926.81 rows=118 width=28)
  ->  Gather Merge  (cost=11912.15..11926.22 rows=118 width=28)
        Workers Planned: 2
        ->  Unique  (cost=10912.13..10912.57 rows=59 width=28)
              ->  Sort  (cost=10912.13..10912.28 rows=59 width=28)
                    Sort Key: htb1_0.billing_provider, htb1_0.billing_account_id
                    ->  Nested Loop  (cost=374.31..10910.39 rows=59 width=28)
                          ->  Parallel Bitmap Heap Scan on host_tally_buckets htb1_0  (cost=373.89..8351.96 rows=597 width=44)
                                Recheck Cond: ((product_id)::text = 'rosa'::text)
                                Filter: (((billing_provider)::text <> '_ANY'::text) AND ((billing_account_id)::text <> '_ANY'::text))
                                ->  Bitmap Index Scan on idx_buckets_pid  (cost=0.00..373.53 rows=33747 width=0)
                                      Index Cond: ((product_id)::text = 'rosa'::text)
                          ->  Index Only Scan using idx_hosts_view on hosts h1_0  (cost=0.42..4.29 rows=1 width=16)
                                Index Cond: ((org_id = '6340056'::text) AND (id = htb1_0.host_id) AND (last_seen >= '2024-06-28 00:00:00-04'::timestamp with time zone))

The main issue with the query is that it needs to loop over 33747 records to filter out only 597. It's actually using the index idx_buckets_pid, but this index only has the product_id.

To improve the query, we can modify the existing index idx_buckets_pid to include more fields:

<createIndex indexName="idx_buckets_pid" tableName="host_tally_buckets">
    <column name="product_id"/>
    <column name="billing_provider"/>
    <column name="billing_account_id"/>
    <column name="host_id"/>
</createIndex>

If we replace it, then the new cost would look like as:

Unique  (cost=1000.86..5577.85 rows=83 width=28)
  ->  Gather Merge  (cost=1000.86..5577.44 rows=83 width=28)
        Workers Planned: 1
        ->  Unique  (cost=0.85..4568.09 rows=83 width=28)
              ->  Nested Loop  (cost=0.85..4567.68 rows=83 width=28)
                    ->  Parallel Index Only Scan using idx_buckets_pid on host_tally_buckets htb1_0  (cost=0.42..959.30 rows=842 width=44)
                          Index Cond: (product_id = 'rosa'::text)
                          Filter: (((billing_provider)::text <> '_ANY'::text) AND ((billing_account_id)::text <> '_ANY'::text))
                    ->  Index Only Scan using idx_hosts_view on hosts h1_0  (cost=0.42..4.29 rows=1 width=16)
                          Index Cond: ((org_id = '6340056'::text) AND (id = htb1_0.host_id) AND (last_seen >= '2024-06-28 00:00:00-04'::timestamp with time zone))

As we can see, the index is now being used to filter also the billing_provider, and billing_account_id, so the cost is much more efficient.

Acceptance Criteria

Migration script with the required SQL index change.
No performance regression is expected.
How long it takes for migration script with large data set
Engage performance team (will not be a blocker, not an assessment) on regression testing

relates to

SWATCH-3960 Not seeing the log traces for swatch-api-nginx-proxy in Splunk

Backlog

SWATCH-3988 Investigate how to improve the performance performance of the "/v1/instances/billing_account_ids" API

Closed

Assignee:: Jose Carvajal Hilario

Reporter:: Jose Carvajal Hilario

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Created:: 2025/09/15 5:26 PM

Updated:: 2025/10/16 5:51 PM

Resolved:: 2025/09/25 9:00 AM

Details

Description

Acceptance Criteria

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates