  Subscription Watch / SWATCH-3961

Improve performance of the "/v1/instances/billing_account_ids" API


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Critical
    • Component: swatch-api
    • Story Points: 5
    • Team: subs-swatch-lightning
    • Sprint: Swatch Lightning Sprint 1

      When we hit the "/v1/instances/billing_account_ids" API, the internal SQL query that gets executed looks like this:

      select distinct htb1_0.product_id,htb1_0.billing_provider,htb1_0.billing_account_id 
      from host_tally_buckets htb1_0 
      join hosts h1_0 on h1_0.id=htb1_0.host_id 
      where h1_0.org_id=? and htb1_0.billing_provider<>'_ANY' and htb1_0.billing_account_id<>'_ANY' and h1_0.last_seen>=? and htb1_0.product_id='rosa' 
      order by htb1_0.billing_provider,htb1_0.billing_account_id 
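
      For reference, the plan below can be reproduced with a plain EXPLAIN. This is a minimal sketch, using the sample bind values that are visible in the plan output further down (org_id '6340056' and a last_seen cutoff of 2024-06-28):

      -- Sketch: reproduce the cost estimate for the query above.
      -- The org_id and last_seen values are the sample bind parameters shown in the plan below.
      EXPLAIN
      select distinct htb1_0.product_id, htb1_0.billing_provider, htb1_0.billing_account_id
      from host_tally_buckets htb1_0
      join hosts h1_0 on h1_0.id = htb1_0.host_id
      where h1_0.org_id = '6340056'
        and htb1_0.billing_provider <> '_ANY'
        and htb1_0.billing_account_id <> '_ANY'
        and h1_0.last_seen >= '2024-06-28 00:00:00-04'
        and htb1_0.product_id = 'rosa'
      order by htb1_0.billing_provider, htb1_0.billing_account_id;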
      

      With a data set of around 1M rows in hosts and host_tally_buckets, the estimated cost of running this query is:

      Unique  (cost=11912.15..11926.81 rows=118 width=28)
        ->  Gather Merge  (cost=11912.15..11926.22 rows=118 width=28)
              Workers Planned: 2
              ->  Unique  (cost=10912.13..10912.57 rows=59 width=28)
                    ->  Sort  (cost=10912.13..10912.28 rows=59 width=28)
                          Sort Key: htb1_0.billing_provider, htb1_0.billing_account_id
                          ->  Nested Loop  (cost=374.31..10910.39 rows=59 width=28)
                                ->  Parallel Bitmap Heap Scan on host_tally_buckets htb1_0  (cost=373.89..8351.96 rows=597 width=44)
                                      Recheck Cond: ((product_id)::text = 'rosa'::text)
                                      Filter: (((billing_provider)::text <> '_ANY'::text) AND ((billing_account_id)::text <> '_ANY'::text))
                                      ->  Bitmap Index Scan on idx_buckets_pid  (cost=0.00..373.53 rows=33747 width=0)
                                            Index Cond: ((product_id)::text = 'rosa'::text)
                                ->  Index Only Scan using idx_hosts_view on hosts h1_0  (cost=0.42..4.29 rows=1 width=16)
                                      Index Cond: ((org_id = '6340056'::text) AND (id = htb1_0.host_id) AND (last_seen >= '2024-06-28 00:00:00-04'::timestamp with time zone))
      

      The main issue with the query is that it has to scan 33747 records only to keep 597 of them. It is already using the index idx_buckets_pid, but that index covers only product_id.
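
      To confirm what idx_buckets_pid currently covers, the index definitions on the table can be inspected. A minimal sketch using the standard PostgreSQL pg_indexes view:

      -- Sketch: list the index definitions on host_tally_buckets,
      -- to verify that idx_buckets_pid currently covers only product_id.
      SELECT indexname, indexdef
      FROM pg_indexes
      WHERE tablename = 'host_tally_buckets';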

      To improve the query, we can modify the existing index idx_buckets_pid to include more fields:

      <createIndex indexName="idx_buckets_pid" tableName="host_tally_buckets">
          <column name="product_id"/>
          <column name="billing_provider"/>
          <column name="billing_account_id"/>
          <column name="host_id"/>
      </createIndex>
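
      For reference, the raw SQL this migration would roughly translate to is shown below. This is a sketch under the assumption that the existing index is dropped and re-created, since an index cannot be altered in place:

      -- Sketch: the index change expressed as plain SQL.
      -- The existing single-column index is dropped and re-created as a composite index.
      DROP INDEX IF EXISTS idx_buckets_pid;
      CREATE INDEX idx_buckets_pid
          ON host_tally_buckets (product_id, billing_provider, billing_account_id, host_id);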
      

      If we replace it, the new plan and cost look like this:

      Unique  (cost=1000.86..5577.85 rows=83 width=28)
        ->  Gather Merge  (cost=1000.86..5577.44 rows=83 width=28)
              Workers Planned: 1
              ->  Unique  (cost=0.85..4568.09 rows=83 width=28)
                    ->  Nested Loop  (cost=0.85..4567.68 rows=83 width=28)
                          ->  Parallel Index Only Scan using idx_buckets_pid on host_tally_buckets htb1_0  (cost=0.42..959.30 rows=842 width=44)
                                Index Cond: (product_id = 'rosa'::text)
                                Filter: (((billing_provider)::text <> '_ANY'::text) AND ((billing_account_id)::text <> '_ANY'::text))
                          ->  Index Only Scan using idx_hosts_view on hosts h1_0  (cost=0.42..4.29 rows=1 width=16)
                                Index Cond: ((org_id = '6340056'::text) AND (id = htb1_0.host_id) AND (last_seen >= '2024-06-28 00:00:00-04'::timestamp with time zone))
      

      As we can see, the index is now also used to filter on billing_provider and billing_account_id, so the estimated cost drops from roughly 11926 to 5578.
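
      Once the new index is in place, its size and whether the planner actually uses it can be checked through the standard PostgreSQL statistics views. A minimal sketch:

      -- Sketch: check the on-disk size of the rebuilt index and how often it is scanned.
      SELECT pg_size_pretty(pg_relation_size('idx_buckets_pid')) AS index_size;

      SELECT idx_scan, idx_tup_read, idx_tup_fetch
      FROM pg_stat_user_indexes
      WHERE indexrelname = 'idx_buckets_pid';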

      Acceptance Criteria

      • Migration script with the required SQL index change.
      • No performance regression is introduced.
      • Measure how long the migration script takes against a large data set (see the sketch after this list).
      • Engage the performance team on regression testing (not a blocker, and not a full assessment).
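
      A minimal sketch for timing the index build against a production-sized data set, assuming it is run from a psql session. The index name idx_buckets_pid_v2 is hypothetical; building it with CONCURRENTLY avoids blocking writes, but cannot run inside a transaction (for Liquibase that would mean a changeSet with runInTransaction="false", which is not part of the change above):

      -- Sketch: measure how long the composite index build takes on a large data set.
      -- idx_buckets_pid_v2 is a hypothetical temporary name; once built, the old index
      -- could be dropped and this one renamed, avoiding a window with no index at all.
      \timing on
      CREATE INDEX CONCURRENTLY idx_buckets_pid_v2
          ON host_tally_buckets (product_id, billing_provider, billing_account_id, host_id);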

              Assignee: Jose Carvajal Hilario (jcarvaja@redhat.com)
              Reporter: Jose Carvajal Hilario (jcarvaja@redhat.com)