Uploaded image for project: 'Subscription Watch'
  1. Subscription Watch
  2. SWATCH-4389

Tally Products API times out for customers with large datasets

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Critical Critical
    • None
    • None
    • swatch-api
    • None
    • False
    • Hide

      None

      Show
      None
    • False
    • subs-swatch-thunder

      The /v1/tally/products/<PRODUCT_ID>/<METRIC_ID> endpoint is timing out for customers with large amounts of data, causing 500 errors and degraded user experience.

      We've identified that the query joining tally_snapshots with tally_measurements exceeds the PostgreSQL statement timeout for organizations with significant data volumes (e.g., with ~890K tally_snapshots records). The gateway returns a 009 status code (NGINX client-closed connection) after 30 seconds, while the database eventually cancels the query due to statement timeout.

      The problematic query is:

      SELECT DISTINCT ts1_0.id, ts1_0.billing_account_id, ts1_0.billing_provider, 
             ts1_0.granularity, ts1_0.org_id, ts1_0.product_id, ts1_0.sla, 
             ts1_0.snapshot_date, tm1_0.snapshot_id, tm1_0.measurement_type, 
             tm1_0.metric_id, tm1_0.value, ts1_0.usage 
      FROM tally_snapshots ts1_0 
      LEFT JOIN tally_measurements tm1_0 ON ts1_0.id = tm1_0.snapshot_id 
      WHERE ts1_0.org_id = ? 
        AND ts1_0.product_id = ? 
        AND ts1_0.granularity = ? 
        AND ts1_0.sla = ? 
        AND ts1_0.usage = ? 
        AND ts1_0.billing_provider = ? 
        AND ts1_0.billing_account_id = ? 
        AND ts1_0.snapshot_date BETWEEN ? AND ? 
      ORDER BY ts1_0.snapshot_date
      

      Data volume for affected organization:

      • tally_snapshots: 890,082 records
      • tally_measurements: 2,464,843 records
      • hosts: 70,795 records
      • host_tally_buckets: 115,800 records

      Acceptance Criteria

      • Reproduce the issue with similar amount of data and improve this API to fix the performance issues that cause the timeout.

              Unassigned Unassigned
              jcarvaja@redhat.com Jose Carvajal Hilario
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated: