  1. Insights Experiences
  2. HMS-1800

Not hitting our SLO ProvisioningHTTPSuccessRate for sources Azure upload_info

    • Bug
    • Resolution: Cannot Reproduce
    • Normal
    • Provisioning
    • EnVision Sprint 28
    • 8

      We make two calls to Azure for the upload_info endpoint:

      Get the TenantID - static information that never changes for a single source.
      Get the resource groups list - can change.

      Each call takes about 300 ms on our AWS cluster, so the two together (roughly 600 ms) exceed our 500 ms target.

      While the proper solution would probably be to split this endpoint into two separate ones, a cache entry "AccountInfo" holding additional read-only information about hyperscaler accounts (customer name, account ID/UUID, and other details that do not change) could be quite useful.

      This would speed up the Azure call as well as other similar calls (e.g. AWS upload_info). The expiration time could be set to one month, so that entries for unused sources are eventually removed from Redis, while frequent users would benefit from the speedup.
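
The caching idea can be sketched roughly as follows. This is a minimal cache-aside illustration, not the service's actual code: `AccountInfo` here carries only a tenant ID, `TTLCache` is an in-memory stand-in for Redis, and the `account_info:<source_id>` key format and the `fetch_tenant_id` callback are hypothetical names introduced for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# "One month" expiration from the description, approximated as 30 days.
ONE_MONTH_SECONDS = 30 * 24 * 60 * 60


@dataclass(frozen=True)
class AccountInfo:
    """Read-only facts about a hyperscaler account (hypothetical shape)."""
    source_id: str
    tenant_id: str


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry; not a real client."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, AccountInfo]] = {}

    def get(self, key: str) -> Optional[AccountInfo]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry, as Redis would after the TTL elapses.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: AccountInfo, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)


def account_info(
    cache: TTLCache,
    source_id: str,
    fetch_tenant_id: Callable[[str], str],
) -> AccountInfo:
    """Return cached account info, calling the cloud API only on a miss."""
    key = f"account_info:{source_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    info = AccountInfo(source_id=source_id, tenant_id=fetch_tenant_id(source_id))
    cache.set(key, info, ONE_MONTH_SECONDS)
    return info
```

With this shape, only the first request for a source pays for the tenant lookup; later requests within the expiration window skip that remote call and only need the resource groups list.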

      On the other hand, in the worst case we could still miss the SLO target (for example, when many new customers keep trying the feature, so their first requests miss the cache). We would then need to split the endpoint, but we would continue using the cache.
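
The latency budget behind this reasoning can be written out as a small calculation. It assumes the two Azure calls run one after the other at the roughly 300 ms each quoted above; the 5 ms cache lookup cost is a made-up placeholder, not a measured value.

```python
SLO_TARGET_MS = 500   # latency target from the description
AZURE_CALL_MS = 300   # approximate duration of each Azure call


def upload_info_latency_ms(tenant_cached: bool, cache_lookup_ms: float = 5.0) -> float:
    """Estimate endpoint latency, assuming the two lookups run sequentially."""
    # The resource groups list can change, so it is always fetched from Azure.
    resource_groups_ms = AZURE_CALL_MS
    # The tenant ID is static, so a warm cache replaces the remote call.
    tenant_ms = cache_lookup_ms if tenant_cached else AZURE_CALL_MS
    return tenant_ms + resource_groups_ms


cold = upload_info_latency_ms(tenant_cached=False)  # first request for a source
warm = upload_info_latency_ms(tenant_cached=True)   # later requests
print(f"cold: {cold} ms, within target: {cold <= SLO_TARGET_MS}")
print(f"warm: {warm} ms, within target: {warm <= SLO_TARGET_MS}")
```

Under these assumptions a cold request still lands over the target, which is why a steady stream of first-time sources could keep the SLO out of reach even with the cache in place.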

              rhn-engineering-lzapletal Lukáš Zapletal