Performance and Scale for AI Platforms / PSAP-1325

Expose input tokens per-request in LTS payload

    • Story
    • Resolution: Obsolete
    • Undefined
    • Jan 13
    • None
    • None
    • None
    • 2

      User Story:
      As a performance engineer

      I want the LTS payload and KPIs to include the input token count for each request, reported separately from the output token count

      So that we can analyze the relationship between TTFT (time to first token) and input tokens. For instance, we could graph input_tokens vs. TTFT for all requests, colored by concurrency. Input length is a relevant variable that we are not currently analyzing.
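
      A minimal sketch of the analysis described above, assuming each request in the payload carries its own input token count. The LTS payload schema is not shown in this ticket, so the record type and its field names (input_tokens, output_tokens, ttft_ms, concurrency) are hypothetical, and the sample values are synthetic. The sketch groups requests into one scatter series per concurrency level and computes a Pearson correlation between input tokens and TTFT for each series.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class RequestRecord:
    """One request in a hypothetical per-request payload (field names are assumptions)."""
    input_tokens: int
    output_tokens: int
    ttft_ms: float
    concurrency: int


def series_by_concurrency(records):
    """Group (input_tokens, ttft_ms) points by concurrency: one scatter series per level."""
    series = defaultdict(list)
    for r in records:
        series[r.concurrency].append((r.input_tokens, r.ttft_ms))
    return dict(series)


def pearson(points):
    """Pearson correlation of (x, y) pairs; None with fewer than 2 points or zero variance."""
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Synthetic values for illustration only, not measured results.
records = [
    RequestRecord(128, 64, 40.0, 1),
    RequestRecord(512, 64, 100.0, 1),
    RequestRecord(1024, 64, 180.0, 1),
    RequestRecord(128, 64, 90.0, 8),
    RequestRecord(512, 64, 210.0, 8),
]

for concurrency, points in sorted(series_by_concurrency(records).items()):
    print(concurrency, points, pearson(points))
```

      Each series returned by series_by_concurrency can be passed to a plotting library as one color in the input_tokens vs. TTFT scatter plot. Keeping output_tokens as a separate field is what lets the two lengths be distinguished in later analysis.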

      Acceptance criteria:

              Unassigned Unassigned
              dagray@redhat.com David Whyte-Gray
