Project: AI Platform Core Components
Issue: AIPCC-3162

Support Multi-Configuration Performance Testing for a Single Model

    • Type: Initiative
    • Resolution: Duplicate
    • Priority: Undefined
    • Component: Model Validation

      Description:

      Introduce the ability to benchmark the performance of a single deployed model across multiple different GenAI engine configurations—starting with vLLM.

       

      The system should support automated benchmarking of the same model while varying a range of engine-specific parameters, including but not limited to:

      • max_batch_size
      • max_tokens
      • gpu_memory_utilization
      • tensor_parallel_size
      • disable_custom_all_reduce
      • kv_cache_dtype
      • enable_prefix_caching
      • trust_remote_code
      • gpu_lazy_init
      • max_model_len
      • max_context_len_to_capture
      • sliding_window
      • num_experts (for MoE models)
      • cllm_paged_attention (if supported)
      • engine_version (to allow version comparison)

       

      The user (e.g., an ML engineer) should be able to:

      • Configure and launch multiple benchmark runs with different configurations
      • Include full metadata for each run (model, config, hardware, workload, etc.)
      • Easily compare results across configuration variants

      This will support deeper analysis of configuration tradeoffs and assist product teams in selecting optimal deployment settings.
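As an illustrative sketch only (the function name, model ID, and parameter subset below are hypothetical and not specified in this ticket), the requested capability could amount to expanding a grid of engine parameters into one fully described benchmark run per configuration variant:

```python
from itertools import product

def build_run_configs(model: str, param_grid: dict) -> list[dict]:
    """Return one run descriptor per combination of engine parameter values.

    Each descriptor carries the metadata needed to compare results
    across configuration variants (model, engine, engine_config).
    """
    keys = list(param_grid)
    runs = []
    # Cartesian product over the per-parameter value lists.
    for values in product(*(param_grid[k] for k in keys)):
        runs.append({
            "model": model,                      # model under test
            "engine": "vllm",                    # engine family (starting point per this ticket)
            "engine_config": dict(zip(keys, values)),  # variant-specific parameters
        })
    return runs

# Example sweep over three of the parameters listed above.
grid = {
    "tensor_parallel_size": [1, 2],
    "gpu_memory_utilization": [0.85, 0.95],
    "enable_prefix_caching": [True, False],
}
runs = build_run_configs("meta-llama/Llama-3.1-8B", grid)
# 2 x 2 x 2 = 8 configuration variants, each a self-describing run record
```

Each descriptor could then be handed to whatever launcher performs the actual benchmark, and the `engine_config` field gives the comparison key when aggregating results.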

              rh-ee-abadli Aviran Badli (Inactive)
              Votes: 0
              Watchers: 2