  1. AI Platform Core Components
  2. AIPCC-2042

Build DeepEP for llm-d cluster-scale MoE serving

    • Epic
    • Resolution: Done
    • Undefined
    • None
    • None
    • Accelerator Enablement
    • llm-d: build DeepEP
    • True

      Legal issue is resolved 
      Legal and license issue: NVSHMEM is under a proprietary license (https://docs.nvidia.com/nvshmem/api/sla.html).

      There is no pyproject.toml to declare build dependencies such as torch, making it difficult to determine the required environment for building or installing the project.

      Moreover, the Git repository lacks both version tags and formal releases, which makes it impossible to pin the project to a specific version with any reliability. This significantly complicates dependency management and reproducibility.
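
One common workaround when a repository has no tags is to pin to a full commit SHA. The sketch below builds such a requirement string in the direct-reference form that pip accepts. It is an illustration under stated assumptions, not a record of how this epic was resolved: the helper name `pinned_requirement` is hypothetical, the distribution name `deep_ep` is assumed, and the commit value in the usage line is a placeholder rather than a real DeepEP commit.

```python
import re

# A full Git commit SHA-1 is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_requirement(name: str, repo_url: str, commit: str) -> str:
    """Build a 'name @ git+URL@commit' requirement pinned to a full commit SHA.

    Hypothetical helper for illustration only.
    """
    if not _FULL_SHA.fullmatch(commit):
        # Branch names and short SHAs can move or collide, so reject them.
        raise ValueError("expected a full 40-character lowercase hex commit SHA")
    return f"{name} @ git+{repo_url}@{commit}"


# Placeholder commit (all zeros), used only to show the output shape.
print(pinned_requirement("deep_ep", "https://github.com/deepseek-ai/DeepEP", "0" * 40))
```

A commit pin is reproducible but carries no version semantics, so the concern about dependency management raised above still applies.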

    • False
    • In Progress
    • AIPCC-3181 - Support for llm-d
    • 0% To Do, 0% In Progress, 100% Done

      27/Aug/25 - Red
      Blocked by a "deep_ep_cpp.cpython-312-x86_64-linux-gnu.so: missing symbol" error (AIPCC-4739).

      29/July/25 - Green
      In review stage

      23/July/25 - Orange
      Legal and license issue cleared; the existing RPM is to be used.

      11/July/25 - Red
      Legal and license issue with using NVSHMEM for package building, as it is under a proprietary license (https://docs.nvidia.com/nvshmem/api/sla.html). The reporter of the issue has been informed.


      Feature Overview (mandatory - Complete while in New status)

      DeepEP is required for cluster-scale expert-parallel serving, which llm-d needs in order to serve DeepSeek expert (MoE) models. It must be built and made available for inclusion in an image variant.

      Build instructions: https://github.com/deepseek-ai/DeepEP/blob/main/third-party/README.md

      Goals (mandatory - Complete while in New status)
      Provide high-level goal statement, providing user context and expected user outcome(s) for this Feature

      • Build DeepEP as a wheel

       

      Requirements (mandatory - Complete while in Refinement status):
      A list of specific needs, capabilities, or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the Feature shifts. If a non MVP requirement slips, it does not shift the feature.

      Requirement            Notes   isMVP?
      Wheel build                    Yes
      Builder image update           Yes

       

      Done - Acceptance Criteria (mandatory - Complete while in Refinement status):
      Acceptance Criteria articulate and define the value proposition: what is required to meet the goal and intent of this Feature. The Acceptance Criteria provide a detailed definition of scope and the expected outcomes from a user's point of view.

      A wheel collection owner can add a supported version of nixl to their collection. The runtime image is updated to support a minimal working DeepEP configuration.

      Use Cases - i.e. User Experience & Workflow: (Initial completion while in Refinement status):

      vllm-d builds will include this package

      Out of Scope (Initial completion while in Refinement status):
      High-level list of items or personas that are out of scope.
      <your text here>

      Documentation Considerations (Initial completion while in Refinement status):
      Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation.
      <your text here>

       

      Questions to Answer (Initial completion while in Refinement status):
      Include a list of refinement / architectural questions that may need to be answered before coding can begin.
      <your text here>

      Background and Strategic Fit (Initial completion while in Refinement status):
      Provide any additional context needed to frame the feature.

      https://github.com/deepseek-ai/DeepEP

      Customer Considerations (Initial completion while in Refinement status):
      Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.
      <your text here>

      Team Sign Off (Completion while in Refinement status)

      • All required Epics (known at the time) are linked to this Feature
      • All required Stories, Tasks (known at the time) for the most immediate Epics have been created and estimated
      • Add - Reviewers name, Team Name
      • Acceptance == Feature as “Ready” - well understood and scope is clear - Acceptance Criteria (scope) is elaborated, well defined, and understood
      • Note: Only set FixVersion/s: on a Feature if the delivery team agrees they have the capacity and have committed that capability for that milestone

      *An engineer or tech lead from the product requesting this feature is required for the signoff below.

      Reviewed By   Team Name   Accepted   Notes
             
             
             
             

       

              rh-ee-vshaw Vikash Shaw
              rhn-support-weaton Will Eaton
