AI Platform Core Components / AIPCC-11506

Spyre on Power defect in model-cache RPMs

    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • None
    • None
    • Accelerator Enablement
    • True
    • Awaiting new RPMs from IBM
    • False
    • AIPCC Accelerators 27

      Description of problem:

          The ppc64le model-cache RPM v1.1.1 delivered to Red Hat was built without chunked prefill enabled, so the model cache is incompatible with the RC1 image for RHAIIS 3.4 EA1.

      Version numbers (base image, wheels, builder, etc):

      RHAIIS images:

      registry.gitlab.com/redhat/rhel-ai/rhaiis/containers/rhaiis-spyre-ubi9-ppc64le:ci_400
      
      quay.io/aipcc/rhaiis/spyre-ubi9:3.4.0-ea.1-1772645510

      Base image:

      quay.io/aipcc/base-images/spyre:3.4.0-ea.1-1772615289

      Steps to Reproduce:

          1. Run the container with any supported decoder model. The container fails at startup with a "Compilation disabled" error.

      Actual results:

      (EngineCore_DP0 pid=225) ERROR 03-04 16:33:07 [core.py:936] RuntimeError: Compilation disabled
      ...
      ...
      ...
          executor_class, log_stats) as (
      (APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      (APIServer pid=1)   File "/usr/lib64/python3.12/contextlib.py", line 144, in __exit__
      (APIServer pid=1)     next(self.gen)
      (APIServer pid=1)   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/utils.py", line 921, in launch_core_engines
      (APIServer pid=1)     wait_for_engine_startup(
      (APIServer pid=1)   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/utils.py", line 980, in wait_for_engine_startup
      (APIServer pid=1)     raise RuntimeError(
      (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
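
For triage, the root-cause line can be pulled out of a captured container log with a short script. This is a minimal sketch, assuming the log lines follow the `(process pid=N) ... RuntimeError: message` shape seen in the excerpt above. The function names are hypothetical, and treating "Compilation disabled" as the signature of this defect is an assumption based on this issue alone; the same message may have other causes.

```python
import re

# Matches lines shaped like the excerpt in this issue, for example:
#   (EngineCore_DP0 pid=225) ERROR 03-04 16:33:07 [core.py:936] RuntimeError: Compilation disabled
#   (APIServer pid=1) RuntimeError: Engine core initialization failed. ...
# The "ERROR <date> <time> [file:line]" prefix is optional.
LOG_LINE = re.compile(
    r"\((?P<proc>\w+) pid=(?P<pid>\d+)\)\s+"
    r"(?:ERROR\s+\S+\s+\S+\s+\[[^\]]+\]\s+)?"
    r"RuntimeError: (?P<msg>.+)"
)


def find_runtime_errors(log_text):
    """Return (process, pid, message) for every RuntimeError line in the log."""
    errors = []
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            errors.append(
                (match["proc"], int(match["pid"]), match["msg"].strip())
            )
    return errors


def has_compilation_disabled_error(log_text):
    """True if an engine core process reported 'Compilation disabled'.

    Assumption: this message from an EngineCore process is the symptom
    described in this issue. It is not confirmed to be unique to it.
    """
    return any(
        proc.startswith("EngineCore") and msg == "Compilation disabled"
        for proc, _pid, msg in find_runtime_errors(log_text)
    )
```

Running `has_compilation_disabled_error` over the saved container output separates this failure from the generic "Engine core initialization failed" message that the API server raises afterwards.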

      Expected results:

      The HTTP server starts normally.

      Additional info:

      IBM will produce new RPMs. Once they have been mirrored, we will need to rebuild the base image.

              rh-ee-nzeak Nick Zeak
              lbarto Lance Barto
              Frank's Team