Red Hat Enterprise Linux AI
RHELAI-3729

Granite model max_seq_len error on model serve


      To Reproduce

      Steps to reproduce the behavior:

      Using a 1xL4 GPU setup, change all GPU values in the config file to 1 (see the config sketch below), then run ilab model serve.

      IBM Cloud Instance
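
      For reference, the GPU change looks roughly like this. It is a minimal sketch of the serve section of config.yaml, assuming the default InstructLab config layout; exact key names can differ between versions, and the model path shown is hypothetical, not taken from the instance:

        serve:
          # hypothetical path; the actual Granite model path on the instance was used
          model_path: ~/.cache/instructlab/models/granite-8b
          vllm:
            gpus: 1              # all GPU counts in the config set to 1 for the single L4
            vllm_args:
              - --tensor-parallel-size
              - "1"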

      Expected behavior

      • ilab model serve starts the Granite model without error.

      Actual behavior

      • Error while running ilab model serve (Granite model max_seq_len error).

      Device Info:

      • Hardware Specs: Apple M3 Pro 36GB Memory
      • OS Version: macOS 15.3.1
      • InstructLab Version: ilab, version 0.23.3
      • Image: registry.stage.redhat.io/rhelai1/bootc-nvidia-rhel9:1.4.3-1741712137
      • Output of ilab system info:

      ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no

      ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no

      ggml_cuda_init: found 1 CUDA devices:

        Device 0: NVIDIA L4, compute capability 8.9, VMM: yes

      Platform:

        sys.version: 3.11.7 (main, Jan  8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]

        sys.platform: linux

        os.name: posix

        platform.release: 5.14.0-427.55.1.el9_4.x86_64

        platform.machine: x86_64

        platform.node: ecosystem-qe-us-east3-l4

        platform.python_version: 3.11.7

        os-release.ID: rhel

        os-release.VERSION_ID: 9.4

        os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)

        memory.total: 78.54 GB

        memory.available: 77.18 GB

        memory.used: 0.58 GB

       

      InstructLab:

        instructlab.version: 0.23.3

        instructlab-dolomite.version: 0.2.0

        instructlab-eval.version: 0.5.1

        instructlab-quantize.version: 0.1.0

        instructlab-schema.version: 0.4.2

        instructlab-sdg.version: 0.7.1

        instructlab-training.version: 0.7.0

       

      Torch:

        torch.version: 2.5.1

        torch.backends.cpu.capability: AVX512

        torch.version.cuda: 12.4

        torch.version.hip: None

        torch.cuda.available: True

        torch.backends.cuda.is_built: True

        torch.backends.mps.is_built: False

        torch.backends.mps.is_available: False

        torch.cuda.bf16: True

        torch.cuda.current.device: 0

        torch.cuda.0.name: NVIDIA L4

        torch.cuda.0.free: 21.8 GB

        torch.cuda.0.total: 22.0 GB

        torch.cuda.0.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)

       

      llama_cpp_python:

        llama_cpp_python.version: 0.3.2

        llama_cpp_python.supports_gpu_offload: True

      Additional context

      • config.yaml
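
      If the failure is vLLM's usual KV-cache check (the Granite model's maximum sequence length not fitting in the memory of a single L4), one thing worth verifying is whether capping the served context length avoids the error. The sketch below rests on that assumption only; --max-model-len is a standard vLLM argument, but the 4096 value is illustrative and not taken from this report:

        serve:
          vllm:
            gpus: 1
            vllm_args:
              - --tensor-parallel-size
              - "1"
              # cap the context length so the KV cache fits on the L4
              # (illustrative value, not from this report)
              - --max-model-len
              - "4096"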

People: Charles Doern (cdoern@redhat.com), Justin Larkin (rh-ee-jlarkin)