- Type: Bug
- Resolution: Obsolete
- Severity: Critical
- Fix version: rhelai-1.4.4
To Reproduce
Steps to reproduce the behavior:
- Run single-phase training: ilab model train -y --data-path <messages jsonl file> --is-padding-free False
- Run ilab model evaluate on a checkpoint produced by training. The cert test suite ran: ilab model evaluate --benchmark mmlu --model /root/.local/share/instructlab/checkpoints/hf_format/samples_8646 --enable-serving-output
- The command fails with: ValueError: Out of range float values are not JSON compliant
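The error itself can be reproduced in isolation: Starlette's JSONResponse renders response bodies with json.dumps(..., allow_nan=False), which rejects NaN and infinite floats rather than emitting non-standard `NaN`/`Infinity` tokens. A minimal sketch, independent of ilab/vLLM and using only the standard library (the payload shape below is illustrative, not the actual server response):

```python
import json

# Starlette's JSONResponse serializes with allow_nan=False, so any NaN or
# +/-inf float in the payload (e.g. a degenerate logprob) aborts rendering.
payload = {"choices": [{"logprobs": {"token_logprobs": [-0.12, float("-inf")]}}]}

try:
    json.dumps(payload, allow_nan=False)
except ValueError as e:
    print(e)  # Out of range float values are not JSON compliant
```

Note that with the default allow_nan=True, json.dumps would instead emit the non-compliant token `-Infinity` without raising, which is why the failure surfaces only in code paths that disable it.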
Expected behavior
- The MMLU evaluation should complete successfully.
Device Info (please complete the following information):
- Hardware Specs: 4xL40s
- OS Version: registry.redhat.io/rhelai1/bootc-nvidia-rhel9:1.4 , Version: 9.20250220.0, RHEL AI 1.4.4
- InstructLab Version: instructlab.version: 0.23.5
- Output of ilab system info:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: NVIDIA L40S, compute capability 8.9, VMM: yes
  Device 1: NVIDIA L40S, compute capability 8.9, VMM: yes
  Device 2: NVIDIA L40S, compute capability 8.9, VMM: yes
  Device 3: NVIDIA L40S, compute capability 8.9, VMM: yes
Platform:
  sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
  sys.platform: linux
  os.name: posix
  platform.release: 5.14.0-427.55.1.el9_4.x86_64
  platform.machine: x86_64
  platform.node: localhost.localdomain
  platform.python_version: 3.11.7
  os-release.ID: rhel
  os-release.VERSION_ID: 9.4
  os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
  memory.total: 3023.64 GB
  memory.available: 3007.63 GB
  memory.used: 4.18 GB
InstructLab:
  instructlab.version: 0.23.5
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.7.3
  instructlab-training.version: 0.7.0
Torch:
  torch.version: 2.5.1
  torch.backends.cpu.capability: AVX512
  torch.version.cuda: 12.4
  torch.version.hip: None
  torch.cuda.available: True
  torch.backends.cuda.is_built: True
  torch.backends.mps.is_built: False
  torch.backends.mps.is_available: False
  torch.cuda.bf16: True
  torch.cuda.current.device: 0
  torch.cuda.0.name: NVIDIA L40S
  torch.cuda.0.free: 43.9 GB
  torch.cuda.0.total: 44.3 GB
  torch.cuda.0.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.1.name: NVIDIA L40S
  torch.cuda.1.free: 43.9 GB
  torch.cuda.1.total: 44.3 GB
  torch.cuda.1.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.2.name: NVIDIA L40S
  torch.cuda.2.free: 43.9 GB
  torch.cuda.2.total: 44.3 GB
  torch.cuda.2.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.3.name: NVIDIA L40S
  torch.cuda.3.free: 43.9 GB
  torch.cuda.3.total: 44.3 GB
  torch.cuda.3.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
  llama_cpp_python.version: 0.3.2
  llama_cpp_python.supports_gpu_offload: True
Error
INFO 05-26 21:06:51 engine.py:267] Added request cmpl-2a9f02652b8d4922a6bd3aff5a4e798b-0.
INFO:     127.0.0.1:45666 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
  + Exception Group Traceback (most recent call last):
  |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/_utils.py", line 76, in collapse_excgroups
  |     yield
  |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 178, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/opt/app-root/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 767, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
    |     result = await app(  # type: ignore[func-returns-value]
    |   File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    |     return await self.app(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    |     await super().__call__(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/applications.py", line 112, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    |     raise exc
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    |     await self.app(scope, receive, _send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 177, in __call__
    |     with recv_stream, send_stream, collapse_excgroups():
    |   File "/usr/lib64/python3.11/contextlib.py", line 158, in __exit__
    |     self.gen.throw(typ, value, traceback)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    |     raise exc
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 179, in __call__
    |     response = await self.dispatch_func(request, call_next)
    |   File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
    |     response = await call_next(request)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 154, in call_next
    |     raise app_exc
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 141, in coro
    |     await self.app(scope, receive_or_disconnect, send_no_error)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    |     await self.app(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     raise exc
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 735, in app
    |     await route.handle(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 288, in handle
    |     await self.app(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 76, in app
    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     raise exc
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 73, in app
    |     response = await f(request)
    |   File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 301, in app
    |     raw_response = await run_endpoint_function(
    |   File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    |     return await dependant.call(**values)
    |   File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 371, in create_completion
    |     return JSONResponse(content=generator.model_dump())
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/responses.py", line 181, in __init__
    |     super().__init__(content, status_code, headers, media_type, background)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/responses.py", line 44, in __init__
    |     self.body = self.render(content)
    |   File "/opt/app-root/lib64/python3.11/site-packages/starlette/responses.py", line 184, in render
    |     return json.dumps(
    |   File "/usr/lib64/python3.11/json/__init__.py", line 238, in dumps
    |     **kw).encode(obj)
    |   File "/usr/lib64/python3.11/json/encoder.py", line 200, in encode
    |     chunks = self.iterencode(o, _one_shot=True)
    |   File "/usr/lib64/python3.11/json/encoder.py", line 258, in iterencode
    |     return _iterencode(o, 0)
    | ValueError: Out of range float values are not JSON compliant
    +------------------------------------

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 177, in __call__
    with recv_stream, send_stream, collapse_excgroups():
  File "/usr/lib64/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    raise exc
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 179, in __call__
    response = await self.dispatch_func(request, call_next)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
    response = await call_next(request)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 154, in call_next
    raise app_exc
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/base.py", line 141, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
  File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 371, in create_completion
    return JSONResponse(content=generator.model_dump())
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/responses.py", line 181, in __init__
    super().__init__(content, status_code, headers, media_type, background)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/responses.py", line 44, in __init__
    self.body = self.render(content)
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/responses.py", line 184, in render
    return json.dumps(
  File "/usr/lib64/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant
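No workaround is confirmed for this bug, but the traceback narrows the failure mode: the completion payload reaching JSONResponse.render contains a non-finite float, and json.dumps(..., allow_nan=False) rejects it. A server-side mitigation of this class would sanitize the payload before serialization. The sketch below is purely illustrative; clean_floats and the choice of None as the replacement value are assumptions, not part of vLLM or InstructLab:

```python
import math

def clean_floats(obj):
    # Hypothetical helper (not in vLLM/InstructLab): recursively replace
    # non-finite floats (NaN, +/-inf) with None so the structure survives
    # json.dumps(..., allow_nan=False), as used by Starlette's JSONResponse.
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: clean_floats(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clean_floats(v) for v in obj]
    return obj

print(clean_floats({"logprobs": [-0.5, float("-inf"), float("nan")]}))
# -> {'logprobs': [-0.5, None, None]}
```

Whether dropping such values to None is acceptable for MMLU scoring is a separate question; the sketch only shows where the serialization boundary is.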
Bug impact
- Partner hardware certification is blocked until this is resolved.
Known workaround
- None yet