Bug
Resolution: Done
Critical
RHAIIS-3.3
AIPCC Accelerators 24
Steps to reproduce the behavior:
1. Log in to quay.io:
podman login quay.io
2. Pull the ROCm base image:
podman pull quay.io/aipcc/base-images/rocm-6.4-el9.6
3. Create a Podman container:
podman run -dit \
--name aipccrocm \
--device=/dev/kfd \
--device=/dev/dri \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--group-add video \
-v /home:/home \
--user root \
quay.io/aipcc/base-images/rocm-6.4-el9.6
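Before installing the wheels, it can help to confirm that the `--device` passthrough in step 3 actually exposed the GPU to the container. The following is a minimal sketch that assumes the standard ROCm device node paths `/dev/kfd` and `/dev/dri/renderD*`. The helper name `check_rocm_device_nodes` is hypothetical and is not part of any RHAIIS tooling.

```python
# Minimal sketch (hypothetical helper): confirm that the ROCm device nodes passed
# with --device=/dev/kfd and --device=/dev/dri are visible inside the container.
# Assumes the standard Linux paths /dev/kfd and /dev/dri/renderD*.
import glob
import os


def check_rocm_device_nodes(dev_root: str = "/dev") -> dict:
    """Report which ROCm-related device nodes exist and are read/write accessible."""
    kfd = os.path.join(dev_root, "kfd")
    # Render nodes are what user-space GPU compute uses; sort for stable output.
    render_nodes = sorted(glob.glob(os.path.join(dev_root, "dri", "renderD*")))
    return {
        "kfd_present": os.path.exists(kfd),
        # os.access returns False for paths that do not exist.
        "kfd_rw": os.access(kfd, os.R_OK | os.W_OK),
        "render_nodes": render_nodes,
        "render_rw": [n for n in render_nodes if os.access(n, os.R_OK | os.W_OK)],
    }


if __name__ == "__main__":
    for key, value in check_rocm_device_nodes().items():
        print(f"{key}: {value}")
```

Run it inside the container. If nodes are missing or the access flags are `False`, the container setup is the likely cause rather than the wheels.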
4. Create a requirements file (saved as req.txt) from the RHAIIS ROCm collection: https://gitlab.com/redhat/rhel-ai/rhaiis/pipeline/-/blob/main/collections/rhaiis/rocm-ubi9/requirements.txt?ref_type=heads
5. Install the RHAIIS ROCm wheels with pip:
NETRC=.netrc pip install --only-binary :all: --index-url https://gitlab.com/api/v4/projects/75894209/packages/pypi/simple/ --trusted-host gitlab.com -r req.txt
Logs:
root@enc1-gpuvm021:/home/hotaisle# podman images
REPOSITORY                                 TAG     IMAGE ID      CREATED       SIZE
quay.io/aipcc/base-images/rocm-6.4-el9.6   latest  20343085fb95  42 hours ago  16.2 GB
(.venv) (app-root) /opt/app-root/wheels-test$ ROCM_HOME=/opt/rocm ROCM_PATH=/opt/rocm python -c "
import os
print('ROCM_HOME:', os.environ.get('ROCM_HOME'))
print('ROCM_PATH:', os.environ.get('ROCM_PATH'))
from vllm.platforms import current_platform
print('Platform:', current_platform)
print('Device type:', repr(current_platform.device_type))
# If platform detected, try LLM
if current_platform.device_type:
    from vllm import LLM
    llm = LLM('Qwen/Qwen3-0.6B', max_model_len=256)
    print('vLLM works!')
else:
    print('Platform still not detected')
"
ROCM_HOME: /opt/rocm
ROCM_PATH: /opt/rocm
Platform: <vllm.platforms.interface.UnspecifiedPlatform object at 0x7f852f754f80>
Device type: ''
Platform still not detected
(.venv) (app-root) /opt/app-root/wheels-test$
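In the output above, `current_platform` resolves to `UnspecifiedPlatform` with an empty `device_type`. In other words, vLLM did not recognize the ROCm platform even though `ROCM_HOME` and `ROCM_PATH` are set. The sketch below approximates the ROCm detection that recent vLLM releases perform. It assumes that detection is based on the `amdsmi` Python bindings (`amdsmi_init`, `amdsmi_get_processor_handles`). This logic has changed between vLLM versions, so verify it against the version shipped in the collection. Unlike vLLM, the sketch reports the reason for a negative result. The helper names are hypothetical.

```python
# Diagnostic sketch (hypothetical helpers): approximate vLLM's ROCm platform check
# and report *why* it fails. Assumption: detection relies on the amdsmi Python
# bindings, as in recent vLLM releases; confirm against the installed vLLM.
from typing import Tuple


def probe_rocm_platform() -> Tuple[bool, str]:
    """Return (available, reason) using the amdsmi bindings, if they import."""
    try:
        import amdsmi  # may be missing from the installed wheel set
    except Exception as exc:  # ImportError, or an error loading the native library
        return False, f"amdsmi not importable: {exc!r}"
    try:
        amdsmi.amdsmi_init()
        try:
            handles = amdsmi.amdsmi_get_processor_handles()
        finally:
            amdsmi.amdsmi_shut_down()
    except Exception as exc:  # e.g. init failing without access to the GPU devices
        return False, f"amdsmi error: {exc!r}"
    if not handles:
        return False, "amdsmi found no GPUs"
    return True, f"amdsmi found {len(handles)} GPU(s)"


def probe_torch_hip() -> str:
    """Report whether the installed torch is a ROCm (HIP) build."""
    try:
        import torch
    except Exception as exc:
        return f"torch not importable: {exc!r}"
    # torch.version.hip is None on non-ROCm builds.
    return f"torch {torch.__version__}, torch.version.hip={torch.version.hip}"


if __name__ == "__main__":
    print(probe_rocm_platform())
    print(probe_torch_hip())
```

Setting `VLLM_LOGGING_LEVEL=DEBUG` when rerunning the original check may also surface vLLM's own debug message for the failed platform check.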
Probe-Test failure:
    def test_stability_loop(self, current_accelerator, current_architecture):
        """
        Test vllm stability with multiple inference cycles.
        This test runs multiple inference iterations to detect memory leaks
        or stability issues.
        """
        logger.info("")
        logger.info("=" * 70)
        logger.info("STABILITY LOOP TEST")
        logger.info("=" * 70)
        logger.info("Accelerator: %s", current_accelerator)
        logger.info("Architecture: %s", current_architecture)
        # Skip on CPU
        if current_accelerator == AcceleratorType.CPU:
            logger.info("Skipping: vLLM stability test requires GPU or TPU")
            pytest.skip("stability test requires GPU or TPU")
        if "vllm" not in registry.tests:
            logger.warning("vLLM tests not registered - skipping")
            pytest.skip("vllm tests not registered")
        tests = registry.tests["vllm"].get(Category.STABILITY, [])
        if not tests:
            logger.warning("No stability tests available - skipping")
            pytest.skip("No stability tests available")
        logger.info("Running %d stability test(s)...", len(tests))
        results = []
        for func_name, test_func in tests:
            logger.info(" Testing: %s", func_name)
            result = test_func(current_accelerator, current_architecture)
            results.append(result)
            if result.success:
                logger.info(" ✓ PASSED (%.3fs)", result.execution_time)
            else:
                logger.error(" ✗ FAILED: %s", result.error_message)
        # At least one stability test should pass
        success_count = sum(1 for r in results if r.success)
        logger.info("")
        logger.info("Test Results: %d/%d passed", success_count, len(results))
        logger.info("=" * 70)
        error_msg = results[0].error_message if results else "no tests run"
>       assert success_count > 0, f"All stability tests failed: {error_msg}"
E       AssertionError: All stability tests failed: Device string must not be empty
E       assert 0 > 0

probe-tests/probe-vllm/vllm_probe_test.py:1265: AssertionError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------
2026-01-16 12:41:11,796 | vllm_probe_test | INFO |
2026-01-16 12:41:11,796 - vllm_probe_test - INFO -
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | ======================================================================
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - ======================================================================
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | STABILITY LOOP TEST
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - STABILITY LOOP TEST
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | ======================================================================
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - ======================================================================
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | Accelerator: AcceleratorType.ROCM
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - Accelerator: AcceleratorType.ROCM
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | Architecture: ArchitectureType.X86
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - Architecture: ArchitectureType.X86
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | Running 1 stability test(s)...
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - Running 1 stability test(s)...
2026-01-16 12:41:11,796 | vllm_probe_test | INFO | Testing: stability_loop
2026-01-16 12:41:11,796 - vllm_probe_test - INFO - Testing: stability_loop
2026-01-16 12:41:11,797 - vllm.entrypoints.utils - INFO - non-default args: {'max_model_len': 256, 'gpu_memory_utilization': 0.5, 'disable_log_stats': True, 'enforce_eager': True}
2026-01-16 12:41:12,204 | vllm_probe_test | ERROR | ✗ FAILED: Device string must not be empty
2026-01-16 12:41:12,204 - vllm_probe_test - ERROR - ✗ FAILED: Device string must not be empty
2026-01-16 12:41:12,204 | vllm_probe_test | INFO |
2026-01-16 12:41:12,204 - vllm_probe_test - INFO -
2026-01-16 12:41:12,204 | vllm_probe_test | INFO | Test Results: 0/1 passed
2026-01-16 12:41:12,204 - vllm_probe_test - INFO - Test Results: 0/1 passed
2026-01-16 12:41:12,204 | vllm_probe_test | INFO | ======================================================================
2026-01-16 12:41:12,204 - vllm_probe_test - INFO - ======================================================================
-------------------------------------------------------------------- Captured log call --------------------------------------------------------------------
INFO vllm_probe_test:vllm_probe_test.py:1223
INFO vllm_probe_test:vllm_probe_test.py:1224 ======================================================================
INFO vllm_probe_test:vllm_probe_test.py:1225 STABILITY LOOP TEST
INFO vllm_probe_test:vllm_probe_test.py:1226 ======================================================================
INFO vllm_probe_test:vllm_probe_test.py:1227 Accelerator: AcceleratorType.ROCM
INFO vllm_probe_test:vllm_probe_test.py:1228 Architecture: ArchitectureType.X86
INFO vllm_probe_test:vllm_probe_test.py:1244 Running 1 stability test(s)...
INFO vllm_probe_test:vllm_probe_test.py:1248 Testing: stability_loop
INFO vllm.entrypoints.utils:utils.py:253 non-default args: {'max_model_len': 256, 'gpu_memory_utilization': 0.5, 'disable_log_stats': True, 'enforce_eager': True}
ERROR vllm_probe_test:vllm_probe_test.py:1255 ✗ FAILED: Device string must not be empty
INFO vllm_probe_test:vllm_probe_test.py:1260
INFO vllm_probe_test:vllm_probe_test.py:1261 Test Results: 0/1 passed
INFO vllm_probe_test:vllm_probe_test.py:1262 ======================================================================
==================================================================== warnings summary =====================================================================
probe-tests/probe-vllm/vllm_probe_test.py::TestVLLMCore::test_import_and_abi_verification
<frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
probe-tests/probe-vllm/vllm_probe_test.py::TestVLLMCore::test_import_and_abi_verification
<frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================= short test summary info =================================================================
FAILED probe-tests/probe-vllm/vllm_probe_test.py::TestVLLMCore::test_basic_inference - AssertionError: All inference tests failed
FAILED probe-tests/probe-vllm/vllm_probe_test.py::TestVLLMStability::test_stability_loop - AssertionError: All stability tests failed: Device string must not be empty
=================================================== 2 failed, 4 passed, 3 skipped, 2 warnings in 11.30s ===================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
(.venv) (app-root) /opt/app-root/wheels-test$
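The stability failure message `Device string must not be empty` appears to come from PyTorch rejecting an empty device string. That is consistent with the empty `device_type` shown in the platform check above. The sketch below shows two hypothetical probe-side changes. The first fails fast with a clearer message when no platform is detected. The second reports every failure rather than only `results[0].error_message`. `ProbeResult`, `require_device_type` and `summarize` are illustrative names and are not part of the existing `vllm_probe_test.py`.

```python
# Sketch of two hypothetical probe-side hardening steps (not part of the
# existing vllm_probe_test.py):
#  1. fail fast when vLLM resolved no platform (empty device_type), instead of
#     surfacing the downstream "Device string must not be empty" error;
#  2. report every failure message, not only the first result's.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProbeResult:
    """Hypothetical stand-in for the result objects returned by registered tests."""
    name: str
    success: bool
    execution_time: float = 0.0
    error_message: str = ""


def require_device_type(device_type: str, accelerator: str) -> None:
    """Raise a descriptive error if vLLM's current_platform has no device type."""
    if not device_type:
        raise RuntimeError(
            f"vLLM did not detect a platform for accelerator {accelerator} "
            "(current_platform.device_type is empty)"
        )


def summarize(results: List[ProbeResult]) -> Tuple[int, str]:
    """Return (success_count, message listing every failed test)."""
    if not results:
        return 0, "no tests run"
    success_count = sum(1 for r in results if r.success)
    failures = [f"{r.name}: {r.error_message}" for r in results if not r.success]
    message = "; ".join(failures) if failures else "no failures"
    return success_count, message
```

Inside `test_stability_loop`, this would amount to calling `require_device_type(current_platform.device_type, current_accelerator)` before the loop and building the assertion message from `summarize(results)`.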
Expected Result:
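The vLLM probe tests should pass without errors.
Actual Result:
The vLLM probe tests fail (2 failed, 4 passed, 3 skipped). test_basic_inference and test_stability_loop fail, and vLLM does not detect the ROCm platform: current_platform is UnspecifiedPlatform with an empty device_type. These probe tests passed with older collections/releases.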