Bug
Resolution: Unresolved
Major
AIPCC Accelerators 21, AIPCC Accelerators 22, AIPCC Accelerators 23
Important
Note: The same probe tests pass on the CUDA 12.9 x86_64 setup but dump core on aarch64.
Steps to reproduce the behavior:
1. docker login quay.io
2. docker pull quay.io/aipcc/base-images/cuda-12.9-el9.6:latest
3. Create the Docker container:
docker run -dit --name aipcccuda_koushik_DND --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --security-opt label=disable -v /usr/lib64/libnvidia-ml.so.1:/usr/lib64/libnvidia-ml.so.1:ro -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro --env NVIDIA_VISIBLE_DEVICES=all --user root quay.io/aipcc/base-images/cuda-12.9-el9.6:latest
4. Create a requirements file (req.txt) from the torch 2.9.0 CUDA 12.9 collection: https://gitlab.com/redhat/rhel-ai/wheels/builder/-/blob/main/collections/torch-2.9.0/cuda12.9-ubi9/requirements.txt
5. Install the torch 2.9 CUDA 12.9 aarch64 wheels with pip:
NETRC=.netrc pip install --only-binary :all: --index-url https://gitlab.com/api/v4/projects/75552518/packages/pypi/simple/ --trusted-host gitlab.com -r req.txt
6. Run the numba probe tests.
7. A segmentation fault is observed in the numba probe tests.
Expected Result:
The numba probe tests should pass without errors.
Actual Result:
A segmentation fault (core dump) occurs during the numba probe tests, in test_prange_basic.
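To tell a crash in compiled code apart from an ordinary test failure, the suspect code can be run in a child interpreter and its exit status inspected. The following is a minimal sketch under stated assumptions, not the actual probe test: the body of test_prange_basic is not shown in this report, so CHILD_SOURCE is a hypothetical stand-in that exercises a basic prange reduction, and the helper names describe_exit and run_isolated are illustrative.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for test_prange_basic; the real test body is not
# included in this report. It compiles and runs a simple parallel reduction.
CHILD_SOURCE = """
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def total(values):
    acc = 0.0
    for i in prange(values.shape[0]):
        acc += values[i]
    return acc

print(total(np.ones(1000)))
"""


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_isolated(source):
    """Run Python source in a child interpreter with faulthandler enabled."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", source],
        capture_output=True,
        text=True,
    )
    return describe_exit(proc.returncode), proc.stderr


if __name__ == "__main__":
    status, stderr = run_isolated(CHILD_SOURCE)
    print(status)
    print(stderr, end="")
```

If the child reports "killed by SIGSEGV" on aarch64 but exits with status 0 on x86_64, that would suggest the problem can be reproduced outside the probe-test harness. This is only a diagnostic aid and does not identify the root cause.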
Logs:
1. Docker container (docker ps):
[root@ip-172-31-81-216 ec2-user]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7ba479c6a838 quay.io/aipcc/base-images/cuda-12.9-el9.6:latest "/bin/bash" 30 hours ago Up 29 hours aipcccuda_koushik_DND
2. Python and numba versions (pip list):
(.venv) (app-root) /opt/app-root$ python
Python 3.12.9 (main, Aug 14 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numba
>>>
(.venv) (app-root) /opt/app-root$ pip list show | grep "numba"
numba 0.61.2
(.venv) (app-root) /opt/app-root$
3. OS details:
(.venv) (app-root) /opt/app-root$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.6 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.6"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.6 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://issues.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.6"
(.venv) (app-root) /opt/app-root$
4. Errors:
(.venv) (app-root) /opt/app-root/wheels-test$ pytest probe-tests/ -m "numba"
======================================================= test session starts =======================================================
platform linux -- Python 3.12.9, pytest-9.0.1, pluggy-1.6.0
rootdir: /opt/app-root/wheels-test
configfile: pyproject.toml
plugins: anyio-4.12.0
collected 689 items / 652 deselected / 37 selected
================================================================================
NUMBA PROBE TEST SESSION START
================================================================================
Environment: NVIDIA CUDA GPU
Architecture: ARM64/AArch64 Architecture
Output directory: /opt/app-root/wheels-test
Log file: /opt/app-root/wheels-test/numba_probe_test_cuda_aarch64_20251203_172739.log
================================================================================
2025-12-03 17:27:39,023 - numba - INFO - ================================================================================
2025-12-03 17:27:39,024 - numba - INFO - NUMBA PROBE TEST SESSION START
2025-12-03 17:27:39,024 - numba - INFO - ================================================================================
2025-12-03 17:27:39,024 - numba - INFO - Environment: ('cuda', '', 'NVIDIA CUDA GPU') on ('aarch64', '', 'ARM64/AArch64 Architecture')
2025-12-03 17:27:39,024 - numba - INFO - Python: 3.12.9
2025-12-03 17:27:39,024 - numba - INFO - Platform: Linux-6.1.150-174.273.amzn2023.aarch64-aarch64-with-glibc2.34
2025-12-03 17:27:39,024 - numba - INFO - Output directory: /opt/app-root/wheels-test
2025-12-03 17:27:39,024 - numba - INFO - Log file: /opt/app-root/wheels-test/numba_probe_test_cuda_aarch64_20251203_172739.log
2025-12-03 17:27:39,024 - numba - INFO - ================================================================================
probe-tests/probe-numba/numba_probe_test.py ...........................Fatal Python error: Segmentation fault
Current thread 0x0000ffff8246c020 (most recent call first):
  File "/opt/app-root/wheels-test/probe-tests/probe-numba/numba_probe_test.py", line 1139 in test_prange_basic
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/python.py", line 1720 in runtest
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 353 in from_call
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 372 in _main
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 318 in wrap_session
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/config/__init__.py", line 197 in main
  File
Segmentation fault (core dumped)
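In a faulthandler dump like the one above, the first frame after "Fatal Python error" is the Python code that was executing when the process crashed; the remaining frames are pytest and pluggy plumbing. A small sketch for pulling that frame out of a saved log follows. The function name crashing_frame is illustrative, and the regular expression assumes the frame layout shown in this log.

```python
import re

# Matches one faulthandler frame: File "<path>", line <n> in <function>
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)'
)


def crashing_frame(log_text):
    """Return (path, line, function) of the innermost frame, or None."""
    marker = log_text.find("Fatal Python error")
    if marker == -1:
        return None
    # faulthandler prints the most recent call first, so the first match
    # after the marker is the frame that was running at the crash.
    match = FRAME_RE.search(log_text, marker)
    if match is None:
        return None
    return match["path"], int(match["line"]), match["func"]
```

Applied to the log in this report, the innermost frame is line 1139 of numba_probe_test.py in test_prange_basic.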
5. Torch versions:
(.venv) (app-root) /opt/app-root$ pip list show | grep "torch"
torch 2.9.0
torchao 0.14.1+git
torchaudio 2.9.0
torchvision 0.24.0
(.venv) (app-root) /opt/app-root$
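Because the same tests pass on x86_64 and crash on aarch64, it may help to capture the environment in one structured dump on both machines and diff the results. The sketch below uses only the standard library. The package list is an assumption: numba and torch appear in this report, while llvmlite and numpy are included because numba depends on them; the helper names are illustrative.

```python
import json
import platform
from importlib import metadata

# Assumed package list; adjust to whatever the comparison needs.
PACKAGES = ("numba", "llvmlite", "numpy", "torch")


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_summary(packages=PACKAGES):
    """Collect architecture, platform, Python and package versions."""
    return {
        "machine": platform.machine(),
        "platform": platform.platform(),
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


if __name__ == "__main__":
    print(json.dumps(environment_summary(), indent=2))
```

Running this inside the virtual environment on both the x86_64 and aarch64 containers gives two JSON documents that can be compared directly.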