  1. AI Platform Core Components
  2. AIPCC-7747

Segmentation fault in numba probe tests on CUDA 12.9 aarch64 with the torch 2.9 collection

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • None
    • Accelerator Enablement
    • None
    • False
    • False
    • AIPCC Accelerators 21, AIPCC Accelerators 22, AIPCC Accelerators 23
    • Important

      Note: The same probe tests pass on the CUDA 12.9 x86 setup but dump core on aarch64.

      Steps to reproduce the behavior:

      1. docker login quay.io
      2. docker pull quay.io/aipcc/base-images/cuda-12.9-el9.6:latest
      3. Create docker container

      
      docker run -dit --name aipcccuda_koushik_DND \
        --device /dev/nvidia0 \
        --device /dev/nvidiactl \
        --device /dev/nvidia-uvm \
        --device /dev/nvidia-uvm-tools \
        --security-opt label=disable \
        -v /usr/lib64/libnvidia-ml.so.1:/usr/lib64/libnvidia-ml.so.1:ro \
        -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
        --env NVIDIA_VISIBLE_DEVICES=all \
        --user root \
        quay.io/aipcc/base-images/cuda-12.9-el9.6:latest
      

      4. Create a requirements file (req.txt) from the torch 2.9 CUDA 12.9 collection: https://gitlab.com/redhat/rhel-ai/wheels/builder/-/blob/main/collections/torch-2.9.0/cuda12.9-ubi9/requirements.txt
      5. Install the torch 2.9 CUDA 12.9 aarch64 wheels with pip:

      
      NETRC=.netrc pip install --only-binary :all:   --index-url https://gitlab.com/api/v4/projects/75552518/packages/pypi/simple/  --trusted-host gitlab.com   -r req.txt 
      

      6. Run the numba probe tests.
      7. A segmentation fault is observed in the numba probe tests.
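To narrow the crash down outside the probe suite, a standalone script can exercise the same numba feature that the traceback in this report points at (`test_prange_basic`). The sketch below is an assumption about what such a test covers, not the actual probe test body: `prange_sum`, `expected_sum`, and `run_prange_check` are hypothetical names, and the reduction is only one simple use of `prange`. If this script also segfaults on the aarch64 container, the problem is likely in the numba wheel or its threading layer rather than in the test harness.

```python
def expected_sum(n: int) -> int:
    """Closed-form reference value: 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


def run_prange_check(n: int = 1_000_000) -> bool:
    """Compile and run a small parallel reduction; True if the result matches."""
    # Imported lazily so the script still loads where numba is not installed.
    import numpy as np
    import numba
    from numba import njit, prange

    @njit(parallel=True)
    def prange_sum(values):
        total = 0
        # numba treats `total += ...` inside prange as a parallel reduction.
        for i in prange(values.shape[0]):
            total += values[i]
        return total

    values = np.arange(n, dtype=np.int64)
    result = prange_sum(values)
    # threading_layer() is only valid after a parallel function has executed.
    print("numba", numba.__version__, "threading layer:", numba.threading_layer())
    return int(result) == expected_sum(n)


if __name__ == "__main__":
    import faulthandler

    # Dump a Python traceback if the process receives SIGSEGV.
    faulthandler.enable()
    try:
        print("prange result correct:", run_prange_check())
    except ImportError as exc:
        print("numba or numpy not available:", exc)
```

Running it inside the same virtual environment as the probe tests keeps the comparison fair, since it then loads the same numba wheel.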

      Expected Result:
      The numba probe tests should pass without errors.

      Actual Result:
      The numba probe tests crash with a segmentation fault (core dumped) in test_prange_basic.

      Logs:
      1. Docker container:

      [root@ip-172-31-81-216 ec2-user]# docker ps
      CONTAINER ID   IMAGE                                              COMMAND       CREATED        STATUS        PORTS     NAMES
      7ba479c6a838   quay.io/aipcc/base-images/cuda-12.9-el9.6:latest   "/bin/bash"   30 hours ago   Up 29 hours             aipcccuda_koushik_DND
      [root@ip-172-31-81-216 ec2-user]
      

      2. numba version (pip list):

      (.venv) (app-root) /opt/app-root$ python
      Python 3.12.9 (main, Aug 14 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-5)] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import numba
      >>> 
      (.venv) (app-root) /opt/app-root$ pip list show | grep "numba"
      numba                              0.61.2
      (.venv) (app-root) /opt/app-root$
      
      

      3. OS details:

      (.venv) (app-root) /opt/app-root$ cat /etc/os-release
      NAME="Red Hat Enterprise Linux"
      VERSION="9.6 (Plow)"
      ID="rhel"
      ID_LIKE="fedora"
      VERSION_ID="9.6"
      PLATFORM_ID="platform:el9"
      PRETTY_NAME="Red Hat Enterprise Linux 9.6 (Plow)"
      ANSI_COLOR="0;31"
      LOGO="fedora-logo-icon"
      CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
      HOME_URL="https://www.redhat.com/"
      DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
      BUG_REPORT_URL="https://issues.redhat.com/"
      
      REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
      REDHAT_BUGZILLA_PRODUCT_VERSION=9.6
      REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
      REDHAT_SUPPORT_PRODUCT_VERSION="9.6"
      (.venv) (app-root) /opt/app-root$
      

      4. Errors:

      (.venv) (app-root) /opt/app-root/wheels-test$ pytest probe-tests/ -m "numba"
      ======================================================= test session starts =======================================================
      platform linux -- Python 3.12.9, pytest-9.0.1, pluggy-1.6.0
      rootdir: /opt/app-root/wheels-test
      configfile: pyproject.toml
      plugins: anyio-4.12.0
      collected 689 items / 652 deselected / 37 selected                                                                                
      
      ================================================================================
      NUMBA PROBE TEST SESSION START
      ================================================================================
      Environment: NVIDIA CUDA GPU
      Architecture: ARM64/AArch64 Architecture
      Output directory: /opt/app-root/wheels-test
      Log file: /opt/app-root/wheels-test/numba_probe_test_cuda_aarch64_20251203_172739.log
      ================================================================================
      
      2025-12-03 17:27:39,023 - numba - INFO - ================================================================================
      2025-12-03 17:27:39,024 - numba - INFO - NUMBA PROBE TEST SESSION START
      2025-12-03 17:27:39,024 - numba - INFO - ================================================================================
      2025-12-03 17:27:39,024 - numba - INFO - Environment: ('cuda', '', 'NVIDIA CUDA GPU') on ('aarch64', '', 'ARM64/AArch64 Architecture')
      2025-12-03 17:27:39,024 - numba - INFO - Python: 3.12.9
      2025-12-03 17:27:39,024 - numba - INFO - Platform: Linux-6.1.150-174.273.amzn2023.aarch64-aarch64-with-glibc2.34
      2025-12-03 17:27:39,024 - numba - INFO - Output directory: /opt/app-root/wheels-test
      2025-12-03 17:27:39,024 - numba - INFO - Log file: /opt/app-root/wheels-test/numba_probe_test_cuda_aarch64_20251203_172739.log
      2025-12-03 17:27:39,024 - numba - INFO - ================================================================================
      
      probe-tests/probe-numba/numba_probe_test.py ...........................Fatal Python error: Segmentation fault
      
      Current thread 0x0000ffff8246c020 (most recent call first):
        File "/opt/app-root/wheels-test/probe-tests/probe-numba/numba_probe_test.py", line 1139 in test_prange_basic
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/python.py", line 1720 in runtest
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 245 in <lambda>
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 353 in from_call
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 244 in call_and_report
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 137 in runtestprotocol
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 372 in _main
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 318 in wrap_session
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
        File "/opt/app-root/.venv/lib64/python3.12/site-packages/_pytest/config/__init__.py", line 197 in main
        File Segmentation fault (core dumped)
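Because the segfault kills the whole pytest process, the remaining selected tests never run and the exit status is easy to lose. One way to triage is to run the crashing test in a child process and decode the return code. The sketch below assumes a POSIX host; `describe_exit` and `run_isolated` are hypothetical helper names, and the test node ID is assembled from the file path and test name in the traceback above, so it may need adjusting if the test lives inside a class. Looping over `NUMBA_THREADING_LAYER` values is a diagnostic guess, not a confirmed cause: a layer that is not installed will show up as an ordinary test failure rather than a crash.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable outcome."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return "passed" if returncode == 0 else f"failed (exit code {returncode})"


def run_isolated(cmd, extra_env=None) -> str:
    """Run cmd in a child process so a crash cannot take down the caller."""
    # PYTHONFAULTHANDLER=1 makes the child print a traceback on SIGSEGV.
    env = dict(os.environ, PYTHONFAULTHANDLER="1", **(extra_env or {}))
    completed = subprocess.run(cmd, env=env)
    return describe_exit(completed.returncode)


if __name__ == "__main__":
    # Hypothetical node ID built from the traceback; adjust to the real layout.
    test_id = "probe-tests/probe-numba/numba_probe_test.py::test_prange_basic"
    for layer in ("omp", "tbb", "workqueue"):
        outcome = run_isolated(
            [sys.executable, "-m", "pytest", "-q", test_id],
            extra_env={"NUMBA_THREADING_LAYER": layer},
        )
        print(f"{layer}: {outcome}")
```

If only some layers crash, that result is worth attaching to this issue alongside the full log file named in the session header.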
      
      

      5. Torch versions:

      (.venv) (app-root) /opt/app-root$ pip list show | grep "torch"
      torch                              2.9.0
      torchao                            0.14.1+git
      torchaudio                         2.9.0
      torchvision                        0.24.0
      (.venv) (app-root) /opt/app-root$
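The environment details above were gathered with several separate commands. For comparing the passing x86 setup against the failing aarch64 one, it can help to collect them in a single pass. The following is a minimal sketch using only the standard library; `parse_os_release`, `installed_version`, and `collect_report` are hypothetical names, and the package list is an assumption (llvmlite and numpy are included because numba depends on them, although this report does not list their versions).

```python
import platform
from importlib import metadata


def parse_os_release(text: str) -> dict:
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def installed_version(package: str):
    """Return the installed version of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def collect_report(packages, os_release_path="/etc/os-release") -> dict:
    """Gather architecture, kernel, Python, OS, and package versions."""
    try:
        with open(os_release_path, encoding="utf-8") as handle:
            os_info = parse_os_release(handle.read())
    except OSError:
        os_info = {}
    return {
        "machine": platform.machine(),
        "kernel": platform.release(),
        "python": platform.python_version(),
        "os": os_info.get("PRETTY_NAME"),
        "packages": {name: installed_version(name) for name in packages},
    }


if __name__ == "__main__":
    report = collect_report(["numba", "llvmlite", "numpy", "torch"])
    for key, value in report.items():
        print(f"{key}: {value}")
```

Running the same script in both containers and diffing the output makes version or architecture differences between the two setups easy to spot.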
      

              mprpic@redhat.com Martin Prpic
              rh-ee-konagara Koushik Nagaraj
              Frank's Team