AI Platform Core Components / AIPCC-8057

Unsatisfied library dependencies on libnvidia-ml.so in the cuda aipcc image


      ❯ podman run --user 0 --rm -it registry.redhat.io/rhai/base-image-cuda-rhel9:3.2.0-1765367347
      Trying to pull registry.redhat.io/rhai/base-image-cuda-rhel9:3.2.0-1765367347...
      Getting image source signatures
      Checking if image destination supports signatures
      Copying blob sha256:11f81ffe9d0743a9f909d61688991d71cff5592d2111bde11e472d4393ba9860
      Copying blob sha256:27e7cfc996422af11fa0e0744f1b18263b72fc31cfa467cbcb9486174a364503
      ldd /lib64/ucx/libuct_cuda.so.0.0.0
      Copying config sha256:ba1a31a3777d93ae6845bd6d6e163c1e350a79f6d663e7f2862dd6d84bfa4e6b
      Writing manifest to image destination
      Storing signatures
      ldd /lib64/ucx/libuct_cuda.so.0.0.0
      (app-root) /opt/app-root$ ldd /lib64/ucx/libuct_cuda.so.0.0.0
      	linux-vdso.so.1 (0x0000ffff914dc000)
      	/usr/lib64/libjemalloc.so.2 (0x0000ffff9135b000)
      	libuct.so.0 => /lib64/libuct.so.0 (0x0000ffff91307000)
      	libucs.so.0 => /lib64/libucs.so.0 (0x0000ffff9128f000)
      	libm.so.6 => /lib64/libm.so.6 (0x0000ffff911ee000)
      	libucm.so.0 => /lib64/libucm.so.0 (0x0000ffff911bd000)
      	libcuda.so.1 => /usr/local/cuda/compat/libcuda.so.1 (0x0000ffff8b91a000)
      	libnvidia-ml.so.1 => not found
      	libc.so.6 => /lib64/libc.so.6 (0x0000ffff8b76c000)
      	/lib/ld-linux-aarch64.so.1 (0x0000ffff914a0000)
      	libdl.so.2 => /lib64/libdl.so.2 (0x0000ffff8b74b000)
      	librt.so.1 => /lib64/librt.so.1 (0x0000ffff8b72a000)
      	libpthread.so.0 => /lib64/libpthread.so.0 (0x0000ffff8b709000)
      (app-root) /opt/app-root$ find / -name "libnvidia-ml.so*"
      (app-root) /opt/app-root$ find / -name "libnvidia-ml*"
      (app-root) /opt/app-root$ 
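
The manual check above can be scripted. The following is a minimal sketch, not the actual implementation of `test_elf_files_can_link_runtime_libs`: the helper names `unsatisfied_deps` and `check_library` are hypothetical, and it assumes `ldd` reports unresolved entries in the `<soname> => not found` form seen in the transcript.

```python
import subprocess


def unsatisfied_deps(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like "<soname> => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_library(path: str) -> list[str]:
    """Run ldd on one shared object and list its unresolved dependencies.

    Hypothetical helper: requires ldd to be available inside the image.
    """
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return unsatisfied_deps(result.stdout)


# Three lines taken from the ldd output in this report.
SAMPLE = """\
\tlibcuda.so.1 => /usr/local/cuda/compat/libcuda.so.1 (0x0000ffff8b91a000)
\tlibnvidia-ml.so.1 => not found
\tlibc.so.6 => /lib64/libc.so.6 (0x0000ffff8b76c000)
"""

if __name__ == "__main__":
    print(unsatisfied_deps(SAMPLE))
```

Inside the container, `check_library("/lib64/ucx/libuct_cuda.so.0.0.0")` would be the equivalent of the `ldd` call shown in the transcript.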
      
      

      Originally found by the pytest tests in https://github.com/red-hat-data-services/notebooks/actions/runs/20240395610/job/58107195991?pr=1821:

      SUBFAILED[dlib='/lib64/ucx/libuct_cuda.so.0.0.0'] tests/containers/base_image_test.py::TestBaseImage::test_elf_files_can_link_runtime_libs[ghcr.io/red-hat-data-services/notebooks/workbench-images:cuda-jupyter-tensorflow-ubi9-python-3.12-_2c70a7ff3733116f48d692ab9dd8ac657ace03bd] - Failed: dlib='/lib64/ucx/libuct_cuda.so.0.0.0' has unsatisfied dependencies deps='libnvidia-ml.so.1 => not found'
      
      SUBFAILED[dlib='/lib64/ucx/libuct_cuda_gdrcopy.so.0.0.0'] tests/containers/base_image_test.py::TestBaseImage::test_elf_files_can_link_runtime_libs[ghcr.io/red-hat-data-services/notebooks/workbench-images:cuda-jupyter-tensorflow-ubi9-python-3.12-_2c70a7ff3733116f48d692ab9dd8ac657ace03bd] - Failed: dlib='/lib64/ucx/libuct_cuda_gdrcopy.so.0.0.0' has unsatisfied dependencies deps='libnvidia-ml.so.1 => not found'
      

      Initial triage took place on Slack: https://redhat-internal.slack.com/archives/C07JX0EMKCZ/p1760698434172489. So far the missing library has not been found to affect the functionality of the workbench images, but I'm still testing.

              Unassigned
              jdanek@redhat.com Jiri Daněk
              Christian Heimes, Emilien Macchi
              Frank's Team