AIPCC-10812

Onboard gpustat into the AIPCC Builder

    • Type: Story
    • Resolution: Won't Do

      Package 'gpustat' does not build as-is via the AIPCC self-service pipeline and requires builder repository onboarding.

      Build Failure Summary

      Root Cause Analysis: `gpustat` Build Failure

      Summary

      This is not a build failure of the `gpustat` package itself. The pipeline failed during the requirements preparation phase, before any package building began.

      Root Cause

      The requirements file `/collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt` contains an invalid PEP 440 version specifier:

      gpustat==any
      

      The `packaging` library's requirement parser rejects this because `any` is not a valid version string per [PEP 440](https://peps.python.org/pep-0440/). The `==` operator expects a concrete version number (e.g., `==1.1.1`), not the literal word `any`.

      The relevant error from the log:

      packaging.requirements.InvalidRequirement: Expected semicolon (after name with no version specifier) or end
          gpustat==any
                 ^
      
      ValueError: Failed to parse /collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt
        line 0 'gpustat==any'
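
To catch this class of error before the pipeline runs, a requirements file can be pre-checked locally. The following is a minimal sketch, not the parser the pipeline uses: `find_invalid_pins` is a hypothetical helper that only inspects simple `name==version` lines and tests the version against the canonical-form regular expression given in PEP 440, Appendix B. Because it checks canonical form only, it may also flag versions that are valid but not normalized. Line numbers are zero-based to match the log above.

```python
import re

# Canonical public-version pattern from PEP 440, Appendix B.
CANONICAL_VERSION = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def find_invalid_pins(lines):
    """Return (line_number, text) pairs for '==' pins with a non-canonical version.

    Hypothetical helper: handles only simple 'name==version' lines.
    """
    problems = []
    for number, raw in enumerate(lines):
        text = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in text:
            continue  # bare names and other operators are out of scope here
        _name, _, version = text.partition("==")
        version = version.split(";", 1)[0].strip()  # ignore environment markers
        if version.endswith(".*"):
            version = version[:-2]  # allow prefix matches such as '==1.*'
        if not CANONICAL_VERSION.match(version):
            problems.append((number, text))
    return problems
```

Running it over the failing line and the three suggested replacements should flag only `gpustat==any`.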
      

      Fix

      Edit the requirements file and replace the invalid specifier with a valid one:

      • If any version of `gpustat` is acceptable: remove the version specifier entirely:
        ```
        gpustat
        ```
      • If a specific version is needed: use a valid PEP 440 version, e.g.:
        ```
        gpustat==1.1.1
        ```
      • If a minimum version is needed: use a range specifier, e.g.:
        ```
        gpustat>=1.0
        ```
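
If several collections carry the same placeholder, the first option (dropping the specifier) could be applied mechanically. This is only a sketch under the assumption that `==any` was meant as "no version constraint"; `drop_any_pins` is a hypothetical helper and is not part of the AIPCC tooling.

```python
import re

# Matches a requirement name followed by the placeholder pin '==any'.
ANY_PIN = re.compile(r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*any\s*$")


def drop_any_pins(lines):
    """Replace 'name==any' lines with the bare name; leave other lines unchanged."""
    fixed = []
    for line in lines:
        match = ANY_PIN.match(line.strip())
        fixed.append(match.group("name") if match else line)
    return fixed
```

Lines with a real version, such as `gpustat==1.1.1`, pass through untouched, so the rewrite is safe to run more than once.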

      Impact

      Because the failure occurred in `prepare-requirements` (the very first stage), no packages were resolved or built at all — the `build-sequence-summary.md`, `computed-requirements.txt`, and all build logs are missing from the artifacts, which confirms the pipeline never progressed past requirement parsing.
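
The same diagnosis can be made quickly from a downloaded artifact directory by checking for the files named above. A minimal sketch, assuming the artifacts sit in a single flat directory (the real layout may differ); `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# Files named in the analysis above as absent when requirement parsing fails.
EXPECTED_ARTIFACTS = ("build-sequence-summary.md", "computed-requirements.txt")


def missing_artifacts(artifact_dir):
    """Return the expected artifact names that are absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

If both names come back, the run most likely stopped in `prepare-requirements`, and the requirements file is the first place to look.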

      Packaging Analysis Summary


      Executive Summary: gpustat Packaging Analysis

      gpustat is a pure-Python CLI utility for monitoring NVIDIA GPUs, rated Simple (1/10) for build complexity. The package produces a universal py3-none-any wheel with no native compilation required — building is as straightforward as running pip wheel --no-deps gpustat==1.1.1. The license is MIT, fully compatible with Red Hat redistribution policies, and all transitive dependency licenses (BSD, BSD-3-Clause, MIT) are equally clear. There are no build blockers, no build warnings, and no custom toolchain requirements. The recommended source is the PyPI sdist for v1.1.1, the latest stable release.

      The primary consideration for onboarding is runtime dependency and hardware scope management, not build complexity. gpustat depends on nvidia-ml-py (pure-Python NVML bindings, BSD), psutil (C extensions, but pre-built manylinux wheels available, BSD-3-Clause), and blessed (pure Python, MIT). At runtime, the host must provide libnvidia-ml.so via the NVIDIA driver (R450+). On CUDA indexes, gpustat is fully functional. On CPU and ROCm indexes, the package installs without error but provides no functionality — it exits with code 1 and reports no devices. AMD GPU support does not exist (an unmerged PR is open but inactive).
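
Because the package installs cleanly everywhere but only works where the NVIDIA driver is present, callers on mixed hardware may want to probe for the NVML library before invoking gpustat. A minimal sketch using the standard `ctypes` module; the soname `libnvidia-ml.so.1` is an assumption about how the driver installs the library, and `nvml_available` is a hypothetical helper.

```python
import ctypes


def nvml_available(candidates=("libnvidia-ml.so.1", "libnvidia-ml.so")):
    """Return True if any candidate NVML library name can be loaded."""
    for name in candidates:
        try:
            ctypes.CDLL(name)  # raises OSError when the library is not found
        except OSError:
            continue
        return True
    return False
```

On CPU and ROCm hosts this is expected to return `False`, which lets a wrapper skip gpustat instead of handling its exit code 1.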

      The only notable runtime issue is a driver-specific bug affecting NVIDIA driver series 535.43–535.98, where gpustat reports only the first process per GPU (GitHub #161). The fix is either to upgrade the driver to a release newer than the affected range or to pin nvidia-ml-py>=12.535.108, a pin that the upstream master branch already carries but that has not yet shipped in a release. No other issues affect build or packaging. If gpustat proves unsuitable for multi-hardware deployments, nvitop (MIT licensed, more feature-rich) is a viable alternative.
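
Whether an environment already satisfies the nvidia-ml-py pin can be checked with a simple numeric comparison. This sketch assumes purely numeric dotted versions and is not a full PEP 440 comparison; `meets_minimum` is a hypothetical helper.

```python
# Minimum nvidia-ml-py version named in the analysis above.
MIN_NVIDIA_ML_PY = (12, 535, 108)


def meets_minimum(version, minimum=MIN_NVIDIA_ML_PY):
    """Compare a dotted numeric version string against a minimum tuple.

    A shorter version such as '12.535' compares as lower than (12, 535, 108).
    """
    parts = tuple(int(piece) for piece in version.split("."))
    return parts >= minimum
```

The installed version string can be obtained with `importlib.metadata.version("nvidia-ml-py")` and passed to this function.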

      Key recommendations:

      • Build from PyPI sdist v1.1.1 using a standard source build — no special environment variables or tooling needed
      • For driver 535 environments, patch the nvidia-ml-py pin to >=12.535.108
      • Accept that gpustat is NVIDIA-only; it will be a no-op on CPU and ROCm hardware indexes
      • No container-specific build requirements — libnvidia-ml.so is accessed at runtime from the host via the GPU operator

              Assignee: Unassigned
              Reporter: AIPCC JIRABOT (aipcc-jira-bot@redhat.com)