  1. AI Platform Core Components
  2. AIPCC-10815

Onboard duckdb into the AIPCC Builder

    • Type: Story
    • Resolution: Won't Do

      Package 'duckdb' does not build as-is via the AIPCC self-service pipeline and requires builder repository onboarding.

      Build Failure Summary

      Root Cause Analysis: `duckdb` Build Failure

      Summary

      This is not a build failure of the `duckdb` package itself. The pipeline failed during the requirements parsing stage, before any package building began.

      Root Cause

      The requirements file contains an invalid version specifier:

      duckdb==any
      

      The `packaging` library (PEP 440 / PEP 508) cannot parse `any` as a version string. The text after `==` must be a valid PEP 440 version (e.g., `1.2.3`), and range specifiers such as `>=0.9.0` are built from such versions in the same way. The string `any` is not a valid version, causing the parser to raise an `InvalidRequirement` error, which the pipeline log reports as a `ValueError`.

      The relevant traceback:

      ValueError: Failed to parse /collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt
        line 0 'duckdb==any': Expected semicolon (after name with no version specifier) or end
          duckdb==any
                ^
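A pre-flight check along the following lines could catch this kind of entry before the pipeline runs. It is a minimal sketch, not the pipeline's own validation: the function name `find_invalid_pins` is hypothetical, only simple `name==version` lines are handled, and the pattern is the canonical-form regular expression from PEP 440 Appendix B, so valid but non-canonical spellings would be flagged as well. The `packaging` library remains the authoritative parser.

```python
import re

# Canonical-form version pattern from PEP 440 (Appendix B). Valid but
# non-canonical spellings such as "1.0-alpha" are deliberately not covered.
CANONICAL_VERSION = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)


def find_invalid_pins(lines):
    """Return (line_index, text) for each `name==version` line with a bad version."""
    problems = []
    for index, raw in enumerate(lines):
        text = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in text:
            continue  # unpinned or range-constrained lines are out of scope here
        _, _, version = text.partition("==")
        version = version.split(";", 1)[0].strip()  # ignore environment markers
        if version.endswith(".*"):
            version = version[:-2]  # `==1.4.*` prefix matching is allowed
        if not CANONICAL_VERSION.match(version):
            problems.append((index, text))
    return problems


sample = ["duckdb==any", "duckdb==1.2.2", "duckdb"]
print(find_invalid_pins(sample))  # [(0, 'duckdb==any')]
```

The zero-based index matches the `line 0` reported in the traceback. Compound specifiers (for example `a==1,<2`) and the `===` operator are not handled by this sketch.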
      

      Fix

      Edit the file `/collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt` and replace:

      duckdb==any
      

      with one of the following valid alternatives:

      • `duckdb` — if any version is acceptable (most likely the intent)
      • `duckdb>=0` — equivalent to "any version", but using a valid specifier
      • `duckdb==<specific version>` — e.g., `duckdb==1.2.2` if a specific version should be pinned
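The first and third alternatives can be illustrated with a small rewrite helper. This is a sketch only: `replace_any_pin` is a hypothetical name, and it assumes the malformed entry has exactly the form `name==any`.

```python
def replace_any_pin(line, pinned_version=None):
    """Rewrite `name==any` as `name` or `name==<pinned_version>`.

    Lines that do not have the `name==any` shape are returned stripped
    but otherwise unchanged.
    """
    stripped = line.strip()
    name, sep, version = stripped.partition("==")
    if sep and version.strip() == "any":
        name = name.strip()
        return f"{name}=={pinned_version}" if pinned_version else name
    return stripped


print(replace_any_pin("duckdb==any"))           # duckdb
print(replace_any_pin("duckdb==any", "1.2.2"))  # duckdb==1.2.2
print(replace_any_pin("duckdb==1.4.4"))         # duckdb==1.4.4
```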

      Key Detail

      Because the failure occurred at line 0 of the requirements file during the `prepare-requirements` step, no packages were resolved or built at all — the entire bootstrap process was blocked by this single malformed requirement entry.

      Packaging Analysis Summary

      Executive Summary: duckdb Package Build Analysis

      DuckDB is rated Complex (7/10) for source building. It is a large in-process analytical database engine requiring full C++ compilation of ~1,375 core source files plus 103 Python binding files via CMake and pybind11. The build system uses scikit-build-core with a custom in-tree PEP 517 backend. Critically, there are no blockers for onboarding: the license is MIT (fully Red Hat-compatible), all 29 third-party libraries are vendored with permissive licenses, there are zero runtime Python dependencies, and no GPU-specific build variants exist — a single wheel serves all hardware indexes (CPU/CUDA/ROCm).

      The recommended build approach is source build from the PyPI sdist (`duckdb-1.4.4.tar.gz`, 17 MB), which bundles the complete DuckDB core engine and all third-party dependencies. No external system library development packages are needed beyond a C++ compiler. Key build requirements are: GCC 12+ (GCC 11.x has known `dynamic_cast` failures per issue #239), CMake >= 3.29, and Ninja >= 1.10. For deterministic versioning, set `OVERRIDE_GIT_DESCRIBE=v1.4.4`. The build command is straightforward:

      pip download duckdb==1.4.4 --no-binary :all: --no-deps
      pip wheel duckdb-1.4.4.tar.gz --no-deps
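The toolchain minimums listed above (GCC 12+, CMake >= 3.29, Ninja >= 1.10) could be checked before starting a build with something like the sketch below. The helper names and the sample banner strings are illustrative assumptions; in a real environment the banners would come from running each tool with `--version`.

```python
import re

# Minimum versions taken from the build requirements stated in the analysis.
MINIMUMS = {"gcc": (12,), "cmake": (3, 29), "ninja": (1, 10)}


def parse_version(version_output):
    """Extract the first dotted number from a `--version` banner as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+|\d+", version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool, version_output):
    """Compare the parsed version of `tool` against its entry in MINIMUMS."""
    return parse_version(version_output) >= MINIMUMS[tool]


# Illustrative banners, not captured from the actual builder image.
print(meets_minimum("gcc", "gcc (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)"))  # False
print(meets_minimum("cmake", "cmake version 3.29.2"))                         # True
print(meets_minimum("ninja", "1.11.1"))                                       # True
```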
      

      Platform compatibility is confirmed for RHEL 8+ and OpenShift AI environments, as the upstream wheels target `manylinux_2_28` (glibc >= 2.28), which is guaranteed on these platforms. The PyPI `duckdb` package is built from the duckdb/duckdb-python repository (not the main duckdb/duckdb repo); the core engine is included as a git submodule. Build performance can be improved by enabling ccache, which the CMake configuration detects automatically. Unity builds are enabled by default, further reducing compilation time.
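The glibc >= 2.28 requirement can be verified on a target host with the standard library. The function name below is hypothetical and the sketch only compares the major and minor components reported by `platform.libc_ver()`.

```python
import platform


def glibc_supports_manylinux_2_28(libc_name, libc_version):
    """Return True when the C library is glibc at version 2.28 or newer."""
    if libc_name != "glibc" or not libc_version:
        return False
    parts = libc_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (2, 28)


# platform.libc_ver() returns a (name, version) pair such as ("glibc", "2.34").
# Both fields are empty strings where the C library cannot be determined.
name, version = platform.libc_ver()
print(name, version, glibc_supports_manylinux_2_28(name, version))
```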

      Key actionable items: (1) Ensure GCC 12+ is available via gcc-toolset in the build environment. (2) Use the sdist for reproducible builds; no git submodule handling is required. (3) No special hardware-index wheel variants are needed. (4) For containerized builds, the upstream-validated `quay.io/pypa/manylinux_2_28_x86_64` image provides a known-good build environment. (5) Pre-built PyPI wheels (~19 MB, Python 3.9–3.14) serve as a reliable fallback if source build issues arise.

              Unassigned Unassigned
              aipcc-jira-bot@redhat.com AIPCC JIRABOT