AI Platform Core Components / AIPCC-10822

Add duckdb into the RHAI pipeline onboarding collection

    • Type: Story
    • Resolution: Done

      Add package 'duckdb' into the RHAI pipeline onboarding collection.

      The package requires builder repository onboarding before it can be added to the RHAI pipeline. This ticket is blocked by the builder onboarding ticket.

      Summary

      Executive Summary: duckdb Package Build Analysis

      DuckDB is rated Complex (7/10) for source building. It is a large in-process analytical database engine requiring full C++ compilation of ~1,375 core source files plus 103 Python binding files via CMake and pybind11. The build system uses scikit-build-core with a custom in-tree PEP 517 backend. Critically, there are no blockers for onboarding: the license is MIT (fully Red Hat-compatible), all 29 third-party libraries are vendored with permissive licenses, there are zero runtime Python dependencies, and no GPU-specific build variants exist — a single wheel serves all hardware indexes (CPU/CUDA/ROCm).

      The recommended build approach is a source build from the PyPI sdist (duckdb-1.4.4.tar.gz, 17 MB), which bundles the complete DuckDB core engine and all third-party dependencies. No external system library development packages are needed beyond a C++ compiler. Key build requirements are GCC 12+ (GCC 11.x has known dynamic_cast failures per issue #239), CMake >= 3.29, and Ninja >= 1.10. For deterministic versioning, set OVERRIDE_GIT_DESCRIBE=v1.4.4. The build command is straightforward:

      pip download duckdb==1.4.4 --no-binary :all: --no-deps
      pip wheel duckdb-1.4.4.tar.gz --no-deps
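
      The toolchain minimums listed above (GCC 12+, CMake >= 3.29, Ninja >= 1.10) can be checked before starting the long C++ compile. The following is a minimal sketch, not part of the upstream build: the helper names version_ge, check_tool, preflight, and build_wheel are hypothetical, and the comparison assumes GNU sort with -V support. The block only defines functions; nothing runs until they are called.

```shell
#!/bin/sh
# Preflight sketch: compare installed tool versions with the minimums from the ticket.

# version_ge ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED (assumes GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME REQUIRED ACTUAL: prints one status line, returns non-zero on failure.
check_tool() {
    if [ -z "$3" ]; then
        echo "MISSING $1 (need >= $2)"
        return 1
    elif version_ge "$3" "$2"; then
        echo "OK      $1 $3 (need >= $2)"
    else
        echo "TOO OLD $1 $3 (need >= $2)"
        return 1
    fi
}

# preflight: queries whatever gcc, cmake, and ninja are first on PATH.
preflight() {
    rc=0
    check_tool gcc   12   "$(gcc -dumpfullversion 2>/dev/null)" || rc=1
    check_tool cmake 3.29 "$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')" || rc=1
    check_tool ninja 1.10 "$(ninja --version 2>/dev/null)" || rc=1
    return "$rc"
}

# build_wheel: the two pip commands from the ticket, with the version pinned
# through OVERRIDE_GIT_DESCRIBE as the summary recommends.
build_wheel() {
    export OVERRIDE_GIT_DESCRIBE=v1.4.4
    pip download duckdb==1.4.4 --no-binary :all: --no-deps &&
    pip wheel duckdb-1.4.4.tar.gz --no-deps
}
```

      A typical invocation would be `preflight && build_wheel`, so the build is skipped when any tool is missing or too old.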
      

      Platform compatibility is confirmed for RHEL 8+ and OpenShift AI environments: the upstream wheels target manylinux_2_28 (glibc >= 2.28), which these platforms provide. The PyPI duckdb package is built from the duckdb/duckdb-python repository (not the main duckdb/duckdb repo), where the core engine is included as a git submodule. Build performance can be improved by enabling ccache, which the CMake configuration detects automatically. Unity builds are enabled by default, further reducing compilation time.
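
      The manylinux_2_28 requirement reduces to a glibc version comparison, which can be confirmed on a target host or base image. This is a rough sketch under two assumptions: getconf GNU_LIBC_VERSION is available (it is on glibc systems) and sort supports -V. The function names are hypothetical.

```shell
#!/bin/sh
# glibc_version: prints the runtime glibc version (for example "2.28"),
# or nothing on systems that do not use glibc.
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

# supports_manylinux_2_28 [VERSION]: succeeds when VERSION (default: the
# detected glibc version) is at least 2.28.
supports_manylinux_2_28() {
    v="${1:-$(glibc_version)}"
    [ -n "$v" ] &&
        [ "$(printf '%s\n%s\n' 2.28 "$v" | sort -V | head -n 1)" = "2.28" ]
}
```

      Calling `supports_manylinux_2_28` with no argument checks the current host; passing a version string checks a value collected elsewhere, such as from an image inspection step.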

      Key actionable items: (1) Ensure GCC 12+ is available via gcc-toolset in the build environment. (2) Use the sdist for reproducible builds; no git submodule handling is required. (3) No special hardware-index wheel variants are needed. (4) For containerized builds, the upstream-validated quay.io/pypa/manylinux_2_28_x86_64 image provides a known-good build environment. (5) Pre-built PyPI wheels (~19 MB, Python 3.9–3.14) serve as a reliable fallback if source build issues arise.
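
      Items (2) and (5) together suggest a source-first flow with a pre-built wheel as the fallback. The sketch below illustrates that ordering only. The function names are hypothetical, and whether a pipeline is permitted to fall back to an upstream binary wheel is a policy question this ticket does not settle. pip output is sent to stderr so that stdout carries only the one-word result.

```shell
#!/bin/sh
# build_from_sdist VERSION: download the sdist and build a wheel from it.
build_from_sdist() {
    pip download "duckdb==$1" --no-binary :all: --no-deps >&2 &&
    pip wheel "duckdb-$1.tar.gz" --no-deps >&2
}

# fetch_prebuilt_wheel VERSION: download the upstream binary wheel instead.
fetch_prebuilt_wheel() {
    pip download "duckdb==$1" --only-binary :all: --no-deps >&2
}

# obtain_wheel VERSION: try the source build first, then the pre-built wheel.
# Prints "source", "prebuilt", or "failed" so callers can record provenance.
obtain_wheel() {
    if build_from_sdist "$1"; then
        echo source
    elif fetch_prebuilt_wheel "$1"; then
        echo prebuilt
    else
        echo failed
        return 1
    fi
}
```

      For example, `origin=$(obtain_wheel 1.4.4)` leaves the wheel in the current directory and records in `origin` which path produced it.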

              Assignee: Unassigned
              Reporter: AIPCC JIRABOT (aipcc-jira-bot@redhat.com)