  AI Platform Core Components / AIPCC-10814

Onboard rapidfuzz into the AIPCC Builder

    • Type: Story
    • Resolution: Unresolved

      Package 'rapidfuzz' does not build as-is via the AIPCC self-service pipeline and requires builder repository onboarding.

      Build Failure Summary

      Root Cause Analysis: `rapidfuzz` Build Failure

      Summary

      The build never reached the compilation stage for `rapidfuzz`. It failed during the requirements preparation phase due to an invalid version specifier in the requirements file.

      Root Cause

      The file `/collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt` contains the line:

      rapidfuzz==any
      

      The string `any` is not a valid PEP 440 version, so `==any` is not a valid version specifier. The requirement parser in the `packaging` library rejects the line:

      packaging._tokenizer.ParserSyntaxError: Expected semicolon (after name with no version specifier) or end
          rapidfuzz==any
                   ^
      

      The `prepare-requirements` tool calls `Requirement("rapidfuzz==any")`, which raises an `InvalidRequirement` exception (wrapping the `ParserSyntaxError` above) and aborts the entire pipeline at line 0 of the requirements file.
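
      A pre-flight lint of the requirements file would catch this before the pipeline starts. The sketch below is a simplified, stdlib-only approximation of what `packaging` enforces: it checks only plain `==` pins against a version pattern adapted from PEP 440 Appendix B and ignores environment markers. The function `find_invalid_pins` is hypothetical and not part of `package_plugins`; calling `packaging.requirements.Requirement` on each line remains the authoritative check.

```python
import re
from typing import Iterable, List, Tuple

# Simplified PEP 440 version pattern, adapted from PEP 440 Appendix B.
_VERSION_RE = re.compile(
    r"""
    ^v?
    (?:[0-9]+!)?                                                # epoch
    [0-9]+(?:\.[0-9]+)*                                         # release segment
    (?:[-_.]?(?:a|b|c|rc|alpha|beta|pre|preview)[-_.]?[0-9]*)?  # pre-release
    (?:-[0-9]+|[-_.]?(?:post|rev|r)[-_.]?[0-9]*)?               # post-release
    (?:[-_.]?dev[-_.]?[0-9]*)?                                  # dev release
    (?:\+[a-z0-9]+(?:[-_.][a-z0-9]+)*)?                         # local version
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Project name, optional [extras], then the rest of the specifier list.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)$")


def find_invalid_pins(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (zero-based line index, line) for each `==` pin whose version is not PEP 440."""
    problems: List[Tuple[int, str]] = []
    for index, raw in enumerate(lines):
        # Drop comments and environment markers before looking at specifiers.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        match = _REQ_RE.match(line)
        if match is None:
            continue  # blank line, comment, or a pip option such as -r / -c
        for spec in match.group(2).split(","):
            spec = spec.strip()
            # Only plain `==` pins are checked; `===` (arbitrary equality) accepts any string.
            if not spec.startswith("==") or spec.startswith("==="):
                continue
            version = spec[2:].strip()
            if version.endswith(".*"):
                version = version[:-2]  # `==3.12.*` is a valid prefix match
            if not _VERSION_RE.match(version):
                problems.append((index, raw.strip()))
                break
    return problems


if __name__ == "__main__":
    sample = ["torch==2.9.0", "rapidfuzz==any", "rapidfuzz>=3.0.0,<4.0.0"]
    print(find_invalid_pins(sample))  # [(1, 'rapidfuzz==any')]
```

      Running it over each collection's `requirements.txt` in CI would flag placeholders such as `==any` with their line index, instead of failing later inside `prepare-requirements`.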

      Fix

      Replace `rapidfuzz==any` in the requirements file with a valid specifier:

      • To allow any version (unpinned): simply use `rapidfuzz` with no version operator
      • To pin to a specific version: use a valid PEP 440 version, e.g. `rapidfuzz==3.12.2`
      • To set a range: use standard operators, e.g. `rapidfuzz>=3.0.0,<4.0.0`

      For example, edit `/collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt` and change:

      - rapidfuzz==any
      + rapidfuzz
      

      or pin to a concrete version if reproducibility is required:

      - rapidfuzz==any
      + rapidfuzz==3.12.2
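
      If several collection files carry the same placeholder, the edit can be scripted. A minimal sketch, assuming each `==any` pin occupies its own line with no extras or markers; `fix_any_pins` is a hypothetical helper. It drops the pin by default and applies a concrete version only when one is supplied, mirroring the two diffs above.

```python
import re
from typing import Dict, List, Optional

# A whole line of the form `<name>==any`, allowing surrounding whitespace.
_ANY_PIN = re.compile(r"^(\s*)([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*any\s*$")


def fix_any_pins(lines: List[str], pins: Optional[Dict[str, str]] = None) -> List[str]:
    """Rewrite `<name>==any` lines (without trailing newlines, e.g. from str.splitlines()).

    A name listed in `pins` gets `==<version>`; any other `==any` pin is simply dropped,
    leaving the requirement unpinned. All other lines are returned unchanged.
    """
    pins = pins or {}
    fixed: List[str] = []
    for line in lines:
        match = _ANY_PIN.match(line)
        if match is None:
            fixed.append(line)
            continue
        indent, name = match.group(1), match.group(2)
        version = pins.get(name)
        fixed.append(f"{indent}{name}=={version}" if version else f"{indent}{name}")
    return fixed


if __name__ == "__main__":
    original = ["torch==2.9.0", "rapidfuzz==any"]
    print(fix_any_pins(original))  # ['torch==2.9.0', 'rapidfuzz']
    print(fix_any_pins(original, {"rapidfuzz": "3.12.2"}))  # ['torch==2.9.0', 'rapidfuzz==3.12.2']
```

      To apply it to a file, read the lines with `Path.read_text().splitlines()`, pass them through the function, and write the newline-joined result back. Lines such as `rapidfuzz===any` are deliberately left alone, because `===` is a valid (if unusual) PEP 440 operator.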
      

      Key Details

      • Failure location: `prepare-requirements` step, before any wheel building begins
      • Failing tool: `package_plugins.cli.prepare_requirements_constraints:parse_requirements_file` (lines 44-46)
      • Input file: `/collection-repository/collections/torch-2.9.0/cpu-ubi9/requirements.txt`, line 0
      • The `==any` syntax appears to be a placeholder or convention from another system that is not compatible with PEP 440 / Python `packaging` library parsing

      Packaging Analysis Summary


      Executive Summary: RapidFuzz Packaging Analysis

      RapidFuzz (v3.14.3) is a high-performance fuzzy string matching library with ~122M monthly PyPI downloads, licensed under MIT with no redistribution concerns. The package requires a source build with C++ extensions to deliver production-grade performance; without them, the pure Python fallback is 10-100x slower. The build system uses scikit-build-core with a CMake backend and requires a C++17 compiler, CMake >= 3.15, and Python development headers. Critically, the sdist on PyPI includes pre-generated Cython .cxx files, so Cython is not needed at build time, which simplifies the build pipeline.

      There are no blocking dependencies or known build issues for Linux x86_64. All previously reported build failures (PEP 517 sdist builds, missing CMake dependency detection, libc++-19 compatibility) have been resolved in v3.14.3. The package has zero mandatory runtime dependencies; numpy is optional and only needed for matrix operations. Build requirements are minimal: gcc-c++, cmake, python3-devel, and scikit-build-core. The vendored C++ libraries (rapidfuzz-cpp, taskflow) are both MIT-licensed and header-only, with no external system library dependencies beyond the standard C++ runtime and libatomic.
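
      The build prerequisites named in this analysis (a C++ compiler, CMake >= 3.15, Python development headers) can be verified on a builder image before attempting a build. A minimal stdlib-only sketch; `check_build_prereqs` and `parse_cmake_version` are hypothetical names. It only confirms that a compiler is on `PATH` (`g++` or `c++`), not that it supports C++17, and it does not check scikit-build-core, which pip installs into the isolated build environment.

```python
import os
import re
import shutil
import subprocess
import sysconfig
from typing import Dict, Optional, Tuple

MIN_CMAKE = (3, 15)  # minimum CMake version stated in the analysis


def parse_cmake_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `cmake --version` output, e.g. 'cmake version 3.26.5'."""
    match = re.search(r"cmake\S*\s+version\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    # A missing patch component is reported as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def check_build_prereqs() -> Dict[str, bool]:
    """Report whether this host has the tools the rapidfuzz source build needs."""
    results: Dict[str, bool] = {}

    # A C++ compiler on PATH (gcc-c++ provides g++); C++17 support is NOT probed here.
    results["c++ compiler"] = shutil.which("g++") is not None or shutil.which("c++") is not None

    # CMake >= 3.15 on PATH.
    version = None
    cmake = shutil.which("cmake")
    if cmake is not None:
        proc = subprocess.run([cmake, "--version"], capture_output=True, text=True, timeout=30)
        version = parse_cmake_version(proc.stdout)
    results["cmake>=3.15"] = version is not None and version[:2] >= MIN_CMAKE

    # Python development headers (python3-devel installs Python.h in this directory).
    include_dir = sysconfig.get_paths()["include"]
    results["Python.h"] = os.path.exists(os.path.join(include_dir, "Python.h"))
    return results


if __name__ == "__main__":
    for check, ok in check_build_prereqs().items():
        print(f"{'ok' if ok else 'MISSING'}  {check}")
```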

      The single most important configuration for packaging is setting the environment variable RAPIDFUZZ_BUILD_EXTENSION=true. Without it, a failed C++ compilation silently produces a pure Python wheel with severely degraded performance. With it, the build fails loudly on compilation errors, which is the desired behavior for controlled packaging environments. Since RapidFuzz performs pure CPU string matching with no GPU code, a single wheel serves all three index targets (CPU, CUDA, ROCm) identically, so no index-specific builds are required.
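
      Because a failed C++ build can silently yield a pure Python wheel, the wheel filename offers a cheap second check. Wheel filenames end in `{python tag}-{abi tag}-{platform tag}.whl`; a pure Python wheel is normally tagged with ABI `none` and platform `any`, while a wheel with compiled extensions carries a concrete ABI and platform such as `cp312` and `linux_x86_64`. This sketch assumes the fallback wheel follows that usual tagging; `wheel_tags` and `has_compiled_extension` are hypothetical helpers.

```python
from pathlib import Path
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def wheel_tags(filename: str) -> WheelTags:
    """Parse the last three tags of a wheel filename:
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    name = Path(filename).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # the build tag is optional
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(*parts[-3:])


def has_compiled_extension(filename: str) -> bool:
    """True unless the wheel uses the pure Python `none`/`any` tagging."""
    tags = wheel_tags(filename)
    return not (tags.abi == "none" and tags.platform == "any")


if __name__ == "__main__":
    for wheel in ("rapidfuzz-3.14.3-cp312-cp312-linux_x86_64.whl", "rapidfuzz-3.14.3-py3-none-any.whl"):
        print(wheel, has_compiled_extension(wheel))
```

      A pipeline step could run this against the wheel that `pip wheel` writes and fail when it returns `False`, independently of whether the build environment set the flag.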

      Key Build Command

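      dnf install gcc-c++ cmake python3-devel
      # Make a failed C++ compilation abort the build instead of producing a pure Python wheel
      export RAPIDFUZZ_BUILD_EXTENSION=true
      # Build from the sdist rather than downloading a prebuilt wheel
      pip wheel --no-binary :all: rapidfuzz==3.14.3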
      

      Validation Checks

      • `import rapidfuzz` succeeds
      • `rapidfuzz.fuzz.ratio("test", "test")` returns 100.0
      • `import rapidfuzz.fuzz_cpp` does not raise ImportError (confirms the C++ extensions are present)
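
      The three validation checks can be bundled into one script that reports each result. A minimal sketch; `validate_rapidfuzz` is a hypothetical name, and the module names are taken directly from the checks above. It is meant to run in an environment where the freshly built wheel is installed.

```python
import importlib
from typing import Dict

CHECKS = ("import rapidfuzz", 'fuzz.ratio("test", "test") == 100.0', "import rapidfuzz.fuzz_cpp")


def validate_rapidfuzz() -> Dict[str, bool]:
    """Run the three post-build checks and report pass/fail for each."""
    results = {name: False for name in CHECKS}
    try:
        importlib.import_module("rapidfuzz")
    except ImportError:
        return results  # the remaining checks cannot pass without the package
    results[CHECKS[0]] = True

    fuzz = importlib.import_module("rapidfuzz.fuzz")
    results[CHECKS[1]] = fuzz.ratio("test", "test") == 100.0

    try:
        # Module name taken from the check list; importing it shows the compiled extension loads.
        importlib.import_module("rapidfuzz.fuzz_cpp")
        results[CHECKS[2]] = True
    except ImportError:
        pass
    return results


if __name__ == "__main__":
    for check, passed in validate_rapidfuzz().items():
        print(f"{'PASS' if passed else 'FAIL'}  {check}")
```

      In a CI job, the caller would typically turn any `False` result into a non-zero exit status so that a wheel without the C++ extensions is never published.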

      Key Findings

      • License: MIT — fully compliant for redistribution, including vendored dependencies
      • Blockers: None identified for Linux x86_64
      • Risk: Low — mature build system, active maintenance, comprehensive CI across Python 3.10–3.14
      • Single wheel per architecture: No GPU dependencies, one build covers CPU/CUDA/ROCm indexes

              epacific@redhat.com Einat Pacifici
              aipcc-jira-bot@redhat.com AIPCC JIRABOT
              Votes: 0
              Watchers: 2
