AI Platform Core Components / AIPCC-10821

Add rapidfuzz into the RHAI pipeline onboarding collection

    • Type: Story
    • Resolution: Unresolved

      Add package 'rapidfuzz' into the RHAI pipeline onboarding collection.

      The package requires builder repository onboarding before it can be added to the RHAI pipeline. This ticket is blocked by the builder onboarding ticket.

      Summary


      Executive Summary: RapidFuzz Packaging Analysis

      RapidFuzz (v3.14.3) is a high-performance fuzzy string matching library with ~122M monthly PyPI downloads, licensed under MIT with no redistribution concerns. The package requires a source build with C++ extensions to deliver production-grade performance — without them, the pure Python fallback degrades throughput by 10-100x. The build system uses scikit-build-core with a CMake backend and requires a C++17 compiler, CMake >= 3.15, and Python development headers. Critically, the sdist on PyPI includes pre-generated Cython .cxx files, which eliminates the Cython build dependency and simplifies the build pipeline.

      There are no blocking dependencies or known build issues for Linux x86_64. All previously reported build failures (PEP 517 sdist builds, missing CMake dependency detection, libc++-19 compatibility) have been resolved in v3.14.3. The package has zero mandatory runtime dependencies; numpy is optional and only needed for matrix operations. Build requirements are minimal: gcc-c++, cmake, python3-devel, and scikit-build-core. The vendored C++ libraries (rapidfuzz-cpp, taskflow) are both MIT-licensed and header-only, with no external system library dependencies beyond the standard C++ runtime and libatomic.

      The single most important configuration for packaging is setting the environment variable RAPIDFUZZ_BUILD_EXTENSION=true. Without this, a failed C++ compilation will silently produce a pure Python wheel with severely degraded performance. With this flag, the build fails loudly on compilation errors, which is the desired behavior for controlled packaging environments. Since RapidFuzz performs pure CPU string matching with no GPU code, a single wheel serves all three index targets (CPU, CUDA, ROCm) identically — no index-specific builds are required.
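
      Because the fallback is silent, it may be worth inspecting the built wheel itself rather than relying only on the build exit code. The following is a minimal sketch, assuming compiled modules appear in the wheel as .so (Linux) or .pyd (Windows) files under a rapidfuzz/ directory; the helper names compiled_modules and has_cpp_extensions are hypothetical and not part of RapidFuzz or pip.

```python
import zipfile

# Assumption: compiled extension modules carry one of these file suffixes.
EXTENSION_SUFFIXES = (".so", ".pyd")


def compiled_modules(wheel_path):
    """Return the names of compiled extension files inside a wheel archive."""
    # A wheel is a zip archive, so the standard zipfile module can list it.
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.endswith(EXTENSION_SUFFIXES)]


def has_cpp_extensions(wheel_path):
    """True if the wheel ships at least one compiled module under rapidfuzz/."""
    return any(name.startswith("rapidfuzz/") for name in compiled_modules(wheel_path))
```

      A pipeline step could call has_cpp_extensions on the wheel it just produced and fail the job when the result is False.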

      Key Build Command

      dnf install gcc-c++ cmake python3-devel
      export RAPIDFUZZ_BUILD_EXTENSION=true
      pip wheel --no-binary :all: rapidfuzz==3.14.3
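
      For pipelines that drive the build from Python rather than a shell, the same command can be wrapped so that the environment variable cannot be forgotten. This is a sketch only: the function names, the default "dist" output directory, and the use of --wheel-dir are illustrative choices, not part of the ticket's build recipe.

```python
import os
import subprocess
import sys


def build_env(base_env=None):
    """Copy the environment and force a hard failure on C++ build errors."""
    env = dict(os.environ if base_env is None else base_env)
    env["RAPIDFUZZ_BUILD_EXTENSION"] = "true"
    return env


def wheel_command(version="3.14.3", wheel_dir="dist"):
    """Build the pip invocation; wheel_dir is an illustrative default."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-binary", ":all:",
        "--wheel-dir", wheel_dir,
        f"rapidfuzz=={version}",
    ]


def build_wheel(version="3.14.3", wheel_dir="dist"):
    """Run the build; check=True raises CalledProcessError if pip fails."""
    subprocess.run(wheel_command(version, wheel_dir), env=build_env(), check=True)
```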
      

      Validation Checks

      • import rapidfuzz succeeds
      • rapidfuzz.fuzz.ratio("test", "test") returns 100.0
      • import rapidfuzz.fuzz_cpp does not raise ImportError (confirms C++ extensions are present)
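
      The three checks above could be automated along these lines. This is a minimal sketch; validate_rapidfuzz and the result keys are hypothetical names chosen for illustration, and the function reports failures as False instead of raising so that a missing package is distinguishable from a missing C++ extension.

```python
import importlib


def validate_rapidfuzz():
    """Run the three validation checks and return {check_name: passed}."""
    results = {"import": False, "ratio": False, "cpp_extension": False}

    # Check 1: the package and its fuzz module import at all.
    try:
        importlib.import_module("rapidfuzz")
        fuzz = importlib.import_module("rapidfuzz.fuzz")
    except ImportError:
        return results
    results["import"] = True

    # Check 2: identical strings score 100.0.
    results["ratio"] = fuzz.ratio("test", "test") == 100.0

    # Check 3: the compiled module is importable, so C++ extensions are present.
    try:
        importlib.import_module("rapidfuzz.fuzz_cpp")
        results["cpp_extension"] = True
    except ImportError:
        pass

    return results
```

      A caller would typically treat the build as valid only when all(validate_rapidfuzz().values()) is True.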

      Key Findings

      • License: MIT — fully compliant for redistribution, including vendored dependencies
      • Blockers: None identified for Linux x86_64
      • Risk: Low — mature build system, active maintenance, comprehensive CI across Python 3.10–3.14
      • Single wheel per architecture: No GPU dependencies, one build covers CPU/CUDA/ROCm indexes

              epacific@redhat.com Einat Pacifici
              aipcc-jira-bot@redhat.com AIPCC JIRABOT