Story
Resolution: Unresolved
Add package 'rapidfuzz' into the RHAI pipeline onboarding collection.
The package requires builder repository onboarding before it can be added to the RHAI pipeline. This ticket is blocked by the builder onboarding ticket.
Summary
Executive Summary: RapidFuzz Packaging Analysis
RapidFuzz (v3.14.3) is a high-performance fuzzy string matching library with ~122M monthly PyPI downloads, licensed under MIT with no redistribution concerns. The package requires a source build with C++ extensions to deliver production-grade performance — without them, the pure Python fallback degrades throughput by 10-100x. The build system uses scikit-build-core with a CMake backend and requires a C++17 compiler, CMake >= 3.15, and Python development headers. Critically, the sdist on PyPI includes pre-generated Cython .cxx files, which eliminates the Cython build dependency and simplifies the build pipeline.
There are no blocking dependencies or known build issues for Linux x86_64. All previously reported build failures (PEP 517 sdist builds, missing CMake dependency detection, libc++-19 compatibility) have been resolved in v3.14.3. The package has zero mandatory runtime dependencies; numpy is optional and only needed for matrix operations. Build requirements are minimal: gcc-c++, cmake, python3-devel, and scikit-build-core. The vendored C++ libraries (rapidfuzz-cpp, taskflow) are both MIT-licensed and header-only, with no external system library dependencies beyond the standard C++ runtime and libatomic.
The single most important configuration for packaging is setting the environment variable RAPIDFUZZ_BUILD_EXTENSION=true. Without this, a failed C++ compilation will silently produce a pure Python wheel with severely degraded performance. With this flag, the build fails loudly on compilation errors, which is the desired behavior for controlled packaging environments. Since RapidFuzz performs pure CPU string matching with no GPU code, a single wheel serves all three index targets (CPU, CUDA, ROCm) identically — no index-specific builds are required.
Key Build Command
dnf install gcc-c++ cmake python3-devel
export RAPIDFUZZ_BUILD_EXTENSION=true
pip wheel --no-binary :all: rapidfuzz==3.14.3
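If the build is driven from a Python pipeline step instead of a shell, the same command can be sketched as below. This is a minimal illustration, not the builder's actual tooling: the function names wheel_build_invocation and run_build are hypothetical. The flag is placed in the subprocess environment explicitly, so the build does not depend on an earlier shell export.

```python
import os
import subprocess
import sys


def wheel_build_invocation(version="3.14.3", base_env=None):
    """Return (command, environment) for a source build of rapidfuzz.

    Hypothetical helper: it mirrors the shell commands in the ticket and
    forces RAPIDFUZZ_BUILD_EXTENSION=true so a failed C++ compilation
    aborts the build instead of producing a pure Python wheel.
    """
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["RAPIDFUZZ_BUILD_EXTENSION"] = "true"

    cmd = [
        sys.executable, "-m", "pip", "wheel",
        "--no-binary", ":all:",
        f"rapidfuzz=={version}",
    ]
    return cmd, env


def run_build(version="3.14.3"):
    """Run the build; raises CalledProcessError if pip exits non-zero."""
    cmd, env = wheel_build_invocation(version)
    subprocess.run(cmd, env=env, check=True)
```

Calling run_build() assumes the system packages from the dnf line are already installed. With check=True, a compilation failure surfaces as an exception in the pipeline step.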
Validation Checks
- import rapidfuzz succeeds
- rapidfuzz.fuzz.ratio("test", "test") returns 100.0
- import rapidfuzz.fuzz_cpp does not raise ImportError (confirms C++ extensions are present)
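The three checks above could be scripted roughly as follows. This is a sketch under assumptions: the function name validate_rapidfuzz and the injectable import_module parameter are illustrative and not part of RapidFuzz or the pipeline. The injection exists only so the logic can be exercised on a machine where the wheel is not installed.

```python
import importlib


def validate_rapidfuzz(import_module=importlib.import_module):
    """Run the three validation checks and return a list of failure messages.

    An empty list means every check passed.
    """
    failures = []

    # Check 1: the package imports at all. Nothing else can run without it.
    try:
        import_module("rapidfuzz")
    except ImportError as exc:
        return [f"import rapidfuzz failed: {exc}"]

    # Check 2: identical strings must score exactly 100.0.
    try:
        score = import_module("rapidfuzz.fuzz").ratio("test", "test")
    except ImportError as exc:
        failures.append(f"import rapidfuzz.fuzz failed: {exc}")
    else:
        if score != 100.0:
            failures.append(f"fuzz.ratio returned {score!r}, expected 100.0")

    # Check 3: the compiled module must be importable; an ImportError here
    # suggests the wheel fell back to the pure Python implementation.
    try:
        import_module("rapidfuzz.fuzz_cpp")
    except ImportError as exc:
        failures.append(f"rapidfuzz.fuzz_cpp not importable: {exc}")

    return failures
```

In a pipeline step, the returned list can be printed and turned into a non-zero exit code, for example sys.exit(1) when it is non-empty.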
Key Findings
- License: MIT — fully compliant for redistribution, including vendored dependencies
- Blockers: None identified for Linux x86_64
- Risk: Low — mature build system, active maintenance, comprehensive CI across Python 3.10–3.14
- Single wheel per architecture: No GPU dependencies, one build covers CPU/CUDA/ROCm indexes
blocks: AIPCC-10814 Onboard rapidfuzz into the AIPCC Builder (In Progress)