Story
Resolution: Unresolved
Package 'array_record' does not build as-is via the AIPCC self-service pipeline and requires builder repository onboarding.
Build Failure Summary
Root Cause Analysis: `array_record` Build Failure
Summary
The build failed because `array_record` does not publish source distributions (sdists) on PyPI — only pre-built wheels are available. The build tool (`fromager`) was searching for an sdist to build from source and found nothing.
Key Error
02:11:49 ERROR array_record: Unable to resolve requirement specifier array_record with constraint None using PyPI resolver (searching at https://pypi.org/simple): found no match for array_record using PyPI resolver (searching at https://pypi.org/simple), searching for sdists, ignoring pre-release versions
`fromager` is running in sdist-only (fast mode) as indicated by:
02:11:49 INFO sdist-only (fast mode), getting metadata from sdists
It requires source distributions to resolve and build packages, but `array_record` provides zero sdists on PyPI — only pre-built wheels for specific platforms (Linux x86_64/aarch64, macOS ARM64).
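The absence of an sdist can be confirmed programmatically against the PyPI JSON API. The helper below is a minimal sketch; the synthetic file list at the bottom is illustrative, not actual PyPI data:

```python
import json
from urllib.request import urlopen

def has_sdist(release_files):
    """True if any distribution file in a PyPI release file list is an sdist."""
    return any(f.get("packagetype") == "sdist" for f in release_files)

# Live check (network required); response shape per the PyPI JSON API:
#   data = json.load(urlopen("https://pypi.org/pypi/array_record/json"))
#   print(has_sdist(data["urls"]))

# Offline demonstration with a synthetic, wheel-only file list
files = [{"packagetype": "bdist_wheel", "filename": "example.whl"}]
print(has_sdist(files))  # False
```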
What Would Be Needed to Fix It
1. Provide a source override in fromager: Configure fromager with a source URL pointing to the `array_record` source repository (hosted at https://github.com/google/array_record). This would let fromager fetch the source directly from Git rather than looking for an sdist on PyPI.
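As a rough sketch, a per-package source override for fromager might look like the fragment below. The file location, key names, and URL template are assumptions and should be verified against the fromager documentation:

```yaml
# overrides/settings/array_record.yaml (assumed location; check fromager docs)
download_source:
  # Fetch a tagged source archive from GitHub instead of a PyPI sdist
  url: "https://github.com/google/array_record/archive/refs/tags/v${version}.tar.gz"
```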
2. Build complexity warning: `array_record` is a C++/Python package that depends on system-level libraries (notably Abseil, Riegeli, and other Google C++ components). Building from source requires a working C++ toolchain and Bazel, the upstream build system. This makes source builds non-trivial and may require additional build dependencies and overrides in the build environment.
3. If source building proves infeasible: The package would need to be added to the pre-built wheels list (alongside the other packages already listed like `intel-cmplr-lib-ur`, `tensorboard`, etc. visible at line 181 of the log). However, the existing PyPI wheels only cover a limited set of platforms, so compatibility with the target environment (`cpu-ubi9-x86_64`) should be verified first.
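That compatibility check can be sanity-checked from the platform tags embedded in each wheel filename. The sketch below uses only stdlib string handling; the example filename follows the style of array_record's Linux wheels but is illustrative rather than copied from PyPI:

```python
def wheel_platform_tags(wheel_filename: str) -> set:
    """Extract the platform tags from a wheel filename.

    Wheel names follow {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    where compressed platform tags are joined with '.'.
    """
    stem = wheel_filename[:-len(".whl")]
    return set(stem.split("-")[-1].split("."))

def compatible(wheel_filename: str, target_tags: set) -> bool:
    """True if the wheel advertises at least one of the target platform tags."""
    return bool(wheel_platform_tags(wheel_filename) & target_tags)

# Illustrative filename in the style of array_record's Linux wheels
wheel = "array_record-0.8.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(compatible(wheel, {"manylinux2014_x86_64"}))  # True
```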
Packaging Analysis Summary
Executive Summary: array_record Build Analysis
array_record is a Google-developed, high-performance file format library for ML IO workloads, built on top of Google's Riegeli library. It is licensed under Apache-2.0, which is fully compatible with Red Hat redistribution requirements. The package is a hard dependency for tensorflow-datasets and has no substitutes – it is the only implementation of the ArrayRecord format. The recommended target version is 0.8.3 (latest stable on PyPI).
The primary challenge for source builds is that array_record uses Bazel exclusively as its build system – there is no setuptools, CMake, or Meson fallback for compiling the C++ native extension. The entire C++ dependency chain (Riegeli, Abseil, Protobuf 28.3, Eigen, pybind11) is resolved hermetically through Bazel's module system. Building from source requires cloning the Git repository, running Bazel 7.2.1 inside a manylinux2014 Docker container, and using auditwheel repair to produce self-contained wheels. Build complexity is rated 9/10. No sdist is published to PyPI. For x86_64 Linux, the build path is well-established and has no blockers; however, ppc64le is not supported due to unresolved HighwayHash linker failures and Riegeli API incompatibilities (issues #149, #151).
A critical runtime constraint exists: Protobuf must be pinned to version 28.3. Using Protobuf 29.x causes segfaults when co-loaded with TensorFlow – the primary use case for this package. This pinning is already enforced in the upstream MODULE.bazel via single_version_override. Version 0.8.0 was yanked from PyPI due to this exact issue; current v0.8.3 is stable. The [beam] extra should not be included in default installs to avoid pulling in the heavy apache-beam[gcp] dependency.
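The upstream pin is expressed through bzlmod. A sketch of the relevant MODULE.bazel stanza follows; the version value comes from the analysis above, but the exact upstream wording should be checked in google/array_record:

```starlark
# Pin protobuf to 28.3 across the whole Bazel module graph (bzlmod)
bazel_dep(name = "protobuf", version = "28.3")
single_version_override(
    module_name = "protobuf",
    version = "28.3",
)
```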
Recommended approach: Replicate the upstream CI pipeline using a Dockerized manylinux2014 environment with Bazel 7.2.1. As a pragmatic fallback, upstream PyPI wheels are self-contained (all C++ deps statically linked, only standard system libs required at runtime) and available for x86_64, aarch64, and macOS ARM64 across Python 3.11–3.14.
Key Build Commands (inside manylinux2014 container)
# Install Bazel 7.2.1
curl -sSL -o /usr/local/bin/bazel \
    "https://github.com/bazelbuild/bazel/releases/download/7.2.1/bazel-7.2.1-linux-x86_64"
chmod +x /usr/local/bin/bazel

# Set environment
export PYTHON_VERSION=3.12 PYTHON_MAJOR_VERSION=3 PYTHON_MINOR_VERSION=12
export BAZEL_VERSION=7.2.1 AUDITWHEEL_PLATFORM=manylinux2014_x86_64
export PYTHON_BIN=$(which python3)

# Build wheel
bash oss/build_whl.sh
Critical Findings
- Build system: Bazel 7.2.1 only – no alternative compilation path exists
- Protobuf pinning: Must use 28.3; 29.x causes segfaults with TensorFlow
- Platform support: x86_64 and aarch64 Linux supported; ppc64le blocked upstream
- License: Apache-2.0 – fully compliant for redistribution
- No substitutes: Only implementation of ArrayRecord format; required by tensorflow-datasets
is blocked by: AIPCC-10853 Add array_record into the RHAI pipeline onboarding collection (In Progress)