-
Story
-
Resolution: Unresolved
-
Undefined
-
None
-
None
-
None
-
False
-
-
False
-
-
Package 'apache-airflow' does not build as-is via the AIPCC self-service pipeline and requires builder repository onboarding.
Build Failure Summary
Root Cause Analysis: `apache-airflow` Build Failure
Summary
The build of `apache-airflow` (3.1.7) failed because its transitive dependency `pendulum` 3.2.0 could not be built from source. Pendulum contains a Rust extension (built via `maturin`) that requires downloading Rust crate dependencies (e.g., `pyo3`) from `crates.io` at build time. The build environment enforces network isolation, which blocks access to `index.crates.io`.
—
Dependency Chain
apache-airflow 3.1.7
→ apache-airflow-core 3.1.7
→ apache-airflow-task-sdk 1.1.7
→ pendulum >=3.1.0 (resolved to 3.2.0) ← FAILS HERE
—
Failure Details
The `pendulum` build uses `maturin` as its PEP 517 build backend. When `maturin` invokes `cargo metadata` to resolve Rust dependencies, Cargo attempts to fetch the crates.io index and fails because DNS resolution is blocked by the network isolation wrapper (`run_network_isolation.sh`):
pendulum: Updating crates.io index pendulum: warning: spurious network error (3 tries remaining): [6] Couldn't resolve host name (Could not resolve host: index.crates.io) pendulum: warning: spurious network error (2 tries remaining): [6] Couldn't resolve host name (Could not resolve host: index.crates.io) pendulum: warning: spurious network error (1 try remaining): [6] Couldn't resolve host name (Could not resolve host: index.crates.io) pendulum: error: failed to get `pyo3` as a dependency of package `_pendulum v3.2.0`
This results in:
pendulum: 💥 maturin failed Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
There is also an earlier warning that may be relevant:
WARNING pendulum: Rust build backend maturin detected, but no Cargo.toml files found.
This suggests the source preparation step may not have properly located or included the Rust source tree (which lives in `rust/` within pendulum's sdist).
—
What Is Needed to Fix This
1. Vendor Rust crate dependencies for pendulum - The Rust crates (notably `pyo3` and its transitive dependencies) must be pre-fetched and made available offline before the network-isolated build begins. This is typically done by running `cargo vendor` on pendulum's Rust workspace and configuring a `.cargo/config.toml` to point Cargo at the vendored sources instead of `crates.io`.
2. Investigate the missing `Cargo.toml` warning - The warning "Rust build backend maturin detected, but no Cargo.toml files found" during source preparation may indicate that the build tooling's source prep step is not correctly unpacking or recognizing pendulum's Rust sub-directory. If `Cargo.toml` files are absent from the prepared source, vendoring alone won't help — the source preparation logic needs to be fixed to include the full Rust source tree.
3. Ensure the build tooling supports maturin-based packages - The `fromager` build pipeline needs a mechanism (e.g., a pre-build hook) to run `cargo vendor` and configure offline Cargo builds for packages that use `maturin` or `setuptools-rust` as their build backend, so that the network-isolated build can succeed.
Packaging Analysis Summary
Here is the executive summary formatted as a JIRA comment using JIRA wiki markup:
—
Executive Summary: apache-airflow Packaging Analysis
apache-airflow (v3.1.7) is a pure Python package with a build complexity score of 0/10, meaning it requires no native compilation (C/C++/Rust/Fortran) and ships as a universal wheel (
none-any
). Building a wheel from the source distribution is straightforward and does not require specialized build tooling beyond a standard Python environment. The package requires Python >=3.10, <3.14. From a pure build perspective, this package presents no significant blockers for onboarding.
The primary complexity of apache-airflow lies not in compilation but in its massive dependency ecosystem. The package declares 237 dependencies, the vast majority of which are optional provider packages gated behind extras (e.g.,
apache-airflow-providers-amazon
,
apache-airflow-providers-google
,
apache-airflow-providers-cncf-kubernetes
). The two hard runtime dependencies are
apache-airflow-core==3.1.7
and
apache-airflow-task-sdk==1.1.7
. Several of the optional provider dependencies pull in packages that do require compilation (e.g.,
python-ldap
, database drivers, gRPC libraries), so the transitive dependency surface must be carefully scoped based on which extras are enabled. Some provider packages also carry Python version restrictions (e.g.,
apache-beam
,
fab
,
yandex
, and
ydb
exclude Python 3.13).
Key recommendations for onboarding:
- Building the core
apache-airflow
wheel itself is trivial — source distribution is available and the wheel is architecture-independent.
- Define the required extras early — the dependency footprint varies dramatically between a minimal install (
apache-airflow-core
+
task-sdk
) and a full install with all 100+ providers. Each selected extra may introduce its own transitive compilation requirements.
- Investigate
apache-airflow-core
separately, as it is the actual runtime package and may carry additional build considerations not visible at this meta-package level.
- Provider packages that pull in compiled dependencies (e.g.,
ldap
,
grpc
,
odbc
,
mysql
,
postgres
) should be individually assessed for platform-specific wheel availability and build toolchain requirements.
- is blocked by
-
AIPCC-10742 Add apache-airflow into the RHAI pipeline onboarding collection
-
- In Progress
-
- mentioned on