AI Platform Core Components / AIPCC-10732

Onboard apache-airflow into the AIPCC Builder

    • Type: Story
    • Resolution: Unresolved

      Package 'apache-airflow' does not build as-is via the AIPCC self-service pipeline and requires builder repository onboarding.

      Build Failure Summary

      Root Cause Analysis: `apache-airflow` Build Failure

      Summary

      The build of `apache-airflow` (3.1.7) failed because its transitive dependency `pendulum` 3.2.0 could not be built from source. Pendulum contains a Rust extension (built via `maturin`) that requires downloading Rust crate dependencies (e.g., `pyo3`) from `crates.io` at build time. The build environment enforces network isolation, which blocks access to `index.crates.io`.

      Dependency Chain

      apache-airflow 3.1.7
        → apache-airflow-core 3.1.7
          → apache-airflow-task-sdk 1.1.7
            → pendulum >=3.1.0 (resolved to 3.2.0)  ← FAILS HERE
      

      Failure Details

      The `pendulum` build uses `maturin` as its PEP 517 build backend. When `maturin` invokes `cargo metadata` to resolve Rust dependencies, Cargo attempts to fetch the crates.io index and fails because DNS resolution is blocked by the network isolation wrapper (`run_network_isolation.sh`):

      pendulum:     Updating crates.io index
      pendulum: warning: spurious network error (3 tries remaining): [6] Couldn't resolve host name (Could not resolve host: index.crates.io)
      pendulum: warning: spurious network error (2 tries remaining): [6] Couldn't resolve host name (Could not resolve host: index.crates.io)
      pendulum: warning: spurious network error (1 try remaining): [6] Couldn't resolve host name (Could not resolve host: index.crates.io)
      pendulum: error: failed to get `pyo3` as a dependency of package `_pendulum v3.2.0`
      

      This results in:

      pendulum: 💥 maturin failed
        Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
      

      There is also an earlier warning that may be relevant:

      WARNING pendulum: Rust build backend maturin detected, but no Cargo.toml files found.
      

      This suggests the source preparation step may not have properly located or included the Rust source tree (which lives in `rust/` within pendulum's sdist).
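One way to check this hypothesis is to list the Cargo manifests that the sdist actually ships before looking at the source preparation step. The following is a minimal sketch, assuming the sdist is a tar archive. The function name `find_cargo_manifests` is illustrative and is not part of `fromager` or the AIPCC tooling.

```python
import tarfile
from pathlib import PurePosixPath


def find_cargo_manifests(sdist_path: str) -> list[str]:
    """Return the archive paths of every Cargo.toml inside an sdist tarball."""
    # "r:*" lets tarfile detect the compression (gz, bz2, xz) on its own.
    with tarfile.open(sdist_path, mode="r:*") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and PurePosixPath(member.name).name == "Cargo.toml"
        )


# Hypothetical usage (the archive filename is an assumption):
#   find_cargo_manifests("pendulum-3.2.0.tar.gz")
```

If the list is non-empty for the upstream sdist but the prepared source tree contains no `Cargo.toml`, the problem is likely in source preparation rather than in the sdist itself.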

      What Is Needed to Fix This

      1. Vendor Rust crate dependencies for pendulum - The Rust crates (notably `pyo3` and its transitive dependencies) must be pre-fetched and made available offline before the network-isolated build begins. This is typically done by running `cargo vendor` on pendulum's Rust workspace and configuring a `.cargo/config.toml` to point Cargo at the vendored sources instead of `crates.io`.

      2. Investigate the missing `Cargo.toml` warning - The warning "Rust build backend maturin detected, but no Cargo.toml files found" during source preparation may indicate that the build tooling's source prep step is not correctly unpacking or recognizing pendulum's Rust sub-directory. If `Cargo.toml` files are absent from the prepared source, vendoring alone won't help — the source preparation logic needs to be fixed to include the full Rust source tree.

      3. Ensure the build tooling supports maturin-based packages - The `fromager` build pipeline needs a mechanism (e.g., a pre-build hook) to run `cargo vendor` and configure offline Cargo builds for packages that use `maturin` or `setuptools-rust` as their build backend, so that the network-isolated build can succeed.
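The vendoring step in items 1 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the `fromager` hook API: the function names, the `vendor` directory name, and the choice to write `.cargo/config.toml` at the source root are all hypothetical. The `[source.crates-io]` / `replace-with` stanza is Cargo's standard source-replacement configuration, and `cargo vendor` must run in a step that still has network access.

```python
import subprocess
from pathlib import Path


def render_cargo_config(vendor_dir: str) -> str:
    """Render a Cargo config that replaces crates.io with a local vendor directory."""
    return (
        "[source.crates-io]\n"
        'replace-with = "vendored-sources"\n'
        "\n"
        "[source.vendored-sources]\n"
        f'directory = "{vendor_dir}"\n'
    )


def vendor_command(manifest: Path, vendor_dir: Path) -> list[str]:
    """Build the `cargo vendor` invocation for one Cargo.toml manifest."""
    return [
        "cargo",
        "vendor",
        "--manifest-path",
        manifest.as_posix(),
        vendor_dir.as_posix(),
    ]


def prepare_offline_build(source_root: Path, manifest: Path) -> None:
    """Vendor crates (network required), then point Cargo at the vendored copy."""
    vendor_dir = source_root / "vendor"
    subprocess.run(vendor_command(manifest, vendor_dir), check=True)

    # Cargo searches the working directory and its parents for .cargo/config.toml,
    # so a config at the source root should also cover a nested rust/ directory.
    config_dir = source_root / ".cargo"
    config_dir.mkdir(exist_ok=True)
    (config_dir / "config.toml").write_text(
        render_cargo_config(vendor_dir.resolve().as_posix())
    )
```

Setting `CARGO_NET_OFFLINE=true` in the isolated build environment is one additional way to make Cargo fail fast instead of retrying the index fetch, though whether the wrapper passes that variable through is not confirmed here.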

      Packaging Analysis Summary


      Executive Summary: apache-airflow Packaging Analysis

      apache-airflow (v3.1.7) is a pure Python package with a build complexity score of 0/10, meaning it requires no native compilation (C/C++/Rust/Fortran) and ships as a universal wheel (`none-any`). Building a wheel from the source distribution is straightforward and does not require specialized build tooling beyond a standard Python environment. The package requires Python >=3.10, <3.14. From a pure build perspective, this package presents no significant blockers for onboarding.

      The primary complexity of apache-airflow lies not in compilation but in its massive dependency ecosystem. The package declares 237 dependencies, the vast majority of which are optional provider packages gated behind extras (e.g., `apache-airflow-providers-amazon`, `apache-airflow-providers-google`, `apache-airflow-providers-cncf-kubernetes`). The two hard runtime dependencies are `apache-airflow-core==3.1.7` and `apache-airflow-task-sdk==1.1.7`. Several of the optional provider dependencies pull in packages that do require compilation (e.g., `python-ldap`, database drivers, gRPC libraries), so the transitive dependency surface must be carefully scoped based on which extras are enabled. Some provider packages also carry Python version restrictions (e.g., `apache-beam`, `fab`, `yandex`, and `ydb` exclude Python 3.13).

      Key recommendations for onboarding:

      • Building the core `apache-airflow` wheel itself is trivial: a source distribution is available and the wheel is architecture-independent.

      • Define the required extras early: the dependency footprint varies dramatically between a minimal install (`apache-airflow-core` + `task-sdk`) and a full install with all 100+ providers. Each selected extra may introduce its own transitive compilation requirements.

      • Investigate `apache-airflow-core` separately, as it is the actual runtime package and may carry additional build considerations not visible at this meta-package level.

      • Provider packages that pull in compiled dependencies (e.g., `ldap`, `grpc`, `odbc`, `mysql`, `postgres`) should be individually assessed for platform-specific wheel availability and build toolchain requirements.
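To make the extras scoping concrete, the declared dependencies can be grouped by the extra that gates them. The sketch below is a simplification that matches `extra == "..."` markers with a regular expression instead of a full PEP 508 parser. The function name `group_by_extra` and the extra names in the usage comment are illustrative assumptions, not values taken from the package metadata.

```python
import re
from collections import defaultdict

# Matches marker fragments such as: extra == "amazon"
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(requires_dist: list[str]) -> dict[str, list[str]]:
    """Group Requires-Dist entries by the extra that gates them.

    Entries without an extra marker are hard dependencies and are stored
    under the empty-string key.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in requires_dist:
        requirement, _, marker = entry.partition(";")
        extras = _EXTRA_RE.findall(marker) or [""]
        for extra in extras:
            groups[extra].append(requirement.strip())
    return dict(groups)


# Hypothetical usage with illustrative entries:
#   group_by_extra([
#       "apache-airflow-core==3.1.7",
#       'apache-airflow-providers-amazon; extra == "amazon"',
#   ])
```

Counting the entries under each key gives a quick view of how much each extra adds to the build surface before any provider is selected for onboarding.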

              epacific@redhat.com Einat Pacifici
              aipcc-jira-bot@redhat.com AIPCC JIRABOT