Bug
Resolution: Unresolved
Low
We are observing intermittent failures in the bootstrap stage of the wheels builder pipeline. The failure appears to be triggered when a new version of a dependency is released on PyPI while the build is running, although this is not yet confirmed.
The error message follows a consistent pattern:
ERROR: could not handle toplevel dependency <package> (<version>) because Trying to add setuptools==80.8.0 to parent <package>==<version> but <package>==<version> does not exist
This suggests that the build process selects a package version, but later, when it tries to attach a dependency (here, setuptools) to that package, it cannot find the corresponding node in the dependency resolution graph.
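To make the hypothesis concrete, here is a minimal, hypothetical Python reproduction. It assumes the bootstrapper queries the index twice for the same top-level package (once to record the graph node, once to attach a build dependency) and that the answer changes between the two calls, either because a release lands on PyPI or because a mirror catches up. `DriftingIndex`, `DependencyGraph`, and `bootstrap_toplevel` are illustrative names, not the builder's real code, and `example-pkg` stands in for the affected package.

```python
from dataclasses import dataclass, field


class DriftingIndex:
    """Hypothetical index that answers "latest version" from a scripted list,
    standing in for a release (or mirror sync) landing mid-build."""

    def __init__(self, answers: list[str]) -> None:
        self._answers = iter(answers)

    def latest(self, name: str) -> str:
        # Each call returns the next scripted answer, regardless of `name`.
        return next(self._answers)


@dataclass
class DependencyGraph:
    """Hypothetical graph whose nodes are keyed by "name==version" strings."""

    nodes: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, name: str, version: str) -> None:
        self.nodes.setdefault(f"{name}=={version}", [])

    def add_child(self, parent: str, child: str) -> None:
        if parent not in self.nodes:
            raise KeyError(
                f"Trying to add {child} to parent {parent} "
                f"but {parent} does not exist"
            )
        self.nodes[parent].append(child)


def bootstrap_toplevel(index: DriftingIndex, graph: DependencyGraph, name: str) -> None:
    # Lookup 1: the version under which the top-level node is recorded.
    graph.add_node(name, index.latest(name))
    # Lookup 2: the version used when attaching a build dependency.
    # If the index changed in between, this key no longer matches the node.
    parent = f"{name}=={index.latest(name)}"
    graph.add_child(parent, "setuptools==80.8.0")
```

With a stable index (`["1.0.0", "1.0.0"]`) the call succeeds; with a drifting one (`["1.0.0", "1.0.1"]`) it raises a `KeyError` whose message has the same shape as the one in the job logs.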
Observed Instances
This issue has been observed with at least two different packages:
- Package: boto3==1.40.7
- Job: test-non-accelerated-cpu-ubi9-ppc64le-bootstrap-and-onboard
- Log: https://gitlab.com/redhat/rhel-ai/wheels/builder/-/jobs/10983843137
- Context: The 1.40.7 version was released on PyPI the same day the job failed.
- Package: docling==2.44.0
- Job: test-accelerated-cpu-ubi9-aarch64-bootstrap-and-onboard
- Log: https://gitlab.com/redhat/rhel-ai/wheels/builder/-/jobs/10989977287
- Context: The 2.44.0 version was released just hours before this job failed. Rerunning the job resolved the issue.
Tasks
- Analyze Logs: Perform a detailed analysis of the debug logs for the failed jobs to trace the sequence of events for the boto3 and docling packages.
- Root Cause Analysis: Determine why a package version that is initially discovered later fails validation with a "does not exist" error.
- Develop a Fix: Propose and implement a solution to make the build process more resilient to newly released or updated packages on PyPI.
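One possible direction for the fix, sketched under the assumption that the root cause is the same requirement being resolved more than once per run: resolve each package a single time, cache the result for the remainder of the run, and build every graph key from that cached pin. `PinnedResolver` and the `lookup` callable are hypothetical, not existing builder APIs.

```python
from typing import Callable


class PinnedResolver:
    """Resolves each package name at most once per bootstrap run (hypothetical)."""

    def __init__(self, lookup: Callable[[str], str]) -> None:
        # `lookup` is whatever queries the index; its answer may change over
        # time if new releases appear or a mirror syncs mid-run.
        self._lookup = lookup
        self._pins: dict[str, str] = {}

    def resolve(self, name: str) -> str:
        # The first answer wins; later calls never query the index again,
        # so every key derived from it stays consistent for the whole run.
        if name not in self._pins:
            self._pins[name] = self._lookup(name)
        return self._pins[name]

    def key(self, name: str) -> str:
        # Graph node key built from the cached pin, e.g. "example-pkg==1.0.0".
        return f"{name}=={self.resolve(name)}"
```

The trade-off is freshness for consistency: a version published mid-run is not seen until the next run, which is in line with the observation that rerunning the docling job succeeded.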
Notes
- The leading hypothesis is a race condition: the build may fetch package metadata at different points, and if a new version is published on PyPI between those fetches, the build could end up in an inconsistent state.
- However, the bootstrap job runs in a single thread, which should rule out simple in-process race conditions. The issue may instead lie in how package versions are selected and later re-verified, or in an interaction with a caching layer or mirror that has replication delays.
- When working on a solution, we should also consider the edge case where a package version is "yanked" from PyPI and, if possible, make sure the build handles that situation as well.
- See this Slack thread for additional context.
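For the yanked-release edge case, PEP 592 describes the expected installer behavior: ignore yanked releases whenever a non-yanked version satisfies the requirement, while still allowing a yanked release to be selected when the requirement pins it exactly (`==` without wildcards, or `===`). The sketch below applies that rule in a simplified form; `Release`, `select_version`, and the dotted-integer version parsing (no pre-releases, epochs, or local versions) are assumptions for illustration, and real code would use the `packaging` library's version handling instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical view of one release as reported by the index."""

    version: str
    yanked: bool = False


def _version_key(version: str) -> tuple[int, ...]:
    # Simplification: dotted integers only, so "1.10.0" sorts after "1.9.0".
    return tuple(int(part) for part in version.split("."))


def select_version(releases: list[Release], pin: Optional[str] = None) -> Optional[Release]:
    """Pick a release following PEP 592's yanked-release rules (simplified).

    pin: an exact "==" version, or None to pick the latest release.
    """
    if pin is not None:
        # Exact pins may select a yanked release.
        matches = [r for r in releases if r.version == pin]
        return matches[0] if matches else None
    candidates = [r for r in releases if not r.yanked]
    if not candidates:
        # Every release is yanked; PEP 592 allows refusing here, and this sketch does.
        return None
    return max(candidates, key=lambda r: _version_key(r.version))
```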