-
Initiative
-
Resolution: Unresolved
-
Minor
67% To Do, 0% In Progress, 33% Done
AIA PAI Ce Hin R gemini-2.5-pro v1.0
Feature title: Establish Python package metadata guidelines and auditing tools
Feature Overview:
A significant number of our Python dependencies suffer from incorrect, ambiguous, or outdated package metadata, especially concerning software licenses. This creates compliance risks and technical debt. This initiative will first establish clear guidelines for modern, compliant Python package metadata (per PEP 639, PEP 753) and develop an automated tool to audit our internal Pulp PyPI. This provides a clear, actionable path to identify and prioritize metadata issues, reducing legal risk and setting the foundation for future upstream improvement efforts.
Product(s) associated:
RHAIIS: yes
RHEL AI: yes
RHOAI: yes
Goals:
To quantify and standardize the metadata quality of our Python dependencies, starting with license compliance.
Outcome:
- Update our Red Hat AI Python Packaging Best Practices if needed
- An automated CLI tool that scans our internal package repositories (example link for CUDA: https://console.redhat.com/api/pypi/public-rhai/rhoai/3.0/cuda-ubi9-test/simple/) and generates a report of non-compliant packages, focusing first on license field errors (per PEP 639).
- Investigate how to track new package defects with the audit tool once the initial list has been generated (for example, a periodic script running in CI).
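One possible shape for the periodic check, as a minimal sketch: compare the previous audit report with the current one and surface only the findings that are new. The JSON layout (a list of objects with `package` and `problem` keys) and the function names are assumptions for illustration, not the format of the existing tool.

```python
import json


def load_findings(report_json):
    """Return the set of (package, problem) pairs found in a JSON report.

    Assumed report layout: [{"package": "...", "problem": "..."}, ...]
    """
    return {(row["package"], row["problem"]) for row in json.loads(report_json)}


def new_defects(previous_json, current_json):
    """Return findings present in the current report but not the previous one."""
    return sorted(load_findings(current_json) - load_findings(previous_json))


# Hypothetical reports from two consecutive CI runs.
previous = json.dumps(
    [{"package": "pkg-a", "problem": "missing License-Expression"}]
)
current = json.dumps(
    [
        {"package": "pkg-a", "problem": "missing License-Expression"},
        {"package": "pkg-b", "problem": "missing License-Expression"},
    ]
)

for package, problem in new_defects(previous, current):
    print(f"NEW: {package}: {problem}")
```

A CI job could fail, or file a bug, whenever `new_defects` returns a non-empty list, so that only regressions need attention after the initial list exists.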
Requirements:
- Guideline development:
- Improve Red Hat AI Python Packaging Best Practices
- This guideline must define the standard for license clarity (using PEP 639 SPDX expressions).
- This guideline must define the standard for project URLs (using PEP 753).
- The guideline should also recommend moving from setup.py to pyproject.toml for pure Python packages. The auditing tool won't cover that part, because it is a much larger effort that would be tracked via a separate Initiative at some point.
- Tooling development:
- Create a new Python CLI tool, based on what Christian has written here, and move that code into GitLab where people can contribute.
- The tool must be able to ingest the list of all packages from our internal Pulp PyPI instance.
- The tool must use a library (e.g., pypi-simple) to fetch package metadata from the public PyPI.org.
- The tool must parse the core metadata (Metadata-Version >= 2.4) and validate it against the new guidelines.
- The tool's primary function is to check the License field for PEP 639 compliance.
- The tool must generate a human-readable report (e.g., CSV, JSON) of all non-compliant packages. The follow-up work will be tracked through another Jira initiative.
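The parse, validate, and report steps above could be sketched as follows. This is an illustration only and not Christian's code: it uses the standard library to parse core metadata text (the RFC 822 style METADATA / PKG-INFO format), checks only for the presence of `License-Expression` and a `Metadata-Version` of at least 2.4, and does not validate the SPDX syntax itself. Fetching the metadata from the index is left out, and all function names and sample packages are hypothetical.

```python
import csv
import io
from email.parser import HeaderParser


def audit_metadata(metadata_text):
    """Return a list of license-related problems for one core metadata document."""
    msg = HeaderParser().parsestr(metadata_text)
    problems = []

    version = msg.get("Metadata-Version", "0")
    try:
        parsed = tuple(int(part) for part in version.split("."))
    except ValueError:
        parsed = (0,)
    if parsed < (2, 4):
        problems.append(f"Metadata-Version {version} is older than 2.4")

    if not msg.get("License-Expression"):
        detail = "missing License-Expression"
        if msg.get("License"):
            detail += " (legacy License field present)"
        problems.append(detail)
    return problems


def write_report(metadata_by_package):
    """Return a CSV report with one row per (package, problem) finding."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["package", "problem"])
    for name in sorted(metadata_by_package):
        for problem in audit_metadata(metadata_by_package[name]):
            writer.writerow([name, problem])
    return out.getvalue()


# Hypothetical metadata documents for two imaginary packages.
SAMPLES = {
    "good-pkg": "Metadata-Version: 2.4\nName: good-pkg\nVersion: 1.0\n"
                "License-Expression: MIT\n",
    "old-pkg": "Metadata-Version: 2.1\nName: old-pkg\nVersion: 0.1\n"
               "License: BSD\n",
}

print(write_report(SAMPLES))
```

Only non-compliant packages appear in the report. A real implementation would likely add SPDX expression validation on top of this presence check, for example with a dedicated license-expression library.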
Done - Acceptance Criteria:
The guideline has been updated and the tool has been run, so that the remaining work can be tracked via Jira Bugs, since it is a multi-release effort. The goal is to close this Initiative once we have the list of projects we need to fix. The list will be dynamically updated via a script, and people will be able to pick up the work by filing a bug (or we could have a tool that files a bug for each component).
Use Cases - i.e. User Experience & Workflow:
N/A
Out of Scope:
- Fixing all the identified metadata issues in upstream projects.
- The auditing tool won't identify whether a project still uses setup.py.
- Refactoring internal builder plugins or fromager configurations.
- Auditing packages that are not present in our internal Pulp PyPI.
Documentation Considerations:
Our reference documentation is:
Red Hat AI Python Packaging Best Practices
- relates to: AIPCC-7356 Skill for python packaging metadata validation (In Progress)