Story
Resolution: Unresolved
Add an AI-powered error analysis job to the nightly builds that automatically summarizes build failures, identifies root causes, and provides actionable context in Slack notifications.
This proposal is a suggested extension to AIPCC-8739.
Background
Currently, when nightly builds fail, the team receives a Slack notification with:
- Failed job names and stages
- Links to GitLab job pages
Team members must manually navigate to GitLab, locate the relevant log files (errors.log, per-package build logs in work-dir/), and perform root cause analysis.
This is time-consuming and requires tribal knowledge about the codebase.
Proposed Enhancement
Add a job to the existing nightly pipeline that runs after build failures, before or alongside the existing Slack notification. The job:
1. Collects relevant logs from failed job artifacts.
2. Analyzes errors using an AI agent to:
   - Extract and quote the actual error messages from the logs
   - Cross-reference them with relevant code files (builder overrides, patches, plugins)
   - Check upstream GitHub code and changes when relevant
   - Provide a root cause analysis (RCA)
3. Sends an enhanced Slack notification with:
   - Original failure info (job names, URLs)
   - Quoted error message(s)
   - RCA summary with references and links
   - Suggested next steps to fix the failures
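The three steps above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation. The function names, the `ERROR_PATTERN` regex, and the `analyze` callable (a stand-in for the AI agent) are all assumptions made for this example. The layout of `errors.log` and `work-dir/` inside the downloaded artifacts is also assumed. The payload uses Slack's Block Kit `section` block with `mrkdwn` text. Posting it to Slack is left out.

```python
import re
from pathlib import Path
from typing import Callable

# Assumed pattern for lines worth quoting; tune it to the real log format.
ERROR_PATTERN = re.compile(r"\b(error|failed|traceback)\b", re.IGNORECASE)


def collect_logs(artifact_dir: Path) -> list[Path]:
    """Step 1: gather errors.log plus per-package logs under work-dir/."""
    logs = sorted(artifact_dir.glob("**/errors.log"))
    logs += sorted(artifact_dir.glob("**/work-dir/**/*.log"))
    # Drop duplicates (an errors.log inside work-dir/ matches both globs).
    return list(dict.fromkeys(logs))


def extract_errors(log: Path, max_lines: int = 5) -> list[str]:
    """Step 2 (first part): quote the first few matching lines verbatim."""
    quoted: list[str] = []
    with log.open(errors="replace") as handle:
        for line in handle:
            if ERROR_PATTERN.search(line):
                quoted.append(line.rstrip())
                if len(quoted) == max_lines:
                    break
    return quoted


def build_slack_payload(job_name: str, job_url: str,
                        quoted: list[str], rca: str) -> dict:
    """Step 3: combine the original failure info, quotes, and RCA."""
    quote_block = "\n".join(f"> {line}" for line in quoted)
    if not quote_block:
        quote_block = "> (no matching log lines found)"
    text = (
        f"*Nightly build failed:* <{job_url}|{job_name}>\n"
        f"{quote_block}\n"
        f"*RCA:* {rca}"
    )
    return {"blocks": [{"type": "section",
                        "text": {"type": "mrkdwn", "text": text}}]}


def summarize_failure(job_name: str, job_url: str, artifact_dir: Path,
                      analyze: Callable[[list[str]], str]) -> dict:
    """Run the steps in order; `analyze` is a hypothetical AI agent hook."""
    quoted: list[str] = []
    for log in collect_logs(artifact_dir):
        quoted.extend(extract_errors(log))
    rca = analyze(quoted)
    return build_slack_payload(job_name, job_url, quoted, rca)
```

Keeping the agent behind a plain callable means the log collection and message formatting can be tested without any model access, and the quoted lines always come straight from the logs rather than from generated text.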
- depends on: AIPCC-8739 Set up automation for slack notifications and JIRA creation (Closed)
- relates to: AIPCC-9349 Proactive upstream dependency monitoring tool (New)
- mentioned on