Epic
Resolution: Unresolved
Normal
Payload manager smart-alerts
To Do
Overhaul the alert system for CI lanes so that the payload manager role becomes alert-driven.
AI gathers CI state from sippy, prow, component readiness, the release controller, and stability sources. Rules define the alert categories (urgent, major, normal, minor) and the response to each. The system notifies payload managers in Slack, recommends time boxes for investigations, and auto-updates Jira with the results of active investigations.
Alert levels:
- urgent: X failed installs in a row for a topology → notify immediately in Slack
- major: 2+ blocking jobs fail in a row → provide analysis, advise TRT attendance
- normal: Single blocking job failure, component readiness regression, or important lane permafailing
- minor: Non-important lane permafailing or regular maintenance
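The alert levels above could be expressed as an ordered rule check, most severe first. The following is a minimal sketch, not a proposed implementation: the `LaneState` fields, the `classify` function, and the value chosen for the "X" threshold are all hypothetical, since the epic does not define the data model or the threshold. "2+ blocking jobs fail in a row" is read here as a streak of two or more consecutive blocking job failures, which is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlertLevel(Enum):
    URGENT = "urgent"
    MAJOR = "major"
    NORMAL = "normal"
    MINOR = "minor"


# Hypothetical stand-in for the "X" in the urgent rule; the epic leaves it undefined.
URGENT_INSTALL_FAILURE_STREAK = 3


@dataclass
class LaneState:
    """Hypothetical summary of one CI lane, assembled from the gathered CI state."""
    consecutive_install_failures: int = 0   # failed installs in a row for the topology
    consecutive_blocking_failures: int = 0  # blocking job failures in a row
    readiness_regression: bool = False      # component readiness regression detected
    permafailing: bool = False              # lane fails on every run
    important: bool = False                 # lane importance level (binary here for simplicity)
    maintenance_due: bool = False           # regular maintenance item


def classify(state: LaneState) -> Optional[AlertLevel]:
    """Return the most severe matching alert level, or None if nothing applies."""
    if state.consecutive_install_failures >= URGENT_INSTALL_FAILURE_STREAK:
        return AlertLevel.URGENT
    if state.consecutive_blocking_failures >= 2:
        return AlertLevel.MAJOR
    if (
        state.consecutive_blocking_failures == 1
        or state.readiness_regression
        or (state.permafailing and state.important)
    ):
        return AlertLevel.NORMAL
    if state.permafailing or state.maintenance_due:
        return AlertLevel.MINOR
    return None
```

Checking the rules in severity order means a lane that matches several rules is reported once, at its highest level. Whether lower-level matches should also be recorded is left open here.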
Acceptance Criteria
- System integrated with existing payload manager process
- Produces Slack alerts for urgent issues
- Published to its own repo with developer documentation
Scope
In Scope
- AI data gathering from multiple sources (sippy, prow, component readiness, release controller)
- Alert rule definition system
- Alert categorization (urgent, major, normal, minor)
- Slack notification integration
- Payload manager duty recommendations with time boxing
- Jira board auto-update with results of active investigations
- CI lane importance level definitions
Out of Scope
- Changes to underlying CI infrastructure
- Modifications to sippy/prow/component readiness tools
Timeline
- Target: Q4 2026