Story (Required)
As a developer trying to resolve pipeline failures faster, I want LLM analysis to reference similar past failures and their solutions, so that I can learn from previous fixes and avoid repeating investigation work.
*This feature enables the LLM analysis system to maintain a history of past analyses and their outcomes, then surface relevant historical context when analyzing new failures. When a test failure occurs, the system can identify similar past failures and show which solutions worked before. This can dramatically reduce time-to-resolution by leveraging organizational knowledge and avoiding redundant troubleshooting.*
Background (Required)
Currently, each LLM analysis operates in isolation without knowledge of past failures or solutions. This means:
- Developers repeatedly investigate the same types of failures
- No way to learn which LLM suggestions actually led to successful fixes
- Cannot surface "this looks like issue #1234 which was fixed by updating dependency X"
- Lost opportunity to build organizational knowledge base
- No feedback loop to improve analysis quality over time
- Similar failures across different repositories aren't connected
Historical context and pattern recognition are key capabilities that distinguish experienced engineers from novices. Enabling LLMs to access this context provides similar benefits.
Related: Current LLM analysis implementation at docs/content/docs/guide/llm-analysis.md
Out of scope
- Real-time machine learning or model fine-tuning based on history
- Cross-repository analysis aggregation (initial implementation is per-repository)
- Automatic application of past fixes without human approval
- Long-term data warehouse or analytics platform
- Privacy controls for sensitive failure information
- Deletion or archival policies for old analyses
Approach (Required)
High-level technical approach:
- Store analysis results with metadata (failure type, error messages, affected files, resolution status)
- When a new analysis is triggered, search historical analyses for similar patterns
- Use similarity scoring (see the sketch after this list) based on:
  - Error message text similarity
  - Affected file paths
  - Failure categories
  - Commit content patterns
- Include the top N most similar historical analyses in the LLM context
- Track whether suggested fixes led to successful resolution (did the next pipeline run succeed?)
- Add a historical_analyses context item to the role configuration
- Configure the lookback period, similarity threshold, and maximum number of references
- Present historical context in a structured format the LLM can reason about
- Allow users to mark analyses as "helpful" or "not helpful" for future reference
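A minimal Go sketch of what the stored record and the weighted scoring could look like; every name here (HistoricalAnalysis, Similarity, the weights) is illustrative rather than a committed design, and commit-content matching is omitted for brevity:

```go
package history

import "strings"

// HistoricalAnalysis is a hypothetical record shape for one stored
// analysis, carrying the metadata listed above.
type HistoricalAnalysis struct {
	FailureCategory string
	ErrorMessage    string
	AffectedFiles   []string
	Resolved        bool // did the next pipeline run succeed after the fix?
}

// Similarity combines three of the four signals above into a score in
// [0,1]; the weights are placeholders to be tuned.
func Similarity(current, past HistoricalAnalysis) float64 {
	score := 0.0
	if current.FailureCategory == past.FailureCategory {
		score += 0.2
	}
	score += 0.5 * jaccard(current.ErrorMessage, past.ErrorMessage)
	score += 0.3 * pathOverlap(current.AffectedFiles, past.AffectedFiles)
	return score
}

// jaccard is a crude token-overlap measure of error message similarity;
// a real implementation might use fuzzy matching or embeddings.
func jaccard(a, b string) float64 {
	ta, tb := tokens(a), tokens(b)
	if len(ta) == 0 || len(tb) == 0 {
		return 0
	}
	inter := 0
	for t := range ta {
		if tb[t] {
			inter++
		}
	}
	return float64(inter) / float64(len(ta)+len(tb)-inter)
}

func tokens(s string) map[string]bool {
	set := map[string]bool{}
	for _, f := range strings.Fields(strings.ToLower(s)) {
		set[f] = true
	}
	return set
}

// pathOverlap returns the fraction of the current failure's files that
// also appear in the past failure (renamed or moved files are a known
// edge case, see below).
func pathOverlap(cur, past []string) float64 {
	if len(cur) == 0 {
		return 0
	}
	seen := map[string]bool{}
	for _, p := range past {
		seen[p] = true
	}
	hits := 0
	for _, p := range cur {
		if seen[p] {
			hits++
		}
	}
	return float64(hits) / float64(len(cur))
}
```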
The feature should be opt-in per role and respect configured retention periods.
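As a sketch of the per-role configuration this implies: historical_analyses and max_references come from the acceptance criteria below, while the remaining field names and YAML tags are assumptions.

```go
package history

// HistoricalAnalysesConfig sketches the per-role settings this story
// implies; only historical_analyses and max_references are named in
// the acceptance criteria, the rest are assumed field names.
type HistoricalAnalysesConfig struct {
	// Opt-in switch for historical context on this role.
	Enabled bool `yaml:"historical_analyses"`
	// How far back to search for similar analyses, e.g. "720h".
	LookbackPeriod string `yaml:"lookback_period,omitempty"`
	// Minimum similarity score (0..1) for inclusion, e.g. 0.8.
	SimilarityThreshold float64 `yaml:"similarity_threshold,omitempty"`
	// Maximum number of historical references in the context, e.g. 3.
	MaxReferences int `yaml:"max_references,omitempty"`
	// How long stored analyses are retained before cleanup.
	RetentionPeriod string `yaml:"retention_period,omitempty"`
}
```

An equivalent role stanza would then read, for example, `historical_analyses: true`, `similarity_threshold: 0.8`, `max_references: 3`.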
Dependencies
- Existing LLM analysis infrastructure
- Storage mechanism for historical analysis data (Kubernetes annotations, ConfigMaps, or an external database); a ConfigMap-based sketch follows this list
- Ability to track PipelineRun outcomes (success/failure after fix applied)
- Repository CRD must support historical context configuration
- May benefit from a structured response format for consistent similarity matching
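If the ConfigMap option above is chosen, persisting one record might look roughly like this; the pipelinesascode.tekton.dev label keys and the analysis.json data key are assumptions, not an existing convention:

```go
package history

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeAnalysis persists one serialized analysis record as a ConfigMap
// in the repository's namespace. The labels (hypothetical keys) would
// let a later search list candidates with a label selector instead of
// reading every object in the namespace.
func storeAnalysis(ctx context.Context, c kubernetes.Interface, ns, name, category string, payload []byte) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: ns,
			Labels: map[string]string{
				"pipelinesascode.tekton.dev/analysis": "true",
				"pipelinesascode.tekton.dev/category": category, // must be a valid label value
			},
		},
		Data: map[string]string{"analysis.json": string(payload)},
	}
	_, err := c.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```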
Acceptance Criteria (Mandatory)
- Given a role with historical_analyses: true configured, When LLM analysis runs, Then the context includes similar past failures from the lookback period
- Given a current failure and 5 similar past failures in history, When the similarity threshold is 0.8, Then only failures with >=80% similarity are included in the context
- Given historical analyses are included, When the LLM generates its response, Then it references relevant past solutions (e.g., "Similar to previous failure fixed by...")
- Given a past analysis suggested a fix and the next pipeline run succeeded, When that analysis is surfaced as historical context, Then it is marked as a "validated solution"
- Given max_references: 3 configured, When more than 3 similar past failures exist, Then only the top 3 most similar are included
- Given no similar past failures exist, When analysis runs, Then it proceeds normally without historical context
- Given historical data storage, When the configured retention period expires, Then old analyses are cleaned up according to policy
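Together, the threshold and max_references criteria reduce to a filter-sort-truncate selection; a self-contained Go sketch (all names illustrative):

```go
package history

import "sort"

// scored pairs a candidate past analysis ID with its similarity score.
type scored struct {
	ID    string
	Score float64
}

// selectReferences keeps only candidates at or above the threshold,
// sorts them by descending similarity, and truncates to maxRefs. With
// threshold 0.8 and maxRefs 3, at most the top three >=80% matches are
// returned; an empty result means analysis proceeds with no
// historical context.
func selectReferences(cands []scored, threshold float64, maxRefs int) []scored {
	kept := make([]scored, 0, len(cands))
	for _, c := range cands {
		if c.Score >= threshold {
			kept = append(kept, c)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if len(kept) > maxRefs {
		kept = kept[:maxRefs]
	}
	return kept
}
```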
Edge cases to consider:
- First-time failures with no history to reference
- Storage limits when many analyses accumulate
- Similarity scoring for very different types of failures
- False positives where "similar" failures have different root causes
- Performance impact of searching large history datasets
- Handling renamed or moved files when comparing affected paths
- Privacy considerations for failure information in shared repositories