Story (Required)
As a developer trying to resolve pipeline failures faster, I want LLM analysis to reference similar past failures and their solutions, so that I can learn from previous fixes and avoid repeating investigation work.
*This feature enables the LLM analysis system to maintain a history of past analyses and their outcomes, then surface relevant historical context when analyzing new failures. When a test failure occurs, the system can identify similar past failures and show which solutions worked before. This can dramatically reduce time-to-resolution by leveraging organizational knowledge and avoiding redundant troubleshooting.*
Background (Required)
Currently, each LLM analysis operates in isolation without knowledge of past failures or solutions. This means:
- Developers repeatedly investigate the same types of failures
- No way to learn which LLM suggestions actually led to successful fixes
- Cannot surface "this looks like issue #1234 which was fixed by updating dependency X"
- Lost opportunity to build organizational knowledge base
- No feedback loop to improve analysis quality over time
- Similar failures across different repositories aren't connected
Historical context and pattern recognition are key capabilities that distinguish experienced engineers from novices. Enabling LLMs to access this context provides similar benefits.
Related: Current LLM analysis implementation at docs/content/docs/guide/llm-analysis.md
Out of scope
- Real-time machine learning or model fine-tuning based on history
- Cross-repository analysis aggregation (initial implementation is per-repository)
- Automatic application of past fixes without human approval
- Long-term data warehouse or analytics platform
- Privacy controls for sensitive failure information
- Deletion or archival policies for old analyses
Approach (Required)
High-level technical approach:
- Store analysis results with metadata (failure type, error messages, affected files, resolution status)
- When a new analysis is triggered, search historical analyses for similar patterns
- Use similarity scoring (see the sketch after this list) based on:
  - Error message text similarity
  - Affected file paths
  - Failure categories
  - Commit content patterns
- Include the top N most similar historical analyses in the LLM context
- Track whether suggested fixes led to successful resolution (did the next pipeline run succeed?)
- Add a historical_analyses context item to the role configuration
- Configure the lookback period, similarity threshold, and maximum number of references
- Present historical context in a structured format the LLM can reason about
- Allow users to mark analyses as "helpful" or "not helpful" for future reference
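A minimal Go sketch of what the stored record and the weighted scoring could look like; every name here (HistoricalAnalysis, Similarity, the weights) is illustrative rather than a committed design, and commit-content matching is omitted for brevity:

```go
package history

import "strings"

// HistoricalAnalysis is a hypothetical record shape for one stored
// analysis, carrying the metadata listed above.
type HistoricalAnalysis struct {
	FailureCategory string
	ErrorMessage    string
	AffectedFiles   []string
	Resolved        bool // did the next pipeline run succeed after the fix?
}

// Similarity combines three of the four signals above into a score in
// [0,1]; the weights are placeholders to be tuned.
func Similarity(current, past HistoricalAnalysis) float64 {
	score := 0.0
	if current.FailureCategory == past.FailureCategory {
		score += 0.2
	}
	score += 0.5 * jaccard(current.ErrorMessage, past.ErrorMessage)
	score += 0.3 * pathOverlap(current.AffectedFiles, past.AffectedFiles)
	return score
}

// jaccard is a crude token-overlap measure of error message similarity;
// a real implementation might use fuzzy matching or embeddings.
func jaccard(a, b string) float64 {
	ta, tb := tokens(a), tokens(b)
	if len(ta) == 0 || len(tb) == 0 {
		return 0
	}
	inter := 0
	for t := range ta {
		if tb[t] {
			inter++
		}
	}
	return float64(inter) / float64(len(ta)+len(tb)-inter)
}

func tokens(s string) map[string]bool {
	set := map[string]bool{}
	for _, f := range strings.Fields(strings.ToLower(s)) {
		set[f] = true
	}
	return set
}

// pathOverlap returns the fraction of the current failure's files that
// also appear in the past failure (renamed or moved files are a known
// edge case, see below).
func pathOverlap(cur, past []string) float64 {
	if len(cur) == 0 {
		return 0
	}
	seen := map[string]bool{}
	for _, p := range past {
		seen[p] = true
	}
	hits := 0
	for _, p := range cur {
		if seen[p] {
			hits++
		}
	}
	return float64(hits) / float64(len(cur))
}
```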
The feature should be opt-in per role and respect configured retention periods.
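As a sketch of the per-role configuration this implies: historical_analyses and max_references come from the acceptance criteria below, while the remaining field names and YAML tags are assumptions.

```go
package history

// HistoricalAnalysesConfig sketches the per-role settings this story
// implies; only historical_analyses and max_references are named in
// the acceptance criteria, the rest are assumed field names.
type HistoricalAnalysesConfig struct {
	// Opt-in switch for historical context on this role.
	Enabled bool `yaml:"historical_analyses"`
	// How far back to search for similar analyses, e.g. "720h".
	LookbackPeriod string `yaml:"lookback_period,omitempty"`
	// Minimum similarity score (0..1) for inclusion, e.g. 0.8.
	SimilarityThreshold float64 `yaml:"similarity_threshold,omitempty"`
	// Maximum number of historical references in the context, e.g. 3.
	MaxReferences int `yaml:"max_references,omitempty"`
	// How long stored analyses are retained before cleanup.
	RetentionPeriod string `yaml:"retention_period,omitempty"`
}
```

An equivalent role stanza would then read, for example, `historical_analyses: true`, `similarity_threshold: 0.8`, `max_references: 3`.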
Dependencies
- Existing LLM analysis infrastructure
- Storage mechanism for historical analysis data (Kubernetes annotations, ConfigMaps, or an external database); a ConfigMap-based sketch follows this list
- Ability to track PipelineRun outcomes (success/failure after fix applied)
- Repository CRD must support historical context configuration
- May benefit from a structured response format for consistent similarity matching
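If the ConfigMap option above is chosen, persisting one record might look roughly like this; the pipelinesascode.tekton.dev label keys and the analysis.json data key are assumptions, not an existing convention:

```go
package history

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeAnalysis persists one serialized analysis record as a ConfigMap
// in the repository's namespace. The labels (hypothetical keys) would
// let a later search list candidates with a label selector instead of
// reading every object in the namespace.
func storeAnalysis(ctx context.Context, c kubernetes.Interface, ns, name, category string, payload []byte) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: ns,
			Labels: map[string]string{
				"pipelinesascode.tekton.dev/analysis": "true",
				"pipelinesascode.tekton.dev/category": category, // must be a valid label value
			},
		},
		Data: map[string]string{"analysis.json": string(payload)},
	}
	_, err := c.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```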
Acceptance Criteria (Mandatory)
- Given a role with historical_analyses: true configured, When LLM analysis runs, Then the context includes similar past failures from the lookback period
- Given a current failure and 5 similar past failures in history, When the similarity threshold is 0.8, Then only failures with >=80% similarity are included in the context
- Given historical analyses are included, When the LLM generates its response, Then it references relevant past solutions (e.g., "Similar to previous failure fixed by...")
- Given a past analysis suggested a fix and the next pipeline run succeeded, When that analysis is surfaced as historical context, Then it is marked as a "validated solution"
- Given max_references: 3 configured, When more than 3 similar past failures exist, Then only the top 3 most similar are included
- Given no similar past failures exist, When analysis runs, Then it proceeds normally without historical context
- Given historical data storage, When the configured retention period expires, Then old analyses are cleaned up according to policy
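Together, the threshold and max_references criteria reduce to a filter-sort-truncate selection; a self-contained Go sketch (all names illustrative):

```go
package history

import "sort"

// scored pairs a candidate past analysis ID with its similarity score.
type scored struct {
	ID    string
	Score float64
}

// selectReferences keeps only candidates at or above the threshold,
// sorts them by descending similarity, and truncates to maxRefs. With
// threshold 0.8 and maxRefs 3, at most the top three >=80% matches are
// returned; an empty result means analysis proceeds with no
// historical context.
func selectReferences(cands []scored, threshold float64, maxRefs int) []scored {
	kept := make([]scored, 0, len(cands))
	for _, c := range cands {
		if c.Score >= threshold {
			kept = append(kept, c)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if len(kept) > maxRefs {
		kept = kept[:maxRefs]
	}
	return kept
}
```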
Edge cases to consider:
- First-time failures with no history to reference
- Storage limits when many analyses accumulate
- Similarity scoring for very different types of failures
- False positives where "similar" failures have different root causes
- Performance impact of searching large history datasets
- Handling renamed or moved files when comparing affected paths
- Privacy considerations for failure information in shared repositories