Story
Resolution: Unresolved
Story (Required)
As a platform engineer trying to *automate workflows based on LLM analysis results*, I want *LLMs to return structured JSON responses with defined schemas* so that *I can programmatically process AI insights and trigger downstream automation*.
This feature enables LLM providers to return structured, typed JSON responses instead of free-form text. By defining JSON schemas for analysis responses, teams can reliably parse results to extract severity levels, affected files, recommended actions, and confidence scores. This enables automation such as creating Jira tickets for HIGH severity issues, routing problems to specific teams, and generating metrics from LLM analyses.
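For illustration only, a decoded structured response could look like the Go types below; every field name here is a hypothetical example, not a committed schema:
{code:go}
package llmanalysis

// AnalysisResult sketches what a structured response could decode into.
// All field names (severity, category, affected_files, ...) are
// illustrative; the real shape is whatever a role's response_schema defines.
type AnalysisResult struct {
	Severity      string         `json:"severity"`   // e.g. "HIGH", "MEDIUM", "LOW"
	Category      string         `json:"category"`   // e.g. "security", "performance"
	Summary       string         `json:"summary"`
	Confidence    float64        `json:"confidence"` // 0.0 to 1.0
	AffectedFiles []AffectedFile `json:"affected_files,omitempty"`
	Actions       []string       `json:"actions,omitempty"` // recommended follow-ups
}

// AffectedFile carries the location data that output destinations such as
// GitHub annotations would consume.
type AffectedFile struct {
	Path       string `json:"path"`
	LineNumber int    `json:"line_number,omitempty"`
}
{code}
With typed data like this, automation such as "open a Jira ticket when severity is HIGH" becomes a field comparison instead of text scraping.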
Background (Required)
Currently, LLM analysis returns free-form text that is designed for human consumption (PR comments). While this works well for developers reading insights, it has limitations:
- Difficult to parse programmatically for downstream automation
- No standardized fields for severity, category, or confidence
- Cannot reliably extract structured data like affected file paths or line numbers
- Inconsistent formatting across different prompts and LLM providers
- Hard to build metrics, dashboards, or reporting on top of free-text responses
Modern LLM providers (OpenAI, Gemini) support structured output modes where
responses conform to a defined JSON schema. This enables reliable parsing and
automation.
Related: Current LLM analysis implementation at docs/content/docs/guide/llm-analysis.md
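As a rough sketch of what "pass the schema to the provider" could look like against OpenAI's REST API (Gemini's responseSchema in generationConfig is the analogous mechanism); the model name, prompt, and schema below are placeholders:
{code:go}
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Hypothetical schema a role might declare via response_schema.
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"severity": map[string]any{"type": "string", "enum": []string{"HIGH", "MEDIUM", "LOW"}},
			"summary":  map[string]any{"type": "string"},
		},
		"required":             []string{"severity", "summary"},
		"additionalProperties": false, // required by OpenAI's strict mode
	}

	body, _ := json.Marshal(map[string]any{
		"model": "gpt-4o-mini", // placeholder model
		"messages": []map[string]string{
			{"role": "user", "content": "Analyze this pipeline failure..."},
		},
		// Structured output: the provider constrains generation to the schema.
		"response_format": map[string]any{
			"type": "json_schema",
			"json_schema": map[string]any{
				"name":   "analysis_result",
				"strict": true,
				"schema": schema,
			},
		},
	})

	req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer resp.Body.Close()
	// The returned message content is JSON conforming to the schema and can
	// be parsed and validated as described in the Approach section below.
}
{code}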
Out of scope
- Automatic conversion of existing free-text prompts to structured prompts (users must update prompts)
- AI-powered schema inference or generation
- Support for non-JSON structured formats (XML, YAML, Protocol Buffers)
- Schema validation beyond what the LLM provider offers
- Backwards compatibility for existing roles (new field is opt-in)
Approach (Required)
High-level technical approach (see the sketch after this section):
- Add a response_format field to the AnalysisRole configuration, with values text (the default) or json
- Add a response_schema field to define the expected JSON structure using the JSON Schema specification
- When response_format: "json" is set, pass the schema to the LLM provider's structured output API
- Parse and validate the returned JSON against the schema
- Make structured fields available for downstream processing:
  - Output destinations can access typed fields (e.g., annotations need file_path and line_number)
  - CEL expressions can reference structured response fields
  - Logging and metrics can extract severity, category, and confidence
- Handle schema validation failures gracefully (log the error, fall back to text mode)
- Support common schema patterns out of the box (severity, category, affected_files, actions, confidence)
The feature should be opt-in and backward compatible with existing text-based roles.
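A minimal sketch of the configuration additions; the Go field names and the surrounding Name/Prompt fields are assumptions, and only response_format and response_schema come from this story:
{code:go}
package llmanalysis

import "encoding/json"

// AnalysisRole with the two proposed fields. Existing fields are stand-ins.
type AnalysisRole struct {
	Name           string          `json:"name"`
	Prompt         string          `json:"prompt"`
	ResponseFormat string          `json:"response_format,omitempty"` // "text" (default) or "json"
	ResponseSchema json.RawMessage `json:"response_schema,omitempty"` // a JSON Schema document
}

// EffectiveFormat defaults to text so that existing roles keep working
// unchanged (the backward-compatibility requirement).
func (r AnalysisRole) EffectiveFormat() string {
	if r.ResponseFormat == "" {
		return "text"
	}
	return r.ResponseFormat
}
{code}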
Dependencies
- Existing LLM analysis infrastructure (OpenAI/Gemini clients)
- LLM provider APIs that support structured output (OpenAI's JSON mode, Gemini's schema-based generation)
- Repository CRD must support response_format and response_schema fields
- Enables dependent features such as GitHub annotations (SRVKP-9101), which rely on structured location data
Acceptance Criteria (Mandatory)
- Given a Repository with an LLM role configured with response_format: "json" and a defined schema, When LLM analysis completes, Then the response is valid JSON conforming to the schema
- Given a structured response with defined fields (severity, category, summary, actions), When parsing the response, Then all fields are accessible as typed data for downstream processing
- Given a schema requiring specific fields (e.g., required: [severity, summary]), When the LLM returns a response missing required fields, Then the system logs a validation error and handles the failure gracefully
- Given multiple roles with different response formats (one JSON, one text), When analyses complete, Then each role's output is formatted according to its response_format setting
- Given a structured response with affected_files and line_numbers, When used with output destinations like github-annotation, Then the structured data is correctly mapped to annotation locations
- Given an LLM provider that doesn't support structured output, When response_format: "json" is configured, Then the system logs a warning and either falls back to text mode or returns an error
- Given a role without response_format specified, When LLM analysis runs, Then it defaults to text mode (backward compatibility)
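To make the validation and fallback criteria above concrete, a stdlib-only sketch; a real implementation would check the full response_schema with a JSON Schema validation library rather than this hand-rolled required-field check:
{code:go}
package llmanalysis

import (
	"encoding/json"
	"log"
)

// handleResponse parses raw LLM output as JSON and checks required fields.
// On any failure it logs the error and reports false, signalling the caller
// to fall back to text mode instead of failing the analysis outright.
func handleResponse(raw string, required []string) (map[string]any, bool) {
	var parsed map[string]any
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		log.Printf("structured output is not valid JSON, falling back to text: %v", err)
		return nil, false
	}
	for _, field := range required {
		if _, ok := parsed[field]; !ok {
			log.Printf("structured output missing required field %q, falling back to text", field)
			return nil, false
		}
	}
	return parsed, true
}
{code}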
Edge cases to consider:
- LLM returning malformed JSON despite structured output mode
- Schema definitions that are too complex for the LLM to follow reliably
- Different LLM providers having different structured output capabilities
- Very large schemas exceeding LLM context limits
- Handling optional vs required fields in schemas
- Nested objects and arrays in schema definitions
- Schema evolution and versioning over time
is depended on by:
- SRVKP-9101 Add GitHub annotation output destination for LLM analysis results (To Do)