Story
Resolution: Unresolved
Story (Required)
As a platform engineer trying to *automate workflows based on LLM analysis results*, I want *LLMs to return structured JSON responses with defined schemas* so that *I can programmatically process AI insights and trigger downstream automation*.
This feature enables LLM providers to return structured, typed JSON responses instead of free-form text. By defining JSON schemas for analysis responses, teams can reliably parse results to extract severity levels, affected files, recommended actions, and confidence scores. This enables automation such as creating Jira tickets for HIGH severity issues, routing problems to specific teams, and generating metrics from LLM analyses.
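For illustration only, a decoded structured response could look like the Go types below; every field name here is a hypothetical example, not a committed schema:
{code:go}
package llmanalysis

// AnalysisResult sketches what a structured response could decode into.
// All field names (severity, category, affected_files, ...) are
// illustrative; the real shape is whatever a role's response_schema defines.
type AnalysisResult struct {
	Severity      string         `json:"severity"`   // e.g. "HIGH", "MEDIUM", "LOW"
	Category      string         `json:"category"`   // e.g. "security", "performance"
	Summary       string         `json:"summary"`
	Confidence    float64        `json:"confidence"` // 0.0 to 1.0
	AffectedFiles []AffectedFile `json:"affected_files,omitempty"`
	Actions       []string       `json:"actions,omitempty"` // recommended follow-ups
}

// AffectedFile carries the location data that output destinations such as
// GitHub annotations would consume.
type AffectedFile struct {
	Path       string `json:"path"`
	LineNumber int    `json:"line_number,omitempty"`
}
{code}
With typed data like this, automation such as "open a Jira ticket when severity is HIGH" becomes a field comparison instead of text scraping.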
Background (Required)
Currently, LLM analysis returns free-form text that is designed for human consumption (PR comments). While this works well for developers reading insights, it has limitations:
- Difficult to parse programmatically for downstream automation
- No standardized fields for severity, category, or confidence
- Cannot reliably extract structured data like affected file paths or line numbers
- Inconsistent formatting across different prompts and LLM providers
- Hard to build metrics, dashboards, or reporting on top of free-text responses
Modern LLM providers (OpenAI, Gemini) support structured output modes where
responses conform to a defined JSON schema. This enables reliable parsing and
automation.
Related: Current LLM analysis implementation at docs/content/docs/guide/llm-analysis.md
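As a rough sketch of what "pass the schema to the provider" could look like against OpenAI's REST API (Gemini's responseSchema in generationConfig is the analogous mechanism); the model name, prompt, and schema below are placeholders:
{code:go}
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Hypothetical schema a role might declare via response_schema.
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"severity": map[string]any{"type": "string", "enum": []string{"HIGH", "MEDIUM", "LOW"}},
			"summary":  map[string]any{"type": "string"},
		},
		"required":             []string{"severity", "summary"},
		"additionalProperties": false, // required by OpenAI's strict mode
	}

	body, _ := json.Marshal(map[string]any{
		"model": "gpt-4o-mini", // placeholder model
		"messages": []map[string]string{
			{"role": "user", "content": "Analyze this pipeline failure..."},
		},
		// Structured output: the provider constrains generation to the schema.
		"response_format": map[string]any{
			"type": "json_schema",
			"json_schema": map[string]any{
				"name":   "analysis_result",
				"strict": true,
				"schema": schema,
			},
		},
	})

	req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer resp.Body.Close()
	// The returned message content is JSON conforming to the schema and can
	// be parsed and validated as described in the Approach section below.
}
{code}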
Out of scope
- Automatic conversion of existing free-text prompts to structured prompts (users must update prompts)
- AI-powered schema inference or generation
- Support for non-JSON structured formats (XML, YAML, Protocol Buffers)
- Schema validation beyond what the LLM provider offers
- Backwards compatibility for existing roles (new field is opt-in)
Approach (Required)
High-level technical approach (see the sketch after this section):
- Add a response_format field to the AnalysisRole configuration, with values text (the default) or json
- Add a response_schema field to define the expected JSON structure using the JSON Schema specification
- When response_format: "json" is set, pass the schema to the LLM provider's structured output API
- Parse and validate the returned JSON against the schema
- Make structured fields available for downstream processing:
  - Output destinations can access typed fields (e.g., annotations need file_path and line_number)
  - CEL expressions can reference structured response fields
  - Logging and metrics can extract severity, category, and confidence
- Handle schema validation failures gracefully (log the error, fall back to text mode)
- Support common schema patterns out of the box (severity, category, affected_files, actions, confidence)
The feature should be opt-in and backward compatible with existing text-based roles.
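A minimal sketch of the configuration additions; the Go field names and the surrounding Name/Prompt fields are assumptions, and only response_format and response_schema come from this story:
{code:go}
package llmanalysis

import "encoding/json"

// AnalysisRole with the two proposed fields. Existing fields are stand-ins.
type AnalysisRole struct {
	Name           string          `json:"name"`
	Prompt         string          `json:"prompt"`
	ResponseFormat string          `json:"response_format,omitempty"` // "text" (default) or "json"
	ResponseSchema json.RawMessage `json:"response_schema,omitempty"` // a JSON Schema document
}

// EffectiveFormat defaults to text so that existing roles keep working
// unchanged (the backward-compatibility requirement).
func (r AnalysisRole) EffectiveFormat() string {
	if r.ResponseFormat == "" {
		return "text"
	}
	return r.ResponseFormat
}
{code}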
Dependencies
- Existing LLM analysis infrastructure (OpenAI/Gemini clients)
- LLM provider APIs that support structured output (OpenAI's JSON mode, Gemini's schema-based generation)
- Repository CRD must support response_format and response_schema fields
- Enables dependent features such as GitHub annotations (SRVKP-9101), which rely on structured location data
Acceptance Criteria (Mandatory)
- Given a Repository with an LLM role configured with response_format: "json" and a defined schema, When LLM analysis completes, Then the response is valid JSON conforming to the schema
- Given a structured response with defined fields (severity, category, summary, actions), When parsing the response, Then all fields are accessible as typed data for downstream processing
- Given a schema requiring specific fields (e.g., required: [severity, summary]), When the LLM returns a response missing required fields, Then the system logs a validation error and handles the failure gracefully
- Given multiple roles with different response formats (one JSON, one text), When analyses complete, Then each role's output is formatted according to its response_format setting
- Given a structured response with affected_files and line_numbers, When used with output destinations like github-annotation, Then the structured data is correctly mapped to annotation locations
- Given an LLM provider that doesn't support structured output, When response_format: "json" is configured, Then the system logs a warning and either falls back to text mode or returns an error
- Given a role without response_format specified, When LLM analysis runs, Then it defaults to text mode (backward compatibility)
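To make the validation and fallback criteria above concrete, a stdlib-only sketch; a real implementation would check the full response_schema with a JSON Schema validation library rather than this hand-rolled required-field check:
{code:go}
package llmanalysis

import (
	"encoding/json"
	"log"
)

// handleResponse parses raw LLM output as JSON and checks required fields.
// On any failure it logs the error and reports false, signalling the caller
// to fall back to text mode instead of failing the analysis outright.
func handleResponse(raw string, required []string) (map[string]any, bool) {
	var parsed map[string]any
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		log.Printf("structured output is not valid JSON, falling back to text: %v", err)
		return nil, false
	}
	for _, field := range required {
		if _, ok := parsed[field]; !ok {
			log.Printf("structured output missing required field %q, falling back to text", field)
			return nil, false
		}
	}
	return parsed, true
}
{code}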
Edge cases to consider:
- LLM returning malformed JSON despite structured output mode
- Schema definitions that are too complex for the LLM to follow reliably
- Different LLM providers having different structured output capabilities
- Very large schemas exceeding LLM context limits
- Handling optional vs required fields in schemas
- Nested objects and arrays in schema definitions
- Schema evolution and versioning over time
is depended on by:
- SRVKP-9101 Add GitHub annotation output destination for LLM analysis results (To Do)