Red Hat OpenStack Services on OpenShift / OSPRH-19867

Do not stop the evaluation when RAGAS produces OutputParserException


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Component: openstack-lightspeed
    • Sprint: Sprint 18
    • Story Points: 1
    • Severity: Moderate

      There is already an issue for this in the lightspeed-evaluation repository: https://github.com/lightspeed-core/lightspeed-evaluation/issues/44

      I'm creating this ticket for tracking purposes and pasting the description from GitHub below:

      Description:

      While running evaluations, I encountered the following error from ragas.executor:

      Evaluating:   0%|                                                                 | 0/1 [00:00<?, ?it/s]
      2025-09-03 13:15:39,323 - ragas.executor - ERROR - Exception raised in Job[0]: 
      OutputParserException(Failed to parse StringIO from completion 
      {"question": "What does the error message \"admission webhook 'regular-user-validation.managed.openshift.io' denied the request\" during a ROSA cluster upgrade indicate, and how can it be resolved?", "noncommittal": 0}. 
      Got: 1 validation error for StringIO
      text
        Field required [type=missing, input_value={'question': 'What does t...ed?', 'noncommittal': 0}, input_type=dict]
          For further information visit https://errors.pydantic.dev/2.11/v/missing 

      This issue is hard to reproduce, as the behavior depends on the LLM output and is thus non-deterministic.

      Observed behavior:

      • ragas:response_relevancy expects the LLM output to include a "text" field.
      • Sometimes the model returns a dictionary without "text" (e.g., {"question": ..., "noncommittal": 0}), which causes an OutputParserException (see the sketch after this list).
      • Because the parser fails, the score for that turn becomes nan.
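
      The parse failure can be reproduced in isolation. A minimal sketch, assuming ragas validates the completion against a pydantic model named StringIO with a required "text" field (as the log above suggests):

      from pydantic import BaseModel, ValidationError

      # Assumed shape of ragas' StringIO output model: a single required "text" field.
      class StringIO(BaseModel):
          text: str

      # Shape the LLM sometimes returns instead -- no "text" key.
      bad_completion = {"question": "What does the error message ... indicate?", "noncommittal": 0}

      try:
          StringIO.model_validate(bad_completion)
      except ValidationError as exc:
          # Prints "text  Field required [type=missing, ...]" -- the same pydantic
          # error that ragas wraps into the OutputParserException shown above.
          print(exc)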

      Downstream effect:

      • When a nan appears in the scores, the evaluation crashes with the following traceback (a NaN-filtering sketch follows it):
      Traceback (most recent call last):
        ...
        File ".../statistics.py", line 124, in _finalize_metric_stats
          "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
      AttributeError: 'float' object has no attribute 'numerator' 
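
      A minimal sketch of how the statistics step could tolerate these failures. The helper name and return shape below mirror the failing _finalize_metric_stats only for illustration; they are not the repository's actual code:

      import math
      import statistics

      def finalize_metric_stats(scores):
          # Drop nan scores left behind by failed RAGAS parses before aggregating,
          # so statistics.stdev() never sees them.
          valid = [s for s in scores if not math.isnan(s)]
          return {
              "count": len(valid),
              "mean": statistics.mean(valid) if valid else 0.0,
              "std": statistics.stdev(valid) if len(valid) > 1 else 0.0,
          }

      print(finalize_metric_stats([0.83, float("nan"), 0.91]))
      # {'count': 2, 'mean': 0.87..., 'std': 0.0565...}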

      Impact:

      • This prevents the evaluation report from being generated.
      • The failure is intermittent and depends on the LLM output structure.
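
      What the ticket asks for, as a rough sketch: score each turn, but record a parse failure as a skipped data point instead of letting it abort the run. The metric.score() interface and the exception handling below are illustrative assumptions, not the evaluation framework's actual API:

      import math

      def score_turns(turns, metric):
          scores = []
          for turn in turns:
              try:
                  score = metric.score(turn)   # may raise OutputParserException
              except Exception as exc:
                  print(f"Skipping turn, RAGAS parse failed: {exc}")
                  continue
              if score is None or math.isnan(score):
                  continue                     # ragas may also report the failure as nan
              scores.append(score)
          return scores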

              Assignee: Lukáš Piwowarski (lpiwowar)
              Reporter: Lukáš Piwowarski (lpiwowar)
              Team: rhos-workloads-lightspeed