- Bug
- Resolution: Done
- Normal
It seems there is already an issue for this in the lightspeed-evaluation repository: https://github.com/lightspeed-core/lightspeed-evaluation/issues/44
I'm creating this ticket for tracking purposes and pasting the description from GitHub here:
Description:
While running evaluations, I encountered the following error from ragas.executor:
```
Evaluating:   0%|          | 0/1 [00:00<?, ?it/s]
2025-09-03 13:15:39,323 - ragas.executor - ERROR - Exception raised in Job[0]: OutputParserException(Failed to parse StringIO from completion {"question": "What does the error message \"admission webhook 'regular-user-validation.managed.openshift.io' denied the request\" during a ROSA cluster upgrade indicate, and how can it be resolved?", "noncommittal": 0}. Got: 1 validation error for StringIO
text
  Field required [type=missing, input_value={'question': 'What does t...ed?', 'noncommittal': 0}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
```
This issue is hard to reproduce, as the behavior depends on the LLM output and is thus non-deterministic.
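Given that payload shape, however, the parse failure itself is deterministic and can be reproduced in isolation. A minimal sketch, assuming a pydantic model with a required "text" field that mirrors the StringIO schema named in the error (a stand-in for illustration, not the actual ragas class):

```python
# Minimal sketch reproducing the parse failure in isolation.
# NOTE: this StringIO model is an assumption mirroring the schema named
# in the error message; it is not imported from ragas.
from pydantic import BaseModel, ValidationError

class StringIO(BaseModel):
    text: str  # the required field the parser expects

# Payload shape the model sometimes returns (taken from the log above):
bad_payload = {"question": "What does the error message indicate?", "noncommittal": 0}

try:
    StringIO.model_validate(bad_payload)
except ValidationError as exc:
    # Prints: 1 validation error for StringIO / text / Field required [type=missing, ...]
    print(exc)
```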
Observed behavior:
- ragas:response_relevancy expects the LLM output to include a "text" field.
- Sometimes the model returns a dictionary without "text" (e.g., {"question": ..., "noncommittal": 0}), which raises an OutputParserException (see the sketch after this list).
- Because the parser fails, the score for that turn becomes nan.
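One possible mitigation at the parse layer would be to tolerate the observed variant instead of failing outright. The helper below is hypothetical (not an existing ragas function) and assumes that recovering the question string from the alternate key is acceptable:

```python
# Hedged sketch of a tolerant parse step; parse_relevancy_output is a
# hypothetical helper, not part of ragas.
import json
from typing import Optional

def parse_relevancy_output(raw: str) -> Optional[str]:
    """Extract the generated question, tolerating the observed bad shape."""
    payload = json.loads(raw)
    if "text" in payload:
        return payload["text"]       # the shape the parser expects
    if "question" in payload:
        return payload["question"]   # observed variant without "text"
    return None                      # mark the turn as failed instead of nan
```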
Downstream effect:
- When a nan appears in the scores, the evaluation crashes with:
```
Traceback (most recent call last):
  ...
  File ".../statistics.py", line 124, in _finalize_metric_stats
    "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
AttributeError: 'float' object has no attribute 'numerator'
```
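A defensive fix on the reporting side would be to drop nan scores before computing statistics. A minimal sketch, assuming scores is the per-turn score list handled by _finalize_metric_stats (the function name below is a hypothetical stand-in for the one in the traceback):

```python
# Hedged sketch of a nan guard for the stats step; finalize_metric_stats is
# a hypothetical stand-in for the _finalize_metric_stats in the traceback.
import math
import statistics

def finalize_metric_stats(scores: list[float]) -> dict:
    valid = [s for s in scores if not math.isnan(s)]  # drop failed-parse turns
    return {
        "mean": statistics.mean(valid) if valid else 0.0,
        "std": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        "failed": len(scores) - len(valid),  # how many turns were dropped
    }
```

With a guard like this, a single failed turn would degrade the report rather than abort it.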
Impact:
- This prevents the evaluation report from being generated.
- The failure is intermittent and depends on the LLM output structure.