Red Hat OpenStack Services on OpenShift / OSPRH-19867

Do not stop the evaluation when RAGAS produces OutputParserException


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Component: openstack-lightspeed
    • Sprint: Sprint 18
    • Story Points: 1
    • Severity: Moderate

      There is already an issue for this in the lightspeed-evaluation repository: https://github.com/lightspeed-core/lightspeed-evaluation/issues/44

      I'm creating this ticket for tracking purposes and pasting the description from GitHub below:

      Description:

      While running evaluations, I encountered the following error from ragas.executor:

      Evaluating:   0%|                                                                 | 0/1 [00:00<?, ?it/s]
      2025-09-03 13:15:39,323 - ragas.executor - ERROR - Exception raised in Job[0]: 
      OutputParserException(Failed to parse StringIO from completion 
      {"question": "What does the error message \"admission webhook 'regular-user-validation.managed.openshift.io' denied the request\" during a ROSA cluster upgrade indicate, and how can it be resolved?", "noncommittal": 0}. 
      Got: 1 validation error for StringIO
      text
        Field required [type=missing, input_value={'question': 'What does t...ed?', 'noncommittal': 0}, input_type=dict]
          For further information visit https://errors.pydantic.dev/2.11/v/missing 

      This issue is hard to reproduce, as the behavior depends on the LLM output and is thus non-deterministic.

      Observed behavior:

      • ragas:response_relevancy expects the LLM output to include a "text" field.
      • Sometimes the model returns a dictionary without "text" (e.g., {"question": ..., "noncommittal": 0}), which causes an OutputParserException (see the sketch after this list).
      • Because the parser fails, the score for that turn becomes nan.
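
      The parse failure can be reproduced in isolation. A minimal sketch, assuming ragas validates the completion against a pydantic model named StringIO with a required "text" field (as the log above suggests):

      from pydantic import BaseModel, ValidationError

      # Assumed shape of ragas' StringIO output model: a single required "text" field.
      class StringIO(BaseModel):
          text: str

      # Shape the LLM sometimes returns instead -- no "text" key.
      bad_completion = {"question": "What does the error message ... indicate?", "noncommittal": 0}

      try:
          StringIO.model_validate(bad_completion)
      except ValidationError as exc:
          # Prints "text  Field required [type=missing, ...]" -- the same pydantic
          # error that ragas wraps into the OutputParserException shown above.
          print(exc)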

      Downstream effect:

      • When a nan appears in the scores, the evaluation crashes with the following traceback (a NaN-filtering sketch follows it):
      Traceback (most recent call last):
        ...
        File ".../statistics.py", line 124, in _finalize_metric_stats
          "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
      AttributeError: 'float' object has no attribute 'numerator' 
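
      A minimal sketch of how the statistics step could tolerate these failures. The helper name and return shape below mirror the failing _finalize_metric_stats only for illustration; they are not the repository's actual code:

      import math
      import statistics

      def finalize_metric_stats(scores):
          # Drop nan scores left behind by failed RAGAS parses before aggregating,
          # so statistics.stdev() never sees them.
          valid = [s for s in scores if not math.isnan(s)]
          return {
              "count": len(valid),
              "mean": statistics.mean(valid) if valid else 0.0,
              "std": statistics.stdev(valid) if len(valid) > 1 else 0.0,
          }

      print(finalize_metric_stats([0.83, float("nan"), 0.91]))
      # {'count': 2, 'mean': 0.87..., 'std': 0.0565...}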

      Impact:

      • This prevents the evaluation report from being generated.
      • The failure is intermittent and depends on the LLM output structure.
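
      What the ticket asks for, as a rough sketch: score each turn, but record a parse failure as a skipped data point instead of letting it abort the run. The metric.score() interface and the exception handling below are illustrative assumptions, not the evaluation framework's actual API:

      import math

      def score_turns(turns, metric):
          scores = []
          for turn in turns:
              try:
                  score = metric.score(turn)   # may raise OutputParserException
              except Exception as exc:
                  print(f"Skipping turn, RAGAS parse failed: {exc}")
                  continue
              if score is None or math.isnan(score):
                  continue                     # ragas may also report the failure as nan
              scores.append(score)
          return scores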

              Assignee: Lukáš Piwowarski (lpiwowar)
              Reporter: Lukáš Piwowarski (lpiwowar)
              Team: rhos-workloads-lightspeed