OpenShift Bugs / OCPBUGS-42543

Surface error message for easier troubleshooting if model is not available


    • Type: Bug
    • Resolution: Unresolved
    • Undefined
    • 4.16
    • Lightspeed
    • Quality / Stability / Reliability

      Description of problem:

      When OLSConfig is updated with a model name that is not available to the token in the associated secret, the pod still becomes Ready and nothing at the pod level indicates a problem. The UI plugin, however, displays the message "OLS is not ready yet". The lightspeed-service-api logs show the service retrying every 5 seconds indefinitely, even though the API returns a 404 response.

      Version-Release number of selected component (if applicable):

      0.1.6

      How reproducible:

      Specify a non-existent model name in OLSConfig.

      Steps to Reproduce:

          1. Update OLSConfig with a non-existent model name.
          2. Wait for the pod to be recreated.
          3. Check the OLS UI plugin and the lightspeed-service-api container logs.

      Actual results in the lightspeed-service-api logs:

      2024-09-27 12:47:01,076 [httpx:_client.py:1026] INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 404 Not Found"
      2024-09-27 12:47:01,077 [ols.app.endpoints.health:health.py:43] ERROR: LLM connection check failed with - Error code: 404 - {'error': {'message': 'The model `o1-mini` does not exist or you do not have access to it.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

      Expected results:

      Either prevent the user from creating or updating OLSConfig with an invalid model name (possibly through an admission webhook, though it is not clear whether that is feasible), or set the pod status to something other than Ready (or surface the issue to the admin in some other way) to make troubleshooting easier.
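      One way to realize the second option is to treat the provider's `model_not_found` error as a configuration failure instead of a transient one. The sketch below is a hypothetical helper, not the lightspeed-service code: `classify_llm_check`, `LLMCheckResult`, and `NON_RETRYABLE_CODES` are illustrative names, and it assumes the OpenAI-style error body seen in the log excerpt (`error.code`, `error.message`). A readiness endpoint backed by this result could return a non-2xx status while `ready` is False; Kubernetes HTTP readiness probes treat any status outside 200-399 as a failure, so the pod would stop reporting Ready and the provider's message could be surfaced to the admin.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of provider error codes that retrying cannot fix.
# Assumption: the provider returns an OpenAI-style body {"error": {"code": ...}}.
NON_RETRYABLE_CODES = {"model_not_found"}


@dataclass
class LLMCheckResult:
    ready: bool                    # should the readiness check pass?
    retry: bool                    # is it worth retrying the connection check?
    message: Optional[str] = None  # human-readable reason to surface to the admin


def classify_llm_check(status_code: int, body: dict) -> LLMCheckResult:
    """Hypothetical helper: map an LLM connection check response to readiness."""
    if 200 <= status_code < 300:
        return LLMCheckResult(ready=True, retry=False)

    error = body.get("error") or {}
    code = error.get("code")
    message = error.get("message", f"HTTP {status_code}")

    if status_code == 404 and code in NON_RETRYABLE_CODES:
        # Misconfigured model name: stop retrying and surface the provider's message.
        return LLMCheckResult(ready=False, retry=False, message=message)

    # Other failures (timeouts, 5xx, rate limits) may be transient, so keep retrying.
    return LLMCheckResult(ready=False, retry=True, message=message)
```

With the 404 body from the log excerpt, the helper reports not ready, no retry, and carries the "The model `o1-mini` does not exist..." message, so the reason can appear in a status condition or probe failure rather than only in the container logs.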

      Additional info:

      Initially reported on Slack: https://redhat-internal.slack.com/archives/C068JAU4Y0P/p1727441643267899

              xdharmai@redhat.com XAVIER RAJESH DHARMAIYAN
              kgordeev@redhat.com Katya Gordeeva
              Joao Bastos Fula
              Votes: 0
              Watchers: 2
