OpenShift Pipelines / SRVKP-8577

PAC: Implement simple AI-powered log analysis on job failures


    • pac-llm-integration
    • 12
    • False
    • False
    • In Progress
    • 50% To Do, 0% In Progress, 50% Done

      Story (Required)

      As a Tekton user, I want PAC to automatically analyze the last 50 lines of a failed job's logs so that I can quickly identify the potential root cause of the failure without manually sifting through the logs.

      The goal is to provide a quick, preliminary analysis of job failure logs, offering users an immediate hint at the problem. This improves the user's experience by saving them time and effort in debugging, especially for common or obvious issues.

      Background (Required)

      We need to start integrating LLM features into PAC. This simple feature provides a framework for experimenting with LLM integrations in PAC.

      Out of scope

      Complex RAG (Retrieval Augmented Generation) system for deep log analysis.

      Persistent storage of log analysis results.

      Integration with other systems for automated remediation.

      Real-time log streaming and analysis.

      Advanced self-learning or fine-tuning of the AI model.

      Approach (Required)

      The plan is to implement a simple log analysis feature in the PAC tool. Upon a job failure, PAC will:

      Fetch the last 50 lines (default, configurable) of the job's failure logs.

      Send these log lines to a simple AI model (e.g., a pre-trained language model via a cloud service) with a prompt.

      The prompt will instruct the model to summarize the potential issue from the provided log snippets.

      Present the model's response directly to the user in a clear, visible format, such as a dedicated section in the PAC output.
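The log-fetching step could be sketched roughly as follows. This is a minimal illustration in Python, not PAC's actual implementation: `tail_lines` and `DEFAULT_TAIL_LINES` are hypothetical names, and the sketch assumes the failed job's log is already available as a single string.

```python
from collections import deque

# Hypothetical constant mirroring the configurable 50-line default in this story.
DEFAULT_TAIL_LINES = 50


def tail_lines(log_text: str, max_lines: int = DEFAULT_TAIL_LINES) -> list[str]:
    """Return at most the last `max_lines` lines of `log_text`."""
    if max_lines <= 0:
        return []
    # A deque with maxlen discards older lines as newer ones are appended,
    # so only the tail of the log is kept.
    return list(deque(log_text.splitlines(), maxlen=max_lines))
```

Keeping the limit as a parameter with a default leaves room for the value to come from a flag or a configuration setting later.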

      Example prompt structure:
      "Analyze the following log snippets from a failed job and provide a concise summary of the potential issue. Focus on key errors, stack traces, or critical messages. Only provide the summary, no extra commentary. Logs: [log snippets here]"
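One possible way to turn the example prompt into a reusable template is sketched below in Python. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and placing the log lines after a newline following "Logs:" is an assumption rather than something this story specifies.

```python
# Prompt text taken from the example prompt structure in this story;
# the {logs} placeholder stands in for "[log snippets here]".
PROMPT_TEMPLATE = (
    "Analyze the following log snippets from a failed job and provide a concise "
    "summary of the potential issue. Focus on key errors, stack traces, or "
    "critical messages. Only provide the summary, no extra commentary. "
    "Logs:\n{logs}"
)


def build_prompt(log_lines: list[str]) -> str:
    """Fill the template with the given log lines, one per line."""
    # The log text is passed as a format argument, so braces inside the logs
    # are inserted verbatim and never interpreted as placeholders.
    return PROMPT_TEMPLATE.format(logs="\n".join(log_lines))
```

For example, `build_prompt(["step failed", "exit code 1"])` yields the instruction text followed by the two log lines.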

      Previous POC

      Dependencies

      Access to an AI model/API (e.g., OpenAI, Google Gemini, or a suitable internal service).

      PAC's ability to retrieve a limited number of lines from the job logs.

      Story to establish the API key/credentials for the AI service.

       

      Acceptance Criteria (Mandatory)

      The PAC command must be able to fetch the last x lines of a failed job's log, with x defaulting to 50.

      The system must send the retrieved log lines to an AI model for analysis.

      The AI-generated summary of the potential issue must be displayed to the user upon a job failure.

      The feature should be enabled by a new command-line flag or a configuration setting.

      The output should clearly indicate that the analysis is AI-generated and may not always be accurate.

      If the AI service is unreachable or returns an error, the system should fail gracefully and inform the user without crashing.
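The opt-in, disclaimer, and graceful-failure criteria above might fit together as in the following Python sketch. Everything here is hypothetical: `analyze_failure`, `AI_DISCLAIMER`, the `enabled` parameter, and the `call_model` callable (a stand-in for whichever AI service client is eventually chosen) are illustrative names, and the message wording is an assumption.

```python
from typing import Callable, Optional

# Hypothetical wording; the story only requires that the output be marked as
# AI-generated and possibly inaccurate.
AI_DISCLAIMER = "Note: this analysis is AI-generated and may not be accurate."


def analyze_failure(
    prompt: str,
    call_model: Callable[[str], str],
    enabled: bool = False,
) -> Optional[str]:
    """Return text to show the user, or None when the feature is disabled."""
    if not enabled:
        # The feature is opt-in (flag or configuration setting).
        return None
    try:
        summary = call_model(prompt).strip()
    except Exception as exc:
        # Deliberately broad: any failure of the AI service must not crash
        # the caller, only inform the user.
        return f"AI log analysis unavailable: {exc}"
    if not summary:
        return "AI log analysis unavailable: the model returned an empty response."
    return f"AI analysis of the failure\n{summary}\n\n{AI_DISCLAIMER}"
```

Passing the model client in as a callable keeps the sketch independent of any particular provider and makes the failure path easy to exercise with a stub that raises.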

              cboudjna@redhat.com Chmouel Boudjnah