Story
Resolution: Unresolved
Product / Portfolio Work
OAPE Sprint 284, OAPE Sprint 285
We recently added the openshift toolset to the downstream openshift-mcp-server (https://github.com/openshift/openshift-mcp-server/pull/51). This toolset contains the plan_mustgather MCP ServerPrompt, which helps LLMs plan the pod, RBAC, and related specs that a user needs to collect a must-gather on a cluster. It also makes the agent's behaviour deterministic: user queries are converted into arguments for the prompt, so the same arguments always produce the same spec(s).
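The determinism property described above can be illustrated with a minimal sketch. This is not the plan_mustgather implementation: the function name render_plan, the argument names (namespace, image, node_name), and the default values are all hypothetical and chosen only for illustration. The sketch shows the idea that a spec built purely from the prompt arguments, and serialized with a stable key order, comes out identical for identical arguments.

```python
import json

# Hypothetical arguments and defaults; the real prompt's arguments may differ.
DEFAULTS = {
    "namespace": "must-gather-example",
    "image": "example.invalid/must-gather:latest",
    "node_name": "",
}


def render_plan(args: dict) -> str:
    """Render a pod spec as JSON from prompt arguments (illustrative only)."""
    unknown = set(args) - set(DEFAULTS)
    if unknown:
        # Reject anything outside the known argument set instead of guessing.
        raise ValueError(f"unknown arguments: {sorted(unknown)}")

    merged = {**DEFAULTS, **args}
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "must-gather", "namespace": merged["namespace"]},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{"name": "gather", "image": merged["image"]}],
        },
    }
    if merged["node_name"]:
        # Pin the pod to a node only when the caller asked for one.
        pod["spec"]["nodeName"] = merged["node_name"]

    # sort_keys gives a stable key order, so equal arguments yield identical text.
    return json.dumps(pod, sort_keys=True, indent=2)
```

Because the output depends only on the arguments, an eval can compare the generated spec against an expected one instead of judging free-form model output.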
Before any new toolset ships to our customers, we require evaluation rules that run on various agents such as Claude Code and Codex. These rules test that the prompt performs the correct task when a user tries to collect a must-gather, and they verify the results of the process.
https://github.com/mcpchecker/mcpchecker is the framework used to perform such evals.
Acceptance criteria: add new eval rules for user queries covering the various types of must-gather collection to https://github.com/openshift/openshift-mcp-server/tree/main/evals, and verify the results by running them on at least one agent, such as Claude Code.