LLM harness
modules/example-harness · Stdio + HTTP · JVM / Native
A minimal LLM ↔ MCP agent. Reads a Claude-style .mcp.json, opens every
server it lists, hands those servers' tools to an OpenAI-compatible chat
endpoint, and runs an interactive REPL with streaming responses, slash
commands, and tool-calling.
This is the most complete client example in the repo — it exercises every
moving part: both transports, the McpClient API, server-initiated
sampling and elicitation callbacks, and notifications.
What it does
- Connects to every MCP server in your `.mcp.json`: stdio (`command`) and HTTP (`type: "http"`, `url`) entries are both supported.
- Bridges every tool to OpenAI-style tool-calling. Tool names are namespaced as `serverName__toolName`, so same-named tools on different servers never collide and the LLM picks the server explicitly. Servers that advertise `resources` also get synthetic `<server>__list_resources` and `<server>__read_resource` tools so the LLM can browse and read MCP resources via the same channel.
- Streams responses token-by-token. The default `chat/completions` endpoint is consumed via SSE; reasoning tokens (DeepSeek / GLM / OpenRouter conventions) stream alongside content tokens in a dimmed lane.
- Handles server callbacks. `sampling/createMessage` round-trips through the same LLM endpoint. `elicitation/create` prompts the user field-by-field on the terminal, with type coercion (`integer`, `number`, `boolean`).
- Surfaces notifications. Server-initiated logs and list-changed events print dim alongside the chat output.
- Slash commands. `/help`, `/prompts` (lists every connected server's prompts), `/prompt <serverName__promptName> [k=v…]` (invokes a prompt and continues the chat from its messages). `:q` / `:quit` exits.
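The `serverName__toolName` convention can be sketched as a pair of helpers. This is a minimal illustration rather than the harness's actual `ToolBridge` code: the `ToolNames` object and its method names are hypothetical, and splitting on the first `__` assumes that server names never contain a double underscore.

```scala
// Hypothetical helpers illustrating the serverName__toolName convention.
// Not taken from ToolBridge.scala; names and behaviour are assumptions.
object ToolNames {
  private val Separator = "__"

  /** Build the namespaced name that is handed to the LLM. */
  def qualify(server: String, tool: String): String =
    s"$server$Separator$tool"

  /** Split a namespaced name back into (server, tool).
    * Splits on the first separator, so tool names may contain "__"
    * but server names may not (an assumption of this sketch).
    */
  def split(qualified: String): Option[(String, String)] = {
    val idx = qualified.indexOf(Separator)
    if (idx <= 0) None // no separator, or empty server name
    else {
      val tool = qualified.substring(idx + Separator.length)
      if (tool.isEmpty) None
      else Some((qualified.substring(0, idx), tool))
    }
  }
}
```

With helpers like these, a tool call for `dice__roll` coming back from the LLM would be split into the server key `dice` and the tool name `roll` before being routed to the matching client.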
Configuration
The harness reads a Claude-style .mcp.json:
```json
{
  "mcpServers": {
    "dice": {
      "command": "sbt",
      "args": ["exampleDiceJVM/run"]
    },
    "pomodoro": {
      "type": "http",
      "url": "http://localhost:25000/mcp"
    }
  }
}
```
The transport is chosen per entry: a `command` key means stdio; an entry
without one (or with `type: "http"`) means streamable HTTP. HTTP entries
can also pass request headers as `"headers": { ... }`.
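The discriminator rule is simple enough to sketch in a few lines of Scala. The types below (`RawEntry`, `Spec`, `Stdio`, `Http`) are hypothetical stand-ins, not the harness's real `McpServerSpec`, and JSON decoding is left out: the example assumes each entry has already been decoded into optional fields. It also assumes that `command` wins if an entry happens to carry both keys.

```scala
// Hypothetical model of one decoded "mcpServers" entry; the harness's own
// McpServerSpec may look different. The "type" field is not modelled because
// this sketch treats a missing type the same as "http".
final case class RawEntry(
  command: Option[String] = None,
  args: List[String] = Nil,
  url: Option[String] = None,
  headers: Map[String, String] = Map.empty
)

sealed trait Spec
final case class Stdio(command: String, args: List[String]) extends Spec
final case class Http(url: String, headers: Map[String, String]) extends Spec

object Spec {
  /** A command key means stdio; otherwise the entry is treated as HTTP. */
  def classify(name: String, e: RawEntry): Either[String, Spec] =
    (e.command, e.url) match {
      case (Some(cmd), _)  => Right(Stdio(cmd, e.args))
      case (None, Some(u)) => Right(Http(u, e.headers))
      case (None, None)    => Left(s"server '$name' has neither command nor url")
    }
}
```

Returning an `Either` keeps a malformed entry from aborting the whole config load; whether the real harness reports such entries this way is not stated in this document.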
Build and run (JVM)
```sh
sbt 'exampleHarnessJVM/run --config .mcp.json --base-url https://api.openai.com/v1 --api-key sk-… --model gpt-4o-mini'
```
Any OpenAI-compatible endpoint works — set --base-url to your provider's
URL (Anthropic via OpenRouter, DeepSeek, GLM, a local Ollama, etc.) and
--model to a model id that endpoint understands.
Build and run (Scala Native)
```sh
# Requires clang/llvm and s2n-tls; `nix-shell` provides both.
sbt exampleHarnessNative/nativeLink
./modules/example-harness/native/target/scala-3.3.4/example-harness-out \
  --config .mcp.json --base-url https://api.openai.com/v1 \
  --api-key sk-… --model gpt-4o-mini
```
The native binary is single-file and starts in milliseconds, which makes it convenient as a long-running terminal companion.
What it demonstrates
The harness source under
modules/example-harness/shared/src/main/scala/net/andimiller/mcp/examples/harness/
is split into focused files worth reading in order:
| File | Shows |
|---|---|
| `Main.scala` | Wiring: load `.mcp.json`, build the LLM client, build the shared `ClientHandler`, open every server, collect tools and prompts, hand off to `Repl.run`. |
| `McpClients.scala` | One function per `McpServerSpec` that returns a `Resource[F, McpClient[F]]`, using both `StdioMcpClient.builder` and `StreamableHttpMcpClient.builder`. |
| `ClientHandlers.scala` | The capability advertisement (`sampling`, `elicitation` with `form`) and dispatch by method name. |
| `SamplingHandler.scala` | `sampling/createMessage` → forward to `OpenAiClient.chat`, shape the response back into the MCP wire format. |
| `ElicitationHandler.scala` | `elicitation/create` → walk the schema's `properties`, prompt the terminal field-by-field, type-coerce, return an `accept` / `cancel` response. |
| `ToolBridge.scala` | Aggregate every server's tools into a single OpenAI-shaped tool list with namespaced names; route tool calls back to the right `McpClient`; synthesise `list_resources` / `read_resource` tools for servers that advertise resources. |
| `PromptBridge.scala` | Surface MCP prompts as `/prompt …` slash commands; convert `PromptMessage` content into OpenAI `ChatMessage`s. |
| `Notifications.scala` | One background fiber per server that drains `client.notifications` and prints them dim. |
| `Repl.scala` | The chat loop: streaming output with separate "content" and "thinking" lanes, tool-call hops bounded by `MaxToolHops`, slash-command dispatch. |
| `OpenAiClient.scala` / `OpenAiTypes.scala` | A tiny OpenAI-compatible chat client (single POST or streaming SSE) plus the wire types it needs. |
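The type-coercion step of elicitation (`integer`, `number`, `boolean`) might look roughly like the sketch below. The `Coerce` object and its `Value` cases are hypothetical and are not taken from `ElicitationHandler.scala`. Accepting `yes` / `no` / `y` / `n` for booleans and passing every other schema type through as a string are assumptions of this example.

```scala
// Hypothetical coercion helper; the real ElicitationHandler may differ.
object Coerce {
  sealed trait Value
  final case class IntV(value: Long) extends Value
  final case class NumV(value: Double) extends Value
  final case class BoolV(value: Boolean) extends Value
  final case class StrV(value: String) extends Value

  /** Convert raw terminal input to the JSON Schema type named by schemaType. */
  def coerce(schemaType: String, raw: String): Either[String, Value] = {
    val in = raw.trim
    schemaType match {
      case "integer" =>
        in.toLongOption.map(IntV(_)).toRight(s"'$in' is not an integer")
      case "number" =>
        in.toDoubleOption.map(NumV(_)).toRight(s"'$in' is not a number")
      case "boolean" =>
        in.toLowerCase match {
          case "true" | "yes" | "y" => Right(BoolV(true))
          case "false" | "no" | "n" => Right(BoolV(false))
          case _                    => Left(s"'$in' is not a boolean")
        }
      case _ =>
        Right(StrV(in)) // unknown or string types pass through unchanged
    }
  }
}
```

A `Left` result gives the prompt loop a message to show before asking for the same field again, which is one plausible way to handle bad input on a terminal.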
Together they're a worked answer to "what does it take to plug an LLM
into an arbitrary set of MCP servers." Most of the protocol-level
plumbing is upstream in McpClient and ClientHandler — the harness is
mostly bridging code between MCP shapes and OpenAI shapes.
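As one example of that bridging code, the bounded tool-call loop attributed to `Repl.scala` can be sketched as follows. This is a synchronous simplification under stated assumptions, not the harness's implementation: the `ToolLoop` object, the `Reply` types, the string-based history, and the value `8` for `MaxToolHops` are all hypothetical, and the real loop is effectful and streaming.

```scala
import scala.annotation.tailrec

// Hypothetical, simplified model of a bounded tool-call loop.
object ToolLoop {
  final case class ToolCall(name: String, arguments: String)

  sealed trait Reply
  final case class Answer(text: String) extends Reply
  final case class WantsTools(calls: List[ToolCall]) extends Reply

  val MaxToolHops = 8 // assumed value; the harness defines its own limit

  /** Ask the model, run any tools it requests, and repeat until it answers
    * or the hop budget is spent.
    */
  def run(
    ask: List[String] => Reply,
    callTool: ToolCall => String,
    history: List[String],
    maxHops: Int = MaxToolHops
  ): Either[String, String] = {
    @tailrec
    def loop(history: List[String], hops: Int): Either[String, String] =
      ask(history) match {
        case Answer(text) =>
          Right(text)
        case WantsTools(_) if hops >= maxHops =>
          Left(s"stopped after $maxHops tool hops")
        case WantsTools(calls) =>
          // Append each tool result to the history and ask again.
          val results = calls.map(c => s"${c.name} -> ${callTool(c)}")
          loop(history ++ results, hops + 1)
      }
    loop(history, 0)
  }
}
```

The hop limit is what keeps a model that requests tools on every turn from looping forever; in this sketch such a model yields a `Left` once the budget is spent.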