feat(mcp): defer MCP tool schemas to slash per-turn context cost #3
MCP tools were registered into the toolset and sent to the model on every
turn with full JSON schemas plus a boilerplate description prefix. With a few
servers that is thousands of tokens per turn, re-sent each step, occupying the
context window and pushing real content toward compaction.
Defer them instead (qwen-code / shrimp's idea): MCP tools are kept out of the
tool list and replaced by a compact, per-server index appended to the system
prompt. The model calls a new tool_search tool with a keyword to activate the
tools a task needs; their schemas re-enter the prompt from the next step on.
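
A minimal sketch of the activation flow described above, not the code in this PR. The McpTool and DeferredTools types, the search method, and the substring matching rule are all hypothetical stand-ins; the real matching logic is not shown in this description.

```rust
/// Hypothetical stand-in for an MCP tool definition (not the PR's real type).
#[derive(Clone, Debug)]
struct McpTool {
    server: String,
    name: String,
    description: String,
    /// False while the tool is deferred; true once tool_search activated it.
    active: bool,
}

/// Hypothetical registry holding every MCP tool, deferred or active.
#[derive(Default)]
struct DeferredTools {
    tools: Vec<McpTool>,
}

impl DeferredTools {
    /// Register a tool in the deferred (inactive) state.
    fn register(&mut self, server: &str, name: &str, description: &str) {
        self.tools.push(McpTool {
            server: server.to_string(),
            name: name.to_string(),
            description: description.to_string(),
            active: false,
        });
    }

    /// What a tool_search call might do: activate every still-deferred tool
    /// whose name or description contains the keyword (case-insensitive) and
    /// return the (server, name) pairs that were newly activated.
    /// Assumption: an empty keyword matches nothing.
    fn search(&mut self, keyword: &str) -> Vec<(String, String)> {
        let needle = keyword.to_lowercase();
        let mut activated = Vec::new();
        if needle.is_empty() {
            return activated;
        }
        for tool in self.tools.iter_mut().filter(|t| !t.active) {
            let hit = tool.name.to_lowercase().contains(&needle)
                || tool.description.to_lowercase().contains(&needle);
            if hit {
                tool.active = true;
                activated.push((tool.server.clone(), tool.name.clone()));
            }
        }
        activated
    }

    /// Tools whose full schemas would be sent from the next step on.
    fn active_tools(&self) -> Vec<&McpTool> {
        self.tools.iter().filter(|t| t.active).collect()
    }
}
```

In this sketch only the output of active_tools() would be converted to full schemas, so an idle session pays for the index alone.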
- ... can activate tools from inside a spawned tool-call task without re-locking
  the toolset mutex held by the in-flight step.
- ... tool_search entirely when nothing is deferred (no change for non-MCP users).
- ... 24-server cap, then a single summary line: O(1) prompt cost at fleet scale.
- ... per-server grouping); the server-local name is still sent in tools/call.
- ... nothing is deferred, so the cached system prefix stays stable.
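
One possible reading of the index behaviour noted above, sketched under assumptions: the first 24 servers get one line each, any remainder collapses into a single summary line, and nothing is emitted when no tools are deferred. The render_index function, the header text, and the choice to print tool counts rather than tool names are hypothetical; only the cap of 24 comes from this description.

```rust
use std::collections::BTreeMap;

/// Cap on individually listed servers; the value 24 is from the PR description.
const MAX_INDEXED_SERVERS: usize = 24;

/// Render a compact per-server index for the system prompt.
/// `deferred` maps a server name to the names of its deferred tools.
/// Returns None when nothing is deferred, so the prompt stays byte-identical.
fn render_index(deferred: &BTreeMap<String, Vec<String>>) -> Option<String> {
    // Ignore servers that have no deferred tools.
    let servers: Vec<(&String, &Vec<String>)> = deferred
        .iter()
        .filter(|(_, tools)| !tools.is_empty())
        .collect();
    if servers.is_empty() {
        return None;
    }

    // Header wording is an assumption, not the PR's actual text.
    let mut out = String::from("Deferred MCP tools (activate with tool_search):\n");
    for (server, tools) in servers.iter().take(MAX_INDEXED_SERVERS) {
        out.push_str(&format!("- {}: {} tools\n", server, tools.len()));
    }

    // Everything past the cap is folded into one line, bounding prompt size.
    if servers.len() > MAX_INDEXED_SERVERS {
        let rest = &servers[MAX_INDEXED_SERVERS..];
        let tool_count: usize = rest.iter().map(|(_, tools)| tools.len()).sum();
        out.push_str(&format!(
            "- ... and {} more servers ({} tools)\n",
            rest.len(),
            tool_count
        ));
    }
    Some(out)
}
```

With this shape the index never exceeds 26 lines regardless of how many servers are configured.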
Measured on three real MCP servers (github/everything/filesystem, 53 tools): the
per-turn tools[] payload drops from ~7,400 to ~295 tokens at idle (25x smaller,
96% less). Proven by three test layers:
- ... captured schemas, asserts >5x smaller (regression guard).
- ... tools[] + system prompt the agent transmits through kosong::step; gated
  behind KIMI_LIVE_MCP_PROOF so the default suite stays offline.
kosong::convert_tool is made pub so tests measure the real wire encoding.
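
The figures above can be checked with a few lines of arithmetic, and the same helpers show how a regression guard and an opt-in gate might look. This is a sketch, not the PR's tests: estimate_tokens uses a rough 4-bytes-per-token heuristic rather than a real tokenizer, and the rule that any non-empty value of KIMI_LIVE_MCP_PROOF enables the live test is an assumption.

```rust
/// Rough token estimate: about 4 bytes per token, rounded up.
/// A heuristic assumption, not the tokenizer used for the PR's measurements.
fn estimate_tokens(payload: &str) -> usize {
    (payload.len() + 3) / 4
}

/// How many times smaller `after` is than `before`.
fn shrink_factor(before: usize, after: usize) -> f64 {
    before as f64 / after as f64
}

/// Percentage of tokens saved going from `before` to `after`.
fn percent_saved(before: usize, after: usize) -> f64 {
    100.0 * (1.0 - after as f64 / before as f64)
}

/// Whether an opt-in live test should run, given the raw value of the
/// gating environment variable (None when it is unset).
/// Assumption: any non-empty value enables it.
fn live_proof_enabled(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty())
}
```

A gated test could call live_proof_enabled(std::env::var("KIMI_LIVE_MCP_PROOF").ok().as_deref()) and return early when it is false. For the reported numbers, shrink_factor(7400, 295) is about 25.1 and percent_saved(7400, 295) is about 96.0, both comfortably past the >5x guard.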
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
some other ideas could be
Pull request closed