Kimi assistant: trim the MCP tool surface so chat actions stay fast

mik-tf commented

2026-06-04 15:32:23 +00:00

Owner

When the Kimi assistant is wired to planner, whiteboard and slides over MCP, it loads every tool from all three services at startup (about 91 + 102 + 152, roughly 345 tools). Reads (list and summarize) are reliable. Write actions that need a few tool calls are slow on a cold start: they can take over a minute or time out, because every turn carries the full tool surface and the model reasons over all of it.

In a live chat the session stays warm, so the first message pays the load cost and later messages are fast. The slowness shows up mainly on the first action and on scripted single-shot calls.

Proposal: curate or subset the MCP tools exposed to Kimi (for example a small high-value set per app, or lazy tool loading), so a single action stays fast even on a cold start, while keeping all three apps reachable.
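To make the "small set per app" option concrete, here is a minimal sketch of an allowlist filter. It is only an illustration: the `Tool` struct, the `curate` function and the tool names used with it are hypothetical and are not taken from the Kimi code or from the three services.

```rust
use std::collections::{HashMap, HashSet};

/// A tool as advertised by an MCP server.
/// Hypothetical shape for this sketch, not the assistant's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Tool {
    /// Which app the tool belongs to, e.g. "planner", "whiteboard", "slides".
    pub server: String,
    /// Tool name as advertised by that server.
    pub name: String,
}

/// Keep only the tools whose name is on the allowlist for their server.
/// A server with no allowlist entry exposes no tools at all.
pub fn curate(tools: &[Tool], allow: &HashMap<&str, HashSet<&str>>) -> Vec<Tool> {
    tools
        .iter()
        .filter(|tool| {
            allow
                .get(tool.server.as_str())
                .map_or(false, |names| names.contains(tool.name.as_str()))
        })
        .cloned()
        .collect()
}
```

With one allowlist entry per app, all three apps stay reachable while the per-turn tool list shrinks from hundreds of entries to a handful. The trade-off is that any action outside the allowlist is unavailable until the list is edited.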

Related deploy readiness findings while wiring this up (so they are tracked):

1. On at least one admin node the deployer writes the Kimi config with the wrong provider type, so a freshly provisioned tester opens Kimi but cannot answer until the type is corrected. The corrected build needs to be the one deployed everywhere.
2. The deployer only registers planner and whiteboard in the Kimi MCP config. Slides should be added too so a fresh tester gets all three without manual edits.
3. Kimi reads its model API key from the process environment. A valid key must be set at install so a fresh tester can chat out of the box.
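The three findings above could be caught by a preflight check at install time. The sketch below is an assumption-heavy illustration: the `KimiDeploy` struct, its field names and the `preflight` function are hypothetical, and the expected provider type is passed in as a parameter because this issue does not state its value.

```rust
use std::collections::BTreeSet;

/// Illustrative view of what the deployer wrote for Kimi.
/// Field names are assumptions, not the real config schema.
pub struct KimiDeploy<'a> {
    pub provider_type: &'a str,
    pub mcp_servers: &'a [&'a str],
    pub api_key: Option<&'a str>,
}

/// Return one human-readable message per readiness problem found.
/// An empty result means all three checks passed.
pub fn preflight(deploy: &KimiDeploy<'_>, expected_provider: &str) -> Vec<String> {
    let mut problems = Vec::new();

    // Finding 1: the provider type must match the corrected value.
    if deploy.provider_type != expected_provider {
        problems.push(format!(
            "provider type is '{}', expected '{}'",
            deploy.provider_type, expected_provider
        ));
    }

    // Finding 2: planner, whiteboard and slides must all be registered.
    let registered: BTreeSet<&str> = deploy.mcp_servers.iter().copied().collect();
    for app in ["planner", "whiteboard", "slides"] {
        if !registered.contains(app) {
            problems.push(format!("MCP server '{}' is not registered", app));
        }
    }

    // Finding 3: a non-empty model API key must be present.
    match deploy.api_key {
        Some(key) if !key.trim().is_empty() => {}
        _ => problems.push("model API key is missing or empty".to_string()),
    }

    problems
}
```

In a real deployer the `api_key` field would presumably be filled from the process environment (for example via `std::env::var(...).ok()`), using whatever variable name the install actually sets. That name is not given here, so it is left out of the sketch. The check only tests that a key is present, not that the provider accepts it.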

Verified on the live demo tester: Kimi connected on the moonshotai/kimi-k2.6 model, read planner, whiteboard and slides over MCP and returned real data. Write actions succeed but are slow on a cold start as described above.


mik-tf commented

2026-06-04 19:05:25 +00:00

Author

Owner

The Kimi assistant change that implements this is in hero_kimi_rust (lhumina_code/hero_kimi_rust#3, https://forge.ourworld.tf/lhumina_code/hero_kimi_rust/pulls/3). Instead of sending every connected tool with its full description on every turn, the tools are kept out of the per-turn tool list and the model pulls only the ones it needs through a search tool. This removes the large per-turn cost that made the first action slow. The change has merged into that repo's main branch (commit b9210c9), so deploying that build to the tester should make assistant actions fast and close this. We will verify on the tester after deploying.
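For readers who have not opened the PR, the idea of a search tool can be sketched roughly as follows. This is not the code from hero_kimi_rust: the `ToolEntry` struct, the `search_tools` function and the simple substring matching are assumptions made only to illustrate the approach.

```rust
/// Full tool record held by the assistant but not sent to the model each turn.
/// Hypothetical shape for this sketch.
#[derive(Debug, Clone)]
pub struct ToolEntry {
    pub name: String,
    pub description: String,
}

/// Return up to `limit` tool names whose name or description contains every
/// whitespace-separated term of `query` (case-insensitive substring match).
/// An empty or blank query returns nothing.
pub fn search_tools(registry: &[ToolEntry], query: &str, limit: usize) -> Vec<String> {
    let terms: Vec<String> = query
        .split_whitespace()
        .map(|term| term.to_lowercase())
        .collect();
    if terms.is_empty() {
        return Vec::new();
    }
    registry
        .iter()
        .filter(|tool| {
            // Match against name and description together.
            let haystack = format!("{} {}", tool.name, tool.description).to_lowercase();
            terms.iter().all(|term| haystack.contains(term.as_str()))
        })
        .take(limit)
        .map(|tool| tool.name.clone())
        .collect()
}
```

In this arrangement the model sees a single search tool on every turn, and the full description of a tool is sent only once a search has returned it. That keeps the per-turn payload small however many tools the three apps register.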

Signed-by: mik-tf <mik-tf@noreply.invalid>


mik-tf referenced this issue

2026-06-04 19:17:33 +00:00

[META] Hero OS sandbox demo, functional readiness: onboarding pipeline + per-app verification #239

mik-tf referenced this issue

2026-06-05 16:58:49 +00:00

[META] Hero OS sandbox demo, functional readiness: onboarding pipeline + per-app verification #239


Kimi assistant: trim the MCP tool surface so chat actions stay fast #249