[arch] Curated MCP tool surface + semantic discovery via embedder/indexer (stop the brute-force) #15
Premise
The AI Assistant on herodemo can already query hero_biz and run Python — but the how is fragile and visibly ungainly. Live screenshot evidence from 2026-04-30:
That's ~30 lines of trial-and-error before the agent lands on a working call. It got there. But it brute-forced through socket paths and method names instead of having a tight, well-defined tool surface.
The pieces to fix this are already built:
This issue is the wiring. We're not building infra; we're connecting infra that exists.
Vision
Every domain in Hero OS exposes a curated MCP tool surface with semantic descriptions. The agent does not enumerate sockets, does not guess RPC method names, does not try Python as a fallback. It calls one tool per intent.
When the agent receives a prompt:
- Every tool call automatically carries `X-Hero-Context: <current>` based on the user's active context. The agent doesn't have to think about it.

Tool surface (per domain)
Initial set, ship in priority order:
Phase 1 — biz (the demo headliner)
Each tool has a structured description:
Phase 2 — calendar / projects / tasks
Phase 3 — content (books / files / photos / videos)
Phase 4 — OS-level (hero_route style)
Architecture
Tool registration mechanism
Each service ships a `mcp_tools.toml` (or generates one from its OpenRPC schema) describing its curated tool surface. On startup, hero_indexer embeds those descriptions and exposes `mcp.discover(prompt)`, which returns the top-N matching tools. When a new service is added: drop in its `mcp_tools.toml`, re-embed, and the agent sees it. No agent-side code change.
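As an illustration of the discovery call (nothing here is implemented yet; the method name comes from this proposal, and the `prompt`, `top_n`, and `score` field names are assumptions, not an existing API):

```jsonc
// Hypothetical sketch only — mcp.discover is the proposed RPC, not an existing one.
// Request: the agent forwards the raw user prompt to hero_indexer.
{
  "jsonrpc": "2.0", "id": 1, "method": "mcp.discover",
  "params": { "prompt": "add a contact for the new Geomind lead", "top_n": 5 }
}

// Response: top-N tools ranked by embedding similarity (field names illustrative).
{
  "jsonrpc": "2.0", "id": 1,
  "result": [
    { "tool": "biz.add_contact",   "service": "hero_biz", "score": 0.91 },
    { "tool": "biz.list_contacts", "service": "hero_biz", "score": 0.74 }
  ]
}
```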
Per-context routing
Every tool call carries the active context as a header. The user's "current context" is tracked by the hero_os shell and pushed into the agent's session state. The agent's tool clients read this state and automatically inject the header. The LLM never sees or thinks about contexts — it sees `biz.list_contacts(limit=5)` and the right context is wired in transparently.
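Concretely, the model only ever produces the tool name and arguments; the context travels as an HTTP header added by the tool client. A sketch, using the standard MCP `tools/call` shape (the header value and gateway path are illustrative):

```jsonc
// What the model emits: a plain MCP tools/call, no context parameter anywhere.
{
  "jsonrpc": "2.0", "id": 7, "method": "tools/call",
  "params": { "name": "biz.list_contacts", "arguments": { "limit": 5 } }
}
// What actually goes over the wire: the agent's MCP client POSTs the same body
// to the per-service gateway endpoint and adds the header from session state,
// e.g. (values illustrative):
//   POST /mcp/hero_biz
//   X-Hero-Context: geomind
```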
Acceptance

- A contact-creation prompt results in the agent calling `biz.add_contact` exactly once.
- The `mcp.discover` RPC works against herodemo and returns ranked tools for arbitrary prompts.
- Adding a new service's `mcp_tools.toml` and re-running the indexer makes it agent-visible without any hero_agent code change.

Why this matters
This is the single biggest leverage point for AI Assistant quality alongside docs_hero coverage. Every prompt the agent answers depends on it picking the right action. With curated tools + semantic discovery, the agent looks like it understands the OS. Without them, it looks like a smart-but-confused junior who's never seen this codebase.
Cross-references
Signed-off-by: mik-tf
Read AND write — the closed loop is the point
The Phase 1-4 tool list above already mixes `*.list_*` (read) and `*.add_*` / `*.update_*` (write) tools, but the architectural loop is worth calling out explicitly, because the read+write closed loop is what makes Hero OS feel intelligent, not just informative.

The closed loop
When the agent writes to OSIS, the write lands in `hero_osis_business` (or whatever domain). This loop is the difference between an OS that has an AI bolted on and an OS that is operated by AI. Without write-side tools wired into the same indexer pipeline, the agent would have to remember its own writes in conversation context, which is fragile and resets every session. With the loop, the system itself is the memory.
Write-side requirements per domain
Every domain that exposes `*.list_*` tools should also expose corresponding `*.add_*` / `*.update_*` / `*.delete_*` tools where the underlying RPC supports them (e.g., `/api/files/{ctx}/{path}` via PUT/POST/DELETE for files).

Read-only OK for: hero_books (libraries are external repos cloned via libraries.txt), hero_indexer (it indexes others), hero_embedder (it serves embeddings).
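On the wire, a write looks exactly like a read: same MCP `tools/call` shape, same automatic header. A sketch (the tool name is from the Phase 1 list above; the argument fields are illustrative, not a confirmed schema):

```jsonc
// Same shape as a read; X-Hero-Context on the HTTP request scopes the write to
// the active context. Argument names/fields are assumptions for illustration.
{
  "jsonrpc": "2.0", "id": 12, "method": "tools/call",
  "params": {
    "name": "biz.add_contact",
    "arguments": { "name": "Jane Doe", "email": "jane@example.com" }
  }
}
```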
Per-context write isolation
Every write tool carries `X-Hero-Context` automatically, exactly like reads. A write to `biz.add_contact` while in Geomind context creates the contact in Geomind only. There's no "global add" — that would break the multi-tenancy story.

Confirmation UX
Writes through the AI Assistant should always emit a brief confirmation that the user can read and roll back if needed.
The undo path uses the corresponding `*.delete_*` tool. Tracked in the Ambient AI issue (#16) as part of the widget UX.

Acceptance addition
Add to the existing acceptance:
- The agent calls `biz.add_contact` and the next `biz.list_contacts` query reflects the new contact (closed loop verified end-to-end).

Signed-off-by: mik-tf
Update from source-grounded read (session 52)
After reading `hero_router/crates/hero_router/src/server/mcp.rs` and the hero_agent MCP client, it's clear that what the original framing of this issue calls for ("agent brute-forces socket paths; need to build curated MCP per-service") is largely already implemented, and the issue needs reshaping.

What already exists:
- `hero_router` exposes every healthy service as an MCP server at `POST /mcp/{service_name}` (mcp.rs:30). Tools are auto-derived from the service's OpenRPC via `openrpc_to_mcp_tools` (mcp.rs:130).
- `agent_run` is auto-appended as a free Python-LLM tool on every service (mcp.rs:132-158) — natural language → generated Python → executed against the service.
- A `claude mcp add --transport http <slug> <endpoint>` snippet is rendered in the per-service Quick Setup card (mcp.rs:591); a `.mcp.json` snippet at mcp.rs:607-631. Claude Code is a first-class consumer.
- `~/hero/var/agent/mcp.json` registry with three transports (socket/url/command); `url` is preferred and routes through hero_router. X-Hero-Context + claims propagate to outbound MCP HTTP via the `tokio::task_local!` `FORWARDED_HEADERS` (hero_agent/.../mcp_client.rs:14-19, :683-690).

The "brute-force socket guessing" framing is not what the code does.
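For orientation, a sketch of what per-domain registration in that registry could look like; the registry's real key and field names aren't quoted in this thread, so `servers`, `transport`, `url`, and the host/port are assumptions:

```jsonc
// Illustrative only — the registry's real schema may differ; the "servers",
// "transport" and "url" keys plus host/port are assumptions. The point: one
// url-transport entry per domain, all routed through /mcp/{service_name}.
{
  "servers": {
    "hero_biz":      { "transport": "url", "url": "http://127.0.0.1:8080/mcp/hero_biz" },
    "hero_calendar": { "transport": "url", "url": "http://127.0.0.1:8080/mcp/hero_calendar" },
    "hero_photos":   { "transport": "url", "url": "http://127.0.0.1:8080/mcp/hero_photos" }
  }
}
```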
What's still real work, differently shaped:

- Richer `description` fields would lift agent tool-selection precision.
- Read methods (`*.list`, `*.get`, `*.find`) are safe to auto-expose. Write methods (`*.set`, `*.delete`) need confirmation gates / per-context allowlists before going live to the agent.

Suggested next step: rescope this issue body to those four (or split into four). The "build MCP" framing is misleading — flag the build-MCP narrative for closure, retain (1)–(4) as the live work.
Reconciliation memo for session 52: `memory/investigation_roadmap_reconciliation.md` (private workspace).

Status update — 2026-05-01 live audit on herodemo
After working through the AI Assistant slowness on the demo (trace: ~10 tool iterations / ~30-60s for "who are my contacts in hero_biz"), I audited the actual MCP wiring on the live VM. The infrastructure is more mature than this issue's original framing — the gap is narrower than building "curated tool surface from scratch."
What's already built
`hero_router/crates/hero_router/src/server/mcp.rs` implements a complete MCP gateway:

- `POST /mcp/{service_name}` per service — full MCP protocol (`initialize`, `tools/list`, `tools/call`)
- Tools auto-derived from each service's OpenRPC (no `mcp_tools.toml` needed for the deriver path)
- A per-service page at `/service/<svc>/mcp`
- `router.mcp_logs` RPC + UI
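To make the gateway contract concrete, the same JSON-RPC body can be POSTed to any `/mcp/{service_name}` endpoint; a sketch of the `tools/list` exchange (response trimmed, field values illustrative):

```jsonc
// Request body, POSTed to e.g. /mcp/hero_biz (standard MCP tools/list call).
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }

// Response shape. Today the list typically contains only the auto-appended
// agent_run tool; domain tools appear once the OpenRPC schema declares them.
// (description/inputSchema values are illustrative, not the exact output.)
{
  "jsonrpc": "2.0", "id": 1,
  "result": {
    "tools": [
      {
        "name": "agent_run",
        "description": "Natural language -> generated Python executed against the service",
        "inputSchema": { "type": "object" }
      }
    ]
  }
}
```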
So the registration mechanism this issue proposed (`mcp_tools.toml` → hero_indexer embedding → semantic discovery) is one valid path, but the simpler path (let the router auto-derive from OpenRPC) is already live.

Wiring gap — empirical, on herodemo VM 2026-05-01
Gap A —
`hero_agent` only registers ONE MCP server in its live config. hero_biz / hero_calendar / hero_photos / hero_videos / hero_collab / hero_slides — none registered. The agent has no awareness those gateway endpoints exist, so it falls back to
`search_hero_docs` + `shell_run` to reverse-engineer everything. One file edit fixes this.

Gap B — OpenRPC specs only expose
`agent_run`, not domain ops. Live `tools/list` counts via the router gateway: every checked service exposes only the auto-appended `agent_run`. So even if the agent had hero_biz registered as an MCP server, the only tool it would see is the generic
`agent_run` shell-exec sandbox — not `biz.list_contacts`, `biz.add_contact`, etc. Those domain operations are not in the OpenRPC schemas the gateway derives from.

Revised scope — narrower than this issue's Phase 1-4 framing
The original Phase 1 spec listed 11 biz tools. To make the agent fast, we don't need a parallel
`mcp_tools.toml` registry or semantic discovery via embedder/indexer as a precondition. We need:

(a) OpenRPC schema additions per service. For hero_biz: add
`biz.list_contacts`, `biz.list_persons`, `biz.list_companies`, `biz.list_deals`, `biz.add_contact`, etc. to the OpenRPC spec. The router auto-exposes them as MCP tools the next time `tools/list` is called.

(b) Extend
`hero_agent`'s `mcp.json` to register all per-domain MCP endpoints. ~24 lines of JSON.

That's it for the "agent answers in ~3-5s instead of ~30-60s" outcome. The semantic-discovery layer (embedder + indexer with
`mcp.discover(prompt)`) is still architecturally desirable for the "agent picks the right tool from 100+ tools" scaling story, but is not a blocker for the demo-level speedup. With ~30-50 tools across the active services, the LLM's native tool selection (just from the `tools/list` JSON) handles selection well within Claude Sonnet's tool-use budget.

Effort estimate (revised)
- Extend `mcp.json` for the agent: ~30 min including a sanity test against each endpoint.
- Add OpenRPC methods per domain until `tools/list` returns them: ~2-3 hr/domain. For Phase 1 (biz alone) that's a half-day. For all of biz + calendar + projects + tasks + photos + videos + books-search: 2-3 days of focused work.
- `X-Hero-Context` header: confirm the router's MCP proxy forwards the header end-to-end (it should already — it just proxies the JSON-RPC body); if not, ~1 hr to add header forwarding.

So the demo-shipping version of this issue (agent picks domain tools cleanly) is ~3 days end-to-end if focused, not the multi-week build implied by the original framing. The semantic-discovery +
`mcp_tools.toml` registry can ship in a follow-up issue once the simpler path is proven.

Adversarial caveats
- The `tools/list` payload grows. Modern Claude handles this fine up to ~100 tools, but past that the LLM's selection accuracy drops. At which point semantic pre-filtering via embedder (the original plan) becomes load-bearing. So the original Phase 1-4 architecture is still the right end-state — just not the precondition for the demo win.
- `X-Hero-Context` must reach the underlying service's RPC handler. Verify the router's mcp_handler doesn't strip incoming headers (most proxies preserve them; quick check needed).
- If the `agent_run` filter logic in router/mcp.rs explicitly excludes them (the newly added methods), the auto-derive won't pick them up. Easy to verify by re-running `tools/list` after adding one method.

Recommend keeping the existing Phase 1-4 plan as the long-term roadmap, but adding a "Phase 0 — minimal viable wiring" section above it covering the two gaps. That sequencing ships the demo speedup in ~3 days, then lets Phase 1+ inherit the working baseline.
Signed-off-by: mik-tf
Architectural principle to make load-bearing
This is the contract — not a nice-to-have. The router's MCP gateway already auto-derives `tools/list` from each service's OpenRPC. So "RPC-complete" implies "MCP-complete" implies "agent-callable" with zero additional plumbing. Anything that bypasses RPC (e.g., direct webhook handlers, server-side templates that call internal Rust APIs but never expose them as RPC, magic shell scripts the UI invokes) is a contract violation that breaks the agent's ability to perform that action.

This sharpens this issue's framing: the work is not "build a curated agent tool surface." It's "audit every service's user-facing capabilities and make sure each one has a corresponding RPC method." Once that's done, the agent's tool catalog is automatic.
Inventory of the gap (live observation, herodemo 2026-05-01)
Each row is "what the user can do in the OS shell" vs "what's currently exposed as RPC method".
- hero_biz: no `biz.list_contacts` in OpenRPC
- hero_books: `books.search` is there
- hero_os: no `os.switch_context` / `os.open_island`

So the majority of user-facing capabilities are currently behind UI handlers that don't have a public RPC method. The handlers exist (the UI calls them); they're just not in the OpenRPC schema, so the router's auto-deriver doesn't surface them, so the agent can't see them.
This is the single load-bearing fix. Once each handler has an RPC method declared, MCP coverage falls out for free.
Phase 0 — Wiring sweep (recommended sequencing)
Before tackling any "Phase 1-4" tool surface design (the original framing), do a flat audit pass that answers, per service, three questions:
The output is a per-service punch list. That punch list IS the work.
Phasing — per-service, NOT per-tool-category
The original Phase 1-4 framing organized by tool category (biz tools / calendar tools / content tools / OS tools). Reorganize by service, because each service's punch list is owned by one team / one repo, and the natural unit of work is "make hero_biz RPC-complete" not "ship 11 biz tools across multiple repos."
Suggested ordering (effort × demo value):
1. hero_biz — `list_contacts`, `list_persons`, `list_companies`, `list_deals`, `list_opportunities`, `add_contact`, `add_person`, `add_company`, `add_deal`, `update_deal`, `summarize_pipeline`.
2. hero_books — `summarize_book`, `summarize_library`. `books.search` already exists. Plus: confirm `agent_run` is the only thing in `tools/list` and figure out why the OpenRPC-derived methods aren't surfacing (this is a router or codegen bug, not a books bug).
3. hero_calendar — `list_events`, `create_event`, `update_event`, `delete_event`.
4. hero_photos — `list_recent`, `list_album`, `get_metadata`.
5. hero_collab — `list_channels`, `list_messages`, `post_message`. Also fixes the FD leak observed 2026-05-01 if we audit the WS handlers in the same pass.
6. Voice/speech — `transcribe(audio_blob)`, `tts(text, voice?)`. Likely already mostly there since the wake-word loop uses internal versions.
7. hero_os — `os.*` endpoint set: `os.switch_context`, `os.open_island`, `os.set_theme`. This makes "open my photos" voice-controllable.

Total: ~5 working days if all owners are aligned and OpenRPC codegen is unblocked. Can be parallelized across people if multiple services are owned independently.
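As a sketch of the per-service unit of work above: one entry added to the service's OpenRPC `methods` array is all the deriver needs. The summary wording and parameter schema here are illustrative, not an agreed contract:

```jsonc
// One entry in the service's OpenRPC "methods" array; the router's
// openrpc_to_mcp_tools maps name -> tool name, summary -> description,
// params -> inputSchema. Summary text and schemas below are illustrative.
{
  "name": "biz.list_contacts",
  "summary": "List contacts in the active context",
  "params": [
    { "name": "limit", "required": false, "schema": { "type": "integer" } }
  ],
  "result": { "name": "contacts", "schema": { "type": "array" } }
}
```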
Phase 1 — semantic discovery (still relevant, defer)
The original issue's `mcp.discover(prompt)` via embedder + indexer becomes load-bearing once `tools/list` exceeds ~50-100 tools (Claude's tool-selection accuracy degrades past that point). At ~5-10 tools/service × 8 services = 40-80 tools, we're at the ceiling. So the semantic-discovery layer is the next issue to ship, but only after Phase 0 lands.

Defer the
`mcp_tools.toml` registry too — OpenRPC carries enough description metadata for now. Re-evaluate when we hit the tool-count ceiling.

Acceptance for Phase 0
- `tools/list` via `curl /mcp/<svc>` returns the full set for each service (not just `agent_run`).
- The agent's `mcp.json` registers all per-domain MCP endpoints.
- The agent answers a photos prompt by calling `photos.list_recent` directly.

Cross-references
Signed-off-by: mik-tf
Addendum — confirming the deriver is unconditional
Reviewed `hero_router/crates/hero_router/src/server/mcp.rs:409` (`fn openrpc_to_mcp_tools`): no filter, no allowlist, no `mcp: true` annotation requirement. It iterates every entry in the OpenRPC spec's `methods` array and emits one MCP tool per entry. The mapping is purely structural: name → tool name, summary/description → tool description, params → JSON Schema for `inputSchema`.

So the live observation that
`tools/list` returns only `agent_run` per service has exactly one explanation: the OpenRPC specs themselves only declare `agent_run`. The domain operations (`biz.list_contacts`, `calendar.list_events`, `photos.list_recent`, etc.) are missing from the schemas — likely because those handlers are served as ad-hoc HTTP UI routes from each `*_ui` server, not as RPC methods on the OServer (`*_server`'s `rpc.sock`).

This sharpens the Phase 0 punch list: per service, the work is (a) move handlers from "plain HTTP UI route" to "RPC method on rpc.sock with an OpenRPC schema entry", or (b) if the handler is already on rpc.sock, just declare it in the OpenRPC schema. Once that's done, the router's auto-deriver picks it up the next time
`tools/list` is called — zero changes to hero_router needed.

The contract is correct as built. Phase 0 is "fill the schemas to use the contract."
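For completeness, a sketch of the MCP tool that the structural mapping described above would emit for such a schema entry (exact JSON Schema translation depends on the deriver):

```jsonc
// Derived, not hand-written: roughly what tools/list would then return for the
// schema entry sketched earlier (field values follow that illustrative entry).
{
  "name": "biz.list_contacts",
  "description": "List contacts in the active context",
  "inputSchema": {
    "type": "object",
    "properties": { "limit": { "type": "integer" } }
  }
}
```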
Signed-off-by: mik-tf
mik-tf referenced this issue from lhumina_code/hero_demo, 2026-05-02 03:28:52 +00:00
It's not like this; see https://forge.ourworld.tf/lhumina_code/hero_router/src/branch/development/docs/agentic_calling.md for how we want it.