[nu-demo] CANONICAL — Intelligence pipeline documented end-to-end: hero_agent ↔ aibroker ↔ embedder ↔ hero_books ↔ docs_hero #159

Closed · opened 2026-04-24 02:48:10 +00:00 by mik-tf · 1 comment

## Scope

This issue consolidates everything we learned during the heronu demo build about how Hero OS's AI pipeline actually works in production. It's the single place a dev/ops person new to the stack should land before touching any of: triage classifier, tool routing, LLM dispatch, MCP wiring, library indexing, or semantic tool selection.

The equivalent user-facing doc has been committed as [`docs_hero/collections/hero_os_guide/ai_pipeline.md`](https://forge.ourworld.tf/lhumina_code/docs_hero/src/branch/development_mik_nu_demo/collections/hero_os_guide/ai_pipeline.md) on the `development_mik_nu_demo` branch — so the AI Assistant can cite its own architecture via `search_hero_docs` once indexed. **Closed loop.**

## Short version

```
┌──────────────────────────────────────────────────────────────────────────┐
│ Browser → POST /hero_agent/rpc method=agent.chat                        │
│                                                                          │
│   hero_agent_server                                                      │
│     1. triage         (Knowledge vs Tools)                               │
│     2. tool selection  pattern | semantic | hybrid (← via embedder)     │
│     3. LLM call        via hero_aibroker → Claude / GPT / Groq / OR     │
│     4. tool exec       built-in (58) OR MCP via hero_router /mcp/{svc}  │
│     5. answer → browser                                                  │
│                                                                          │
│   Grounding subloop: search_hero_docs →                                  │
│     hero_books.search.query →                                            │
│     hero_embedder.embed + cosine + rerank →                              │
│     docs_hero/docs_geomind/docs_mycelium/docs_owh Q&A corpus             │
└──────────────────────────────────────────────────────────────────────────┘
```

## 1. Triage — the hidden branch point

Source: `hero_agent/crates/hero_agent/src/agent.rs` — triage classifier decides whether the LLM gets called WITH or WITHOUT tools.

  • `triage = Knowledge`: LLM receives only system prompt + message. No tools. No `search_hero_docs`. The model answers from training priors.
  • `triage = Tools`: LLM receives system prompt + message + tool list (after selection in §2). Tool calls are possible; grounding is possible.

**Demo bug** (see #152): "What is Hero OS?" triages as Knowledge, so grounding never happens even when the rest of the stack is perfectly wired. The MANDATORY system prompt directive in `prompt.rs` is only half of the fix — the triage classifier has to route hero-keyword messages to Tools.
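To observe the branch point from outside, compare two phrasings over the `agent.chat` RPC from §8. This is a probe sketch; the per-phrasing triage outcomes reflect the demo behavior described above, not a guarantee:

```bash
# Triages as Knowledge today (#152): answer comes from training priors, no tool calls.
curl -s --unix-socket ~/hero/var/sockets/hero_agent/rpc.sock -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"agent.chat","params":{"message":"What is Hero OS?"}}'

# Explicitly tool-shaped phrasing should triage as Tools and ground via search_hero_docs.
curl -s --unix-socket ~/hero/var/sockets/hero_agent/rpc.sock -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"agent.chat","params":{"message":"Search the hero docs for: what is Hero OS, then answer from the results"}}'
```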

## 2. Tool selection — the key insight you MUST know

Source: `hero_agent/crates/hero_agent/src/semantic_router.rs` + `config.rs`.

Three routing modes, switchable via `HERO_AGENT_ROUTING_MODE`:

| Mode | Behavior | env value |
|---|---|---|
| Pattern (default!) | Ships ALL registered tools to the LLM | `pattern` |
| Semantic | Embeds user query, cosine-ranks tools, sends top-K | `semantic` |
| Hybrid | Semantic + `always_include` allowlist | `hybrid` |

When `mcp.json` wires 7 MCP services (hero_osis, hero_books, hero_aibroker, hero_foundry, hero_indexer, hero_embedder, hero_proc), total tools ≈ 58 built-in + 107 MCP = **165**. Pattern mode ships all 165, which triggers:

  • **OpenAI**: `Invalid 'tools': array too long. Expected maximum length 128, got 165` (hard limit, documented).
  • **Anthropic Claude**: `tools.60.custom.name: String should match pattern ^[a-zA-Z0-9_-]{1,128}$` — MCP tool names like `hero_osis.contact.list` contain dots, which violate the regex.
  • **Gemini-2.5-flash**: `Duplicate function declaration found: agent_run` — an adapter name-mangling collision.

**Every single LLM target failed** because of this. See #153.
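You can reproduce the tool count without reading code: ask each wired MCP endpoint for its `tools/list` and sum the results (sketch assumes `jq` is available; the service list mirrors the mcp.json from §4):

```bash
# Count MCP-exposed tools per service via the documented tools/list method.
total=0
for svc in hero_osis hero_books hero_aibroker hero_foundry hero_indexer hero_embedder hero_proc; do
  n=$(curl -s -X POST "http://10.1.2.2:9988/mcp/$svc" \
        -H 'Content-Type: application/json' \
        -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
      | jq '.result.tools | length')
  echo "$svc: $n tools"
  total=$((total + n))
done
echo "MCP total: $total (plus ~58 built-in; pattern mode ships the whole sum)"
```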

### Why semantic routing is the architectural answer

On hero_agent startup, `SemanticRouter`:

  1. Walks built-in tools from `hero_agent_tools` + MCP-discovered tools from `mcp_client`.
  2. For each tool, emits a text like:

         Tool: search_hero_docs
         Description: Search Hero OS documentation via hero_books. Returns relevant doc snippets...

  3. SHA-256 hashes the full tool index. If it matches the previously persisted hash (stored in embedder KVS under `tool_index_hash`), re-embedding is skipped (see the sketch after this list).
  4. Else: batch-embeds via `hero_embedder` (Q1 Fast, 384-dim BGE-small) and upserts into namespace `hero_agent_tools`.
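The hash gate in step 3 is plain content-addressing. A shell analogy of the same logic, with hypothetical file locations (the real check runs in-process against the embedder KVS):

```bash
# Recompute the index fingerprint and compare with the last persisted one.
new_hash=$(sha256sum /tmp/tool_index.txt | cut -d' ' -f1)
old_hash=$(cat ~/hero/var/agent/tool_index_hash 2>/dev/null)

if [ "$new_hash" = "$old_hash" ]; then
  echo "tool index unchanged, skipping re-embed"
else
  echo "tool index changed, re-embed and persist the new hash"
  printf '%s' "$new_hash" > ~/hero/var/agent/tool_index_hash
fi
```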

At runtime, each user message:

  1. Is embedded identically (same model, same namespace).
  2. Is cosine-searched against the tool index, returning top-K with scores.
  3. Passes the threshold filter (default `0.3`), which drops low-relevance matches.
  4. The survivors — typically 5-10 — are the only tools shipped to the LLM.

**Result:** tool count stays well under 128, only the most relevant tools get offered, and tool-selection quality goes up (the LLM isn't distracted by 150 irrelevant options).

### The env vars that matter

| Var | Default | Recommended |
|---|---|---|
| `HERO_AGENT_ROUTING_MODE` | `pattern` | `hybrid` |
| `HERO_AGENT_EMBEDDER_SOCKET` | `$HOME/hero/var/sockets/hero_embedder/rpc.sock` | unchanged |
| `HERO_AGENT_SEMANTIC_TOP_K` | `5` | `10` |
| `HERO_AGENT_SEMANTIC_THRESHOLD` | `0.3` | `0.25` |

Applied on heronu via `hero_proc action set /home/driver/hero_agent_action.json`.
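Off heronu (e.g. a local dev run), the same knobs can be exported directly before starting the server. A minimal sketch, assuming the binary from the §7 rebuild command and that these vars are read at startup:

```bash
# Recommended semantic-routing knobs from the table above.
export HERO_AGENT_ROUTING_MODE=hybrid
export HERO_AGENT_SEMANTIC_TOP_K=10
export HERO_AGENT_SEMANTIC_THRESHOLD=0.25
# HERO_AGENT_EMBEDDER_SOCKET keeps its default ($HOME/hero/var/sockets/hero_embedder/rpc.sock).
./target/release/hero_agent_server
```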

### Still-remaining gap

Even with semantic routing, if any of the top-K tools is an MCP tool with dots in its name, Anthropic still rejects the request. Proper fix: sanitize `.` → `_` when emitting tool names to the LLM while preserving the original name for dispatch — roughly 5 lines in `mcp_client.rs` + `tool_router.rs`. Should be bundled with #153.
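The mapping itself is trivial; the real work is keeping a reverse map so dispatch still uses the original dotted JSON-RPC name. A shell illustration of the rename (the actual fix belongs in the Rust files named above):

```bash
orig='hero_osis.contact.list'
safe=$(printf '%s' "$orig" | tr '.' '_')   # hero_osis_contact_list, matches ^[a-zA-Z0-9_-]{1,128}$
echo "advertise '$safe' to the LLM, dispatch '$orig' over MCP"
```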

## 3. LLM dispatch — always via hero_aibroker

Source: `hero_agent/crates/hero_agent/src/llm_client.rs`.

hero_agent **never calls a provider API directly in production**. All traffic routes through hero_aibroker's OpenAI-compatible REST at `/v1/chat/completions`.

Why this matters: provider keys change, models get deprecated, and rate limits need centralizing. aibroker owns all of that. One config file (`modelsconfig.yml`) controls the provider-routing table, and every Hero service (agent, voice STT, TTS, future embeddings fallback) sees the same interface.
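Because the surface is OpenAI-compatible, any standard chat-completions payload should work against it. A probe sketch, where the `rest.sock` path (inferred from the socket layout in §7) and the model id (borrowed from §8) are assumptions:

```bash
# Minimal chat-completions probe against aibroker's REST socket.
curl -s --unix-socket ~/hero/var/sockets/hero_aibroker/rest.sock \
  -X POST http://localhost/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-haiku-4.5","messages":[{"role":"user","content":"ping"}]}'
```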

**Provider selection matters for demo quality:**

  • **Claude** (Haiku 4.5, Sonnet via the anthropic provider): honors `tool_choice` hints and MANDATORY system prompt directives — best for grounded answers.
  • **Gemini-Flash via OpenRouter**: ignores the tool-use preamble for questions it thinks it already knows. If aibroker is down and hero_agent falls back to OpenRouter-direct with Gemini as default, answers hallucinate.
  • **GPT-4o-mini**: good tool use, but subject to the 128-tool cap.

The hero_zero Docker precedent always sets `AIBROKER_API_ENDPOINT` — the one env var most conspicuously absent from the nu-shell service_agent.nu (which is itself missing entirely — #135).

## 4. Tool execution — built-in vs MCP

### Built-in (hero_agent_tools crate, ~58 tools)

Names are underscore-separated and regex-clean. Dispatch is local — no network hop. Examples:

  • `search_hero_docs` — Unix socket to `hero_books/rpc.sock`, method `search.query`
  • `list_files` / `read_file` / `write_file` — WebDAV via `hero_foundry/rpc.sock`
  • `osis_contact_list` / `osis_event_create` / ... — per-domain OSIS sockets
  • `run_command` — shell, sandbox-gated via `SafetyLevel`
  • `python_exec` — ephemeral Python via `uv`
  • `browse_url` — headless Chromium via the hero_browser service

### MCP (via hero_router /mcp/{service_name})

Wired through `~/hero/var/agent/mcp.json`:

```json
{
  "mcpServers": {
    "hero_books": {
      "url": "http://${HERO_ROUTER_HOST:-10.1.2.2}:${HERO_ROUTER_PORT:-9988}/mcp/hero_books",
      "transport": "http"
    },
    "hero_osis": { "url": "http://.../mcp/hero_osis", "transport": "http" }
  }
}
```

hero_router exposes `POST /mcp/{service}` for every service whose `rpc.sock` serves an OpenRPC spec. On startup, hero_agent's `mcp_client` performs the MCP `initialize` handshake, reads the tool list (derived from OpenRPC), and registers each tool.

Tool names from MCP follow `<namespace>.<verb>` (e.g. `collections.list`, `search.query`) — valid JSON-RPC method names, invalid Anthropic function names. See §2.
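To replay by hand what `mcp_client` does at startup (request shapes follow the MCP JSON-RPC spec; the `protocolVersion` value below is an assumption, use whatever the server accepts):

```bash
# 1. MCP handshake.
curl -s -X POST 'http://10.1.2.2:9988/mcp/hero_books' -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.0"}}}'

# 2. Tool discovery (same call as §8); note the dotted names in the result.
curl -s -X POST 'http://10.1.2.2:9988/mcp/hero_books' -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
```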

## 5. The grounding subloop — `search_hero_docs`

The critical tool for Hero-related questions. Full trace:

```
1. LLM emits: call search_hero_docs({"query": "what is hero os", "top_k": 5})
                               │
2. hero_agent_tools::search_hero_docs.execute()
                               │
3. POST to hero_books/rpc.sock, method=search.query
                               │
4. hero_books_server dispatches:
   a. hero_embedder.embed(query, Q1 Fast) → 384-dim vector
      └─ hero_embedderd TCP 127.0.0.1:8092 → ONNX BGE-small
   b. cosine search in namespace="hero" (or param-filtered)
      └─ redb-backed HNSW index, 163 docs in hero_os_guide
   c. hero_embedder.rerank(query, top-20 candidates, Q1)
      └─ cross-encoder reranker for final relevance scores
                               │
5. Return top-K Q&A pairs with { question, answer, page, score, content_hash }
                               │
6. hero_agent injects as tool_result, LLM composes final grounded answer
   citing (book, page) — e.g. "(hero_os_guide, overview)"
```

**Per-library search gap**: the current tool hardcodes `namespace="hero"`. To search `geomind`/`mycelium`/`ourworld`, we need either (a) an optional `namespace`/`library` param on the tool, or (b) a separate tool per library. The hero_books `search.query` RPC does take a namespace param; the agent-side tool just doesn't expose it.
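Since the RPC layer already accepts a namespace, per-library search works today below the agent; only the tool hides it. A direct probe, where the exact param name `namespace` is inferred from the note above and should be treated as an assumption:

```bash
# Query the mycelium library directly over the hero_books RPC socket.
curl -s --unix-socket ~/hero/var/sockets/hero_books/rpc.sock -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"search.query","params":{"query":"mycelium hosting plan","top_k":5,"namespace":"mycelium"}}'
```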

## 6. Library → embedder indexing (how docs get into the store)

On hero_books startup:

```
~/hero/var/books/libraries.txt  — one `name URL` per line
  │
  ├─ hero      https://forge.ourworld.tf/lhumina_code/docs_hero
  ├─ geomind   https://forge.ourworld.tf/geomind/docs_geomind
  ├─ mycelium  https://forge.ourworld.tf/coopcloud/docs_mycelium
  └─ ourworld  https://forge.ourworld.tf/ourworld/docs_owh
                               │
   For each entry:
     git clone/pull → /home/driver/code/docs_<name>/
                               │
   Scan collections/<coll>/*.md
                               │
     For each page:
       ├─ Check .ai/<page>.toml (pre-shipped Q&A cache)
       │    ├─ content_hash matches? → reuse, no LLM call
       │    └─ mismatch? → re-extract via aibroker (slow!)
       │
       ├─ Upsert Q&A pairs → hero_embedder namespace=<library>
       └─ (if .vectors.bin exists) reuse pre-computed embeddings
```

See #158 — the pre-shipped `.ai/` cache is currently bypassed because `content_hash` is computed on the exported `.md`, while the stored hash comes from the source (pre-export) representation.
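The shape of the #158 mismatch is visible on disk. Paths follow the diagram above; the page name and TOML field location are illustrative assumptions:

```bash
cd /home/driver/code/docs_hero
# Hash of the *exported* .md, which is what hero_books computes at index time...
sha256sum collections/hero_os_guide/overview.md
# ...vs the hash persisted in the pre-shipped Q&A cache (pre-export source).
grep content_hash collections/hero_os_guide/.ai/overview.toml
# While these disagree, the cache is treated as stale and aibroker re-extracts.
```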

## 7. Service topology matrix

| Component | Role | Socket / TCP | Rebuild command |
|---|---|---|---|
| hero_agent | AI orchestrator | rpc.sock | `cargo build -p hero_agent_server --release` |
| hero_aibroker | LLM router | rest.sock, ui.sock, rpc.sock | (binary) |
| hero_embedder | Vector DB + reranker | rpc.sock + daemon TCP:8092 | (binary, plus ONNX runtime install) |
| hero_books | Doc indexer + search | rpc.sock, ui.sock, web_admin.sock | `cargo build -p hero_books_server` |
| hero_router | TCP ingress :9988 | TCP only | `cargo build -p hero_router` |
| hero_foundry | WebDAV + git forge | rpc.sock (serves /api/files/{ctx}/{path}) | (binary) |
| hero_osis | 17 per-domain data services | hero_osis_<domain>/rpc.sock each | (17 binaries) |

## 8. Diagnostic commands

```bash
# Is the routing mode active?
cat /proc/$(pgrep -f hero_agent_server | head -1)/environ | tr '\0' '\n' | grep HERO_AGENT_ROUTING

# Are tools indexed into the embedder?
curl -s --unix-socket ~/hero/var/sockets/hero_embedder/rpc.sock -X POST http://localhost/rpc \
  -d '{"jsonrpc":"2.0","id":1,"method":"namespace.list","params":{}}'
# Look for `hero_agent_tools` namespace with >0 docs

# End-to-end grounded query:
curl -s --unix-socket ~/hero/var/sockets/hero_agent/rpc.sock -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -H 'X-Hero-Context: default' \
  -d '{"jsonrpc":"2.0","id":1,"method":"agent.chat","params":{"message":"Call search_hero_docs with query=Hero OS pipeline and top_k=3, then summarize","model":"claude-haiku-4.5"}}'
# Expect response citing (hero_os_guide, ai_pipeline) or similar

# MCP tool discovery:
curl -s -X POST 'http://10.1.2.2:9988/mcp/hero_books' \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
```

## 9. Cross-references — everything this issue touches

  • #148 — nu-demo architecture index (topology, deploy flow)
  • #149 — prompt.rs directive strengthening + rebuild verification
  • #150 — tool_choice=required support (needs turn-1-only gating; pragmatic patch caused infinite loops)
  • #152 — **triage classifier routes hero questions to Knowledge path (the hallucination root cause)**
  • #153 — **tools payload sanitization (count > 128, dots-in-names regex, duplicates)**
  • #156 — Photos seed storage_path leading slash
  • #157 — Books island URL double-slash + server-side slash-collapse middleware demo fix
  • #158 — **hero_books re-runs LLM Q&A extraction despite pre-shipped `.ai/<page>.toml` cache**
  • #125 — hero_router X-Hero-Context header clobbering
  • #135 — service_agent.nu missing entirely
  • #136 — OPENROUTER_API_KEYS plural name
  • #145 — hero_embedder blocking reqwest in tokio async

## 10. The killer demo — what it should look like

With all the above working (hybrid routing on, Claude-backed aibroker, hero_books libraries indexed, triage override for hero-keywords, tool-name sanitizer):

  • User: "What is Hero OS?" → triage→Tools, semantic pulls search_hero_docs, Claude calls it, response cites (hero_os_guide, overview/quickstart/ai_pipeline) with ~1s latency.
  • User: "How does the AI Assistant work?" → search_hero_docs hits ai_pipeline.md (this very doc), response explains the pipeline from the doc's own text. Closed loop: the AI reads its own documentation.
  • User: "List files in my geomind workspace" → triage→Tools, semantic pulls list_files, response shows webdav contents.
  • User: "Summarize the mycelium hosting plan" → triage→Tools, semantic pulls a namespace-aware search tool (once https://forge.ourworld.tf/lhumina_code/home/issues/XXX lands), response grounds in docs_mycelium.

## 11. What's on `development_mik_nu_demo`

docs_hero branch `development_mik_nu_demo` (NOT pushed) — commits:

  • 2c88aa2 — refreshed architecture.md + services.md + overview.md + quickstart.md for the nu-shell / hero_proc / hero_agent era
  • 6deb8df — added ai_pipeline.md (this content in user-doc form)

heronu VM state:

  • prompt.rs MANDATORY directive compiled into hero_agent_server binary (strings count = 1 for MANDATORY / Hero Vired / sovereign digital)
  • HERO_AGENT_ROUTING_MODE=hybrid + top_k=10 + threshold=0.25 applied via hero_proc action set
  • AIBROKER_API_ENDPOINT + HERO_AGENT_AIBROKER_MODELS + OSIS_URL/OSIS_CONTEXT in agent env
  • mcp.json trimmed to hero_books only (pending name-sanitizer patch before widening)
  • 4 libraries cloned + indexed (hero=163, geomind=1733, mycelium/ourworld in progress)
  • WASM release + brotli sidecars (211MB → 1.8MB — #140)
  • 30 photo/video/song seed records rewritten to strip leading slash (#156)
  • 49 .docx files seeded into Office archipelago (default + geomind + incubaid + root + threefold contexts)

Signed-off-by: mik-tf

**mik-tf** (Author/Owner) commented:

Moved to hero_demo#27 — see https://forge.ourworld.tf/lhumina_code/hero_demo/issues/27