[nu-demo] STATE OF HERONU as of 2026-04-24 — what works, what's still broken, everything that needs upstream landing #160
Purpose
This is the consolidated checkpoint issue for the heronu nu-shell demo. Use this as the starting point for the next session — it captures the current working state, every gap still open (with severity), operator workarounds applied directly to the VM, and the path to full reproducibility once the per-component fixes merge upstream.
The demo is good enough to present as-is — the killer feature (AI Assistant grounded in docs_hero) works, all seeded islands render real data, media + office files display. What's left is a known catalog.
For the intelligence pipeline / AI architecture spec: see #159.
For the nu-demo architecture index: see #148.
✅ What works end-to-end
- (hero_os_guide, overview/quickstart/ai_pipeline)
- HERO_AGENT_ROUTING_MODE=hybrid, SEMANTIC_TOP_K=10, SEMANTIC_THRESHOLD=0.25 applied. Embedder indexes the 58 built-in tools + any MCP tools and selects top-K per query, which bypasses the 128-tool LLM cap.
- AIBROKER_API_ENDPOINT=http://10.1.2.2:9988/hero_aibroker/rest/v1.
- /home/driver/code/docs_*: hero (1 book, 163 embedder docs), geomind (12 books, 1733+ docs), mycelium (21 books, Q&A extraction ongoing), ourworld (6 books).
- /home/driver/hero/var/agent/mcp.json trimmed to hero_books only (until the tool-name sanitizer, home#153, lands).
- POST /mcp/hero_books returns protocolVersion: 2025-03-26.
- /data/home/driver/bin/typst, DejaVu fonts). Sizes 38 KB–710 KB. Served from hero_foundry webdav.
- storage_path bug (fixed by rewrite, #156). Default context also seeded with 9 photo + 3 video + 3 song records.
- development_mik_nu_demo on VM).
- 83fb985 pushed: refreshed architecture.md, services.md, overview.md, quickstart.md + new ai_pipeline.md canonical page.
🟥 Blockers / high-severity gaps
1. Services island is architectural dead wood — NOT just a WASM feature gap
Current behavior: dock click → Failed to load island WASM 'services': Island not found: services (HTTP 404).
Root cause (architectural): In the hero_zero Docker era, the Services island proxied hero_proxy's service discovery. In the nu-shell / hero_router era, hero_router owns service discovery natively via its admin UI at ui.sock (see https://forge.ourworld.tf/lhumina_code/hero_router/src/branch/development/CLAUDE.md). The "Services" dock entry is a stale artifact that references a code path that no longer makes sense.
Proper fix:
- Remove "services" from hero_os_app/src/registry.rs::build_islands() (around line 351) and from any WASM_FEATURES list.
- /hero_router/ui/) if the UX goal is "operator sees running services."
Demo workaround: broken icon stays in the dock; users hit 404 only if they explicitly click it.
2. hero_biz service is incomplete — only ui.sock, no rpc.sock
Current behavior: clicking "Biz" → blank page.
Root cause: hero_proc service list shows hero_biz as running with only a _server + _ui action, but hero_biz_server exposes no rpc.sock, just ui.sock. The Biz island loads, JS tries to fetch RPC, gets nothing.
Note: Biz island is separate from the "Business" island which maps to hero_osis business domain (that one works and has seeded data).
Fix: either flesh out hero_biz service to expose rpc endpoints, OR remove the Biz dock entry (same registry.rs rebuild).
3. Books UI "All Books" tab errors — Dioxus double-slash URL bug
Current behavior: Error: JSON parse error: EOF while parsing a value at line 1 column 0.
Root cause: hero_archipelagos/archipelagos/embed/books/src/island.rs:28, introduced in commit b7202b7 (fix(embed): use hero_router URL pattern for iframe islands), sets a base URL that ends in a slash. Then in services/mod.rs:42, format!("{}/rpc", base_url) → /hero_books/ui//rpc → 404.
Verified on heronu:
- curl -X POST 'http://10.1.2.2:9988/hero_books/ui/rpc' ... → 200 OK (single slash works)
- curl -X POST 'http://10.1.2.2:9988/hero_books/ui//rpc' ... → 404 (double slash 404s)
Proper fix (in hero_archipelagos): strip trailing slash from base_url OR trim_end_matches('/') in rpc_call before appending /rpc.
Attempted demo fix (on heronu): added collapse_slashes_middleware to hero_books_ui/src/main.rs before .layer(cors). Binary rebuilt (~22 min), installed, process restarted. The middleware is in source but does not fire at runtime. Likely axum-0.8 behavior or LTO eliminated path. Not yet root-caused.
Full write-up: #157.
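A minimal sketch of the client-side fix described above, assuming the RPC URL is built by plain string formatting. The helper name join_url is hypothetical and is not taken from hero_archipelagos; only trim_end_matches and the /hero_books/ui/ paths come from this issue.

```rust
/// Join a base URL and a path segment with exactly one slash between them.
/// Hypothetical helper for illustration; not the actual hero_archipelagos code.
pub fn join_url(base_url: &str, segment: &str) -> String {
    format!(
        "{}/{}",
        // Drop any trailing slashes on the base ("/hero_books/ui/" -> "/hero_books/ui").
        base_url.trim_end_matches('/'),
        // Drop any leading slashes on the segment ("/rpc" -> "rpc").
        segment.trim_start_matches('/')
    )
}
```

With this helper, join_url("/hero_books/ui/", "rpc") and join_url("/hero_books/ui", "/rpc") both produce the single-slash path, so the caller no longer depends on whether the island passes a trailing slash.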
4. Office PDF island reads from hero_office, not hero_foundry webdav
Current behavior: PDF island shows No PDF files in geomind even though I seeded 21 .pdf into ~/hero/var/hero_foundry/webdav/geomind/Documents/.
Root cause: hero_office_ui/src/handlers.rs::type_browser calls hero_office_server::list_documents(type_filter="pdf"), i.e. hero_office_server has its own file store and its own list/upload API, disjoint from hero_foundry's webdav. The /hero_office/ui/pdf/ iframe therefore doesn't see files that live under hero_foundry.
For hero_office to show the PDFs, they need to be uploaded via hero_office_server's REST/RPC (not by dropping into webdav). Equivalent path needed for Docs, Sheets, Slides.
Fix options:
5. Chat persistence broken — hero_osis_ai domain not registered
Current behavior: every tool-call turn logs WARN hero_agent::osis_store: Failed to create audit entry error=Network error: HTTP error: 404 Not Found.
Root cause: hero_agent/src/osis_store.rs uses hero_osis_sdk's AiClient to persist conversation history + audit/usage entries. The client targets the hero_osis_ai domain, which is NOT registered in the nu-shell per-domain server list. See existing issue #130.
Result: AI Assistant works for single-turn queries but conversations aren't persisted: refresh the page and history is gone. Historically (hero_zero era) this was working via the monolithic hero_osis_server.
Fix: register hero_osis_ai in the per-domain list (whatever drives hero_osis_server action registration in nu-shell), run OSIS schema migration for the ai domain, set OSIS_URL in hero_agent env to point at that domain's rpc.sock.
6. Plain-chat grounding doesn't trigger: triage classifier routes Hero questions to Knowledge path
"What is Hero OS?" still returns a generic OS description because the triage classifier routes it to the Knowledge branch: no tools offered, no search_hero_docs call. See #152.
Workaround: explicit tool hint ("Call search_hero_docs with ...") forces the Tools path and grounding works.
7. Tools payload sanitization needed for wider MCP
mcp.json currently wires only hero_books. Widening it to all 7 MCP services would produce ~165 tools, which exceeds OpenAI's 128-tool cap, AND MCP names contain dots (collections.list, search.query) which violate Anthropic's function-name regex ^[a-zA-Z0-9_-]{1,128}$. See #153.
Semantic routing (home#159 §2) picks top-K tools, but the top-K can still include MCP tools with dots, which Anthropic rejects. Both fixes are needed: semantic routing (live) + name sanitization (pending).
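A minimal sketch of what name sanitization could look like, assuming dots are mapped to a double underscore and the original name is kept for the round trip back to the MCP server. The function names below are hypothetical and are not taken from hero_agent; the regex is the one quoted above.

```rust
use std::collections::HashMap;

/// True if `name` satisfies ^[a-zA-Z0-9_-]{1,128}$ (checked by hand, no regex crate).
pub fn is_valid_tool_name(name: &str) -> bool {
    (1..=128).contains(&name.len())
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Replace each '.' with "__" so MCP names such as "collections.list" pass the check.
/// Assumption: the mapping is '.' -> "__"; other invalid characters are not handled here.
pub fn sanitize_tool_name(original: &str) -> String {
    original.replace('.', "__")
}

/// Map sanitized name -> original name, so a tool call coming back from the LLM
/// can be routed to the MCP server under its real name.
pub fn build_name_map(originals: &[&str]) -> HashMap<String, String> {
    originals
        .iter()
        .map(|name| (sanitize_tool_name(name), name.to_string()))
        .collect()
}
```

Two caveats with this sketch: a tool already named a__b would collide with a sanitized a.b, and sanitizing lengthens the name, so a real implementation would want to re-run is_valid_tool_name on the result and detect duplicate keys.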
🟨 Medium-severity gaps
8. Livekit iframe defaults to dark theme when hero_os is light
Source patched on VM (templates/base.html: data-bs-theme="dark" → "light") but hero_livekit_ui not yet rebuilt/installed. Also applies to hero_collab; see task #36 / home#147.
9. Books Q&A cache bypass
Libraries ship pre-extracted Q&A pairs in collections/<coll>/.ai/<page>.toml + pre-computed .vectors.bin embeddings, but hero_books' content_hash check mismatches → re-extracts via LLM → wastes 20-40 min per library. See #158.
10. Projects milestone seed partial failure
Projects seeded successfully (30 objects); milestones failed due to a status enum mismatch: legacy TOMLs use status = "active", while the current enum is {todo, in_progress, done}, and the per-object schemas are not all auto-mapped. Remaining story/milestone TOMLs have other schema drifts that need a full seed-data migration.
Fix: update hero_zero/dist/var/seed/*/projects/*.toml files to match current schema, OR teach hero_osis_seed binary to auto-migrate known enum values.
11. Per-library AI search: search_hero_docs tool hardwired to namespace="hero"
Can't ask the AI "what does geomind's nitrograph memo say?" because the tool always searches the hero namespace. Add optional namespace/library param to the tool, propagate to hero_books.search.query. Approximate effort: 10 lines + rebuild.
12. tool_choice=required infinite loop
Pragmatic patch attempted and reverted: forcing tool_choice: "required" on every turn causes Claude to call a tool after it already has the answer → infinite loop. Proper implementation needs turn-1-only gating. See #150.
🟩 Low-severity / already-worked-around
13. VM rootfs is only 2 GB — blocked several installs
TF Grid ubuntu-24.04 flist ships with 2 GB rootfs; /data is 100 GB. Workarounds applied on heronu:
- TMPDIR=/data/tmp for all cargo builds (rustc scratch)
- CARGO_TARGET_DIR=/data/home/driver/cargo-target
- apt cache/lists symlinked to /data/var/
- /data/home/driver/bin/
Proper fix: bump rootfs_size in hero_zero/deploy/single-vm/tf/main.tf to 8 GB; teach hero_skills installers to default to /data paths. (Issue draft was interrupted; can be filed on request.)
14. hero_osis_seed binary reports phantom errors
The legacy hero_osis_seed binary (from hero_zero dist) prints error decoding response body for every write, but writes actually succeed. Symptom only: seed data did land (Business/Calendar/Media all populated). Minor cosmetic issue if future operators rely on the stderr output.
15. hero_embedder tokio blocking reqwest
Already patched on VM via tokio::task::block_in_place wraps in embedderd_client.rs. See #145.
16. hero_router X-Hero-Context clobbering
Already patched on VM. See #125.
17. hero_livekit axum Extension layer ordering
Already patched on VM. See #126.
Operator-state snapshot (what's live on heronu only)
These are changes that exist ONLY on the VM and the development_mik_nu_demo branches locally, not pushed to origin:
- development_mik_nu_demo (VM-local): always_include += search_hero_docs
- development_mik_nu_demo (VM-local): collapse_slashes_middleware in main.rs (compiled, doesn't fire at runtime)
- development_mik_nu_demo (VM-local): data-bs-theme default flipped to light
- development_mik_nu_demo (VM-local)
- development_mik_nu_demo (VM-local): block_in_place around blocking reqwest
- development_mik_nu_demo (VM-local)
- development (REMOTE)
hero_proc action state changes (runtime, not source):
- hero_agent_server.env += AIBROKER_API_ENDPOINT, HERO_AGENT_AIBROKER_MODELS, OSIS_URL, OSIS_CONTEXT, HERO_AGENT_ROUTING_MODE=hybrid, semantic top_k/threshold
- hero_books_server.env += FORGEJO_TOKEN, FORGE_TOKEN, GIT_TERMINAL_PROMPT=0, script arg serve
OSIS data writes (runtime):
- storage_path fields rewritten (leading slash stripped)
- O: prefix
- hero_osis_seed binary for 5 contexts
File-system artifacts:
- /home/driver/code/docs_{hero,mycelium,geomind,owh}: 4 library clones totaling 808 MB
- ~/hero/var/hero_foundry/webdav/<ctx>/Documents/*.docx (47), *.pdf (45)
- ~/hero/var/agent/mcp.json with 1 MCP server (hero_books)
- /data/home/driver/bin/typst
- /data/home/driver/seed/: copy of hero_zero seed data + binary
Reproducibility path
To recreate heronu-equivalent state from a fresh TF Grid VM:
- hero_osis_seed re-roll. Port the legacy hero_zero/dist/var/seed/ TOMLs to match current schemas, or add a migrator. Include all contexts (default + geomind + incubaid + root + threefold).
- hero_zero/deploy/single-vm/tf/main.tf so cargo + apt don't need /data tricks.
- mcp.json once tool-name sanitization lands; can then safely wire all 7 MCP services.
- ai_pipeline.md doc to the hero_books libraries.txt chain so the AI can always cite its own architecture (closed loop).
Once all that lands, a fresh hero_skills install-all && hero_proc start ... on an 8 GB rootfs VM should produce a clean equivalent of today's heronu state.
Related issues (chronological)
- /Volumes/T7
- web feature entries for island-room + others
- <img> double-slash (patched on VM via seed rewrite)
- .ai/cache
Signed-off-by: mik-tf
- make demo target: provision + install + seed + verify a fresh Hero OS demo VM in one command #163
- publicip=true the nu-shell deploy default #165
- hero_proc service start returns 'service not found' #167
- web feature build: OSIS/Services have only native variants, Services is dead wood, Videos missing, Books needs iframe default #171
Session close-out 2026-04-24
Infrastructure
- herodemo.gent01.grid.tf: 16 CPU / 32 GB / 200 GB / 16 GB rootfs / public IPv4 185.69.166.153 on freefarm node 2007, gateway on node 1.
- make destroy ENV=heronu): 5 resources cleanly removed, freeing the heronu name contract.
Snapshot
- herodemo-backup-20260424-124021.tar.gz on workstation at ~/heronu-backups/:
- f8d040f858de14d3affad9bb9f6d4f15ca3adcb6e5d81cdc036f80fe5e3dd11b
- hero/cfg, hero/bin (all compiled service binaries), hero/share, hero/var (excluding embedder model cache and logs), hero/code, actions.d.*, driver helper scripts
- heronu-backup-20260424-033445.tar.gz, 2.5 GB) retained alongside as requested
Codebase
- lhumina_code/hero_zero renamed to lhumina_code/hero_demo on Forge
- development branch updated with:
  - docs/ops/DEPLOYMENT_NU_HERO_OS.md: full reproducibility runbook
  - deploy/single-vm/tf/variables.tf: added rootfs_size + gateway_node variables
  - deploy/single-vm/tf/main.tf: wired rootfs_size into VM, cross-node gateway network
- 5b0f0d5 on development
Demo-time gaps filed on lhumina_code/home
All 30 issues filed this session:
Next deploy
To reproduce a fresh demo VM, follow hero_demo/docs/ops/DEPLOYMENT_NU_HERO_OS.md. Snapshot restore is ~/heronu-backups/herodemo-backup-20260424-124021.tar.gz (scp up → tar xzf → hero_proc service start the stopped services).
Signed-off-by: mik-tf
Update 2026-04-24 (late afternoon session)
Three hotfixes pushed to herodemo and verified live. 24/25 services running (was 23/24 — hero_voice now active).
Fix 1: AI Assistant MCP tool wiring (home#153)
- hero_agent::mcp_client.rs patched on development_mik_nu_demo: tool-name sanitizer (. → __), original_name field for round-trip MCP calls.
- hero_agent_server rebuilt + redeployed.
Fix 2: hero_voice (home#173)
- /usr/local/onnxruntime-1.24/ alongside 1.23.2 at /usr/local/onnxruntime/.
- hero_voice action env set with ORT_LIB_LOCATION=/usr/local/onnxruntime-1.24/lib, ORT_PREFER_DYNAMIC_LINK=1, LD_LIBRARY_PATH=/usr/local/onnxruntime-1.24/lib.
- hero_voice_server + hero_voice_ui built and running.
Fix 3: Office viewing (home#174)
- hero_office_ui::editor_page patched with browser-native PDF short-circuit:
  - .pdf → renders inline via <embed type="application/pdf">
  - .docx/.xlsx/.pptx → looks for companion .pdf with same stem; if found, renders that with a "PDF preview" badge
Verified live
- hero_agent_ui healthy, sanitizer in compiled binary
Prod-level fix paths
All three hotfixes have prod-level long-term fix paths documented in their respective issues:
- ORT_PREFER_DYNAMIC_LINK into installer + ort↔onnx version detector
Snapshot tarball at ~/heronu-backups/herodemo-backup-20260424-124021.tar.gz was taken BEFORE these three fixes; a fresh snapshot will be needed once all three changes settle.
Signed-off-by: mik-tf
✅ Session close-out 2026-04-25
herodemo.gent01.grid.tf is in a demo-ready state. Auth-gated public URL with all archipelagos working except documented exceptions.
What works end-to-end
- admin:admin123 (canonical hero_proxy in home#182)
- .vsdx seed tooling on VM (home#183)
Engineering deliverables this session
Repo work:
- lhumina_code/hero_zero renamed → hero_demo on Forge (per session 12 hand-off)
- hero_demo/development: README rewrite (nu-shell primary), runbook (DEPLOYMENT_NU_HERO_OS.md), method doc (FIX_TRIAGE.md), TF variables (rootfs_size, gateway_node)
- hero_office PR #3 (lhumina_code/hero_office#3): native PDF preview, OnlyOffice reverse proxy with WS pass-through + streaming, JWT permissions widened, X-Forwarded-Host for cross-prefix URL gen
- hero_os/development (one direct merge; process miss flagged + memorialized): Biz iframe URL (home#179)
Issues filed this session: home#148 through home#183, so every gap is captured.
Snapshot
~/heronu-backups/herodemo-backup-<TS>.tar.gz: restorable to a fresh TF Grid VM via runbook §5 (data restore from backup). Plus the original heronu-backup-20260424-033445.tar.gz retained per request.
Re-deploy from scratch
Full path documented in hero_demo/docs/ops/DEPLOYMENT_NU_HERO_OS.md:
- deploy/single-vm/envs/<NAME>/)
- service_install_all
Method
Established the "Fix Triage" 4-level routing convention for collaborative demo+devops work:
Demo VM hotfixes are L1 by default. Anything that lands upstream needs PR review (L2). Direct push to development is reserved for hero_demo only.
Devops pickup list (priority order)
- /api/services aggregator (or per-domain split admin UIs). Same class of bug as home#180.
- setup.sh Docker-on-TF-Grid recipe; could fold into hero_skills installer.
Closing this issue. herodemo demo state is captured; the runbook + FIX_TRIAGE.md + per-gap home issues form the complete documentation.
Signed-off-by: mik-tf
Moved to hero_demo#28 — see lhumina_code/hero_demo#28
- make demo target: provision + install + seed + verify a fresh Hero OS demo VM in one command #31
- publicip=true the nu-shell deploy default #33
- web feature build: OSIS/Services have only native variants, Services is dead wood, Videos missing, Books needs iframe default #34