[plan] 24-hour killer demo — execution plan with hour-by-hour tasks #53

Open
opened 2026-05-01 02:05:20 +00:00 by mik-tf · 1 comment
Owner

Goal

A 10-minute Hero OS demo on herodemo.gent01.grid.tf that lands every beat without the user-visible reasoning leaks, half-broken services, or env-var workarounds we've been firefighting today. Concrete, time-boxed, executable.

This issue is the execution plan. The bigger picture is in hero_demo#52 (Hero OS vision) and the per-feature issues it cross-references.


What's already working (verified live 2026-04-30)

| Capability | Verified |
|---|---|
| Login + desktop shell | ✅ |
| Per-context routing | ✅ (default 0 / geomind 29 / threefold 29 / incubaid 34 / root 9 contacts) |
| "Ask the Librarian" with AI summaries | ✅ ("what is this ourworld" → coherent multi-section AI summary) |
| AI Assistant grounding on docs_hero | ✅ ("What is Hero OS?" → coherent answer with citations) |
| AI Assistant queries hero_biz contacts | ✅ but ungainly (brute-forces socket paths visibly) |
| AI Assistant Python (uv) | ✅ |
| Books library (4 libraries / 40 books) | ✅ |
| Photos / Videos rendering per context | ✅ |
| Biz dashboard with seeded data per context | ✅ (post-e0b014c deploy) |
| Native "Business" duplicate removed from dock | ✅ (post-#123 WASM rebuild) |
| TTS playback | ✅ |
| OnlyOffice (read) | ✅ |
| LiveKit infra | ✅ |

So most of the killer demo is already built. The 24-hour plan is polish + a few wiring fixes + reliability hooks, not feature build.


The 24-hour plan, hour by hour

Hour 0–4: highest leverage (docs_hero + curated MCP tools)

Goal: make the AI Assistant stop brute-forcing biz queries.

  • docs_hero capability pages (docs_hero/services/biz.md, docs_hero/services/calendar.md, docs_hero/services/photos.md, docs_hero/services/books.md, docs_hero/services/foundry.md, docs_hero/contexts.md)
    • Each page lists: what the service does, what queries it can answer, example prompts, MCP tools available, per-context behaviour.
    • Ship these and re-embed via hero_embedder → AI gets perceptibly smarter immediately, no agent code change.
    • 2 hours of writing == measurable demo improvement. Highest single-investment ROI.
  • Curated MCP tools for hero_biz: biz.list_contacts(context), biz.list_persons(context), biz.add_contact(context, name, ...). Tracked in hero_agent issue (filed alongside this).
    • Just biz first; calendar/photos/etc. follow in track A's "next" tier.
    • The agent stops having to search for socket paths; one clean tool call per prompt.
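A minimal sketch of what "one clean tool call per prompt" could look like, assuming a curated table that names each tool and its required arguments. `FakeBiz`, `CURATED_TOOLS`, `call_tool`, and the `company` argument are hypothetical names for illustration; the real hero_biz service and its argument list are not described in this issue.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeBiz:
    """Hypothetical in-memory stand-in for hero_biz, keyed by context name."""

    contacts: dict[str, list[dict[str, str]]] = field(default_factory=dict)

    def list_contacts(self, context: str) -> list[dict[str, str]]:
        return list(self.contacts.get(context, []))

    def add_contact(self, context: str, name: str, company: str = "") -> dict[str, str]:
        # "company" is an assumed optional field, not taken from the issue.
        record = {"name": name, "company": company}
        self.contacts.setdefault(context, []).append(record)
        return record


# Curated table: one entry per tool the agent is allowed to see.
CURATED_TOOLS: dict[str, dict[str, Any]] = {
    "biz.list_contacts": {"required": ["context"], "writes": False},
    "biz.add_contact": {"required": ["context", "name"], "writes": True},
}


def call_tool(biz: FakeBiz, name: str, arguments: dict[str, Any]) -> Any:
    """Validate a tool call against the curated table, then dispatch it."""
    spec = CURATED_TOOLS.get(name)
    if spec is None:
        raise KeyError(f"unknown tool: {name}")
    missing = [arg for arg in spec["required"] if arg not in arguments]
    if missing:
        raise ValueError(f"{name} is missing arguments: {missing}")
    method = getattr(biz, name.split(".", 1)[1])
    return method(**arguments)
```

`biz.list_persons(context)` would follow the same shape as `biz.list_contacts`. The point of the table is that the agent picks from a short, named list instead of probing socket paths.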

Hour 4–8: AI Assistant UX polish

Goal: hide the agent's reasoning scratchpad from the user-facing transcript.

  • Collapse chain-of-thought in the AI Assistant island. Use a <details> element collapsed by default, or a "Thinking..." indicator that hides the reasoning until clicked, so only the final answer is visible at first.
  • Streaming polish: thinking text streams in greyed/italic; the moment the agent emits its final answer, that section becomes the primary content.
  • Test against today's "who are my contacts in hero_biz?" prompt — should now show a clean answer without 30 lines of trial-and-error.
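The collapse step could be sketched as below, assuming the agent output wraps its scratchpad in `<think>...</think>` markers. That marker, and the function names `split_reasoning` and `render_message`, are assumptions for illustration; the real island may receive reasoning and answer as separate stream events.

```python
import html
import re
from typing import Tuple

# Assumed delimiter for reasoning text; the actual agent format may differ.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from one raw agent message."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(raw))
    answer = THINK_RE.sub("", raw).strip()
    return reasoning, answer


def render_message(raw: str) -> str:
    """Render the answer as primary content, with reasoning collapsed."""
    reasoning, answer = split_reasoning(raw)
    parts = []
    if reasoning:
        # A <details> element without the "open" attribute is collapsed by default.
        parts.append(
            "<details><summary>Thinking...</summary><pre>"
            + html.escape(reasoning)
            + "</pre></details>"
        )
    parts.append("<p>" + html.escape(answer) + "</p>")
    return "\n".join(parts)
```

Messages without a scratchpad render as a plain paragraph, so the clean path costs nothing extra.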

Hour 8–12: snapshot + smoke loop foundation

Goal: stop the demo from regressing while we polish.

  • Snapshot herodemo to a tarball — per #46 step 5. So if anything goes sideways before/during the demo we restore in minutes.
  • Smoke loop v1 — the panel that catches the failure modes we hit today (lhumina_code/home#201). Specifically:
    • photo download per context (/hero_foundry/rpc/api/files/<ctx>/Photos/<file>.jpg → 200 image/jpeg)
    • per-context routing (5 distinct responses for 5 different X-Hero-Context headers)
    • biz dashboard (/hero_biz/ui/c/<ctx>/persons → 200 + non-empty)
    • hero_books search (/hero_books/ui/search?q=hero → 200 + result count > 0)
    • AI Assistant /health (/hero_agent/rpc/health → 200)
  • Cron every 5 min, output to /var/log/herodemo-smoke.log. Failure threshold: any check that expects 200 and gets anything else → log a warn line.
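A rough sketch of the smoke loop's core, under stated assumptions: the `https` scheme, the `Check` tuple, `http_fetch`, `run_checks`, and the log line format are all made up for illustration. Only the host name, the URL paths, and the log file location come from the list above. The fetch function is injectable so the checks can be exercised without touching the live VM.

```python
import urllib.error
import urllib.request
from typing import Callable, List, NamedTuple, Optional, Tuple

BASE_URL = "https://herodemo.gent01.grid.tf"  # scheme is an assumption

FetchResult = Tuple[int, str, bytes]  # (status, content type, body)


class Check(NamedTuple):
    name: str
    path: str
    content_type: Optional[str] = None  # expected Content-Type, if any
    non_empty: bool = False             # require a non-empty body


def http_fetch(url: str, timeout: float = 10.0) -> FetchResult:
    """GET a URL; connection failures are reported as status 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get_content_type(), resp.read()
    except urllib.error.HTTPError as err:
        return err.code, "", b""
    except OSError:
        return 0, "", b""


def run_checks(
    checks: List[Check],
    fetch: Callable[[str], FetchResult] = http_fetch,
    base: str = BASE_URL,
) -> List[str]:
    """Run every check and return one log line per check."""
    lines = []
    for check in checks:
        status, ctype, body = fetch(base + check.path)
        ok = status == 200
        if check.content_type is not None:
            ok = ok and ctype == check.content_type
        if check.non_empty:
            ok = ok and len(body) > 0
        lines.append(f"{'ok' if ok else 'warn'} {check.name} status={status}")
    return lines


# Hypothetical crontab entry (script path is an assumption):
# */5 * * * * python3 /opt/herodemo/smoke.py >> /var/log/herodemo-smoke.log 2>&1
```

The per-context routing check (five `X-Hero-Context` headers, five distinct responses) is left out here; it would need a fetch function that accepts headers and a comparison across the five bodies.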

Hour 12–18: STT path + AI write to biz

Goal: the demo can run a voice round-trip and the AI can mutate state, not just read.

  • STT verification — speak into AI Assistant, verify the transcription path returns text. Today STT is partially built (kokoro-micro / hero_voice) but never verified end-to-end on herodemo. Document the known-good config.
  • AI Assistant write to hero_biz — expose biz.add_contact MCP tool, the agent can call it, demo prompt "add Alice Smith from Acme as a contact" creates the record + Biz dashboard reflects it.
  • Confirmation UX: when the agent calls a write-side tool, surface a brief "I added Alice Smith" confirmation rather than just the raw tool call.
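One way to sketch the confirmation step is a per-tool message template that is filled from the tool call's arguments. `WRITE_CONFIRMATIONS`, `CONTEXT_LABELS`, and `confirmation_for` are hypothetical names; the wording mirrors the demo script's expected reply, not any existing agent code.

```python
from typing import Any, Dict, Optional

# Display names for contexts; an assumed mapping for illustration.
CONTEXT_LABELS = {"geomind": "Geomind", "threefold": "ThreeFold"}

# Only write-side tools get a confirmation template.
WRITE_CONFIRMATIONS = {
    "biz.add_contact": "Done. {name} is now in your {context_label} contacts.",
}


def confirmation_for(tool_name: str, arguments: Dict[str, Any]) -> Optional[str]:
    """Return a short user-facing confirmation, or None for read-only tools."""
    template = WRITE_CONFIRMATIONS.get(tool_name)
    if template is None:
        return None
    context = arguments["context"]
    return template.format(
        name=arguments["name"],
        context_label=CONTEXT_LABELS.get(context, context),
    )
```

Returning `None` for read-only tools keeps the transcript free of confirmations where nothing changed.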

Hour 18–24: rehearse, fix, snapshot again

Goal: play through the demo end-to-end three times. Fix anything that breaks. Snapshot the final known-good state.

  • Rehearsal 1 — full demo flow (see "the demo" below). Note every snag. Fix.
  • Rehearsal 2 — repeat. Should be cleaner. Note any remaining snags.
  • Rehearsal 3 — should be smooth. If not, fix and rehearse 4. Stop when 2 consecutive rehearsals are clean.
  • Final snapshot — tarball the demo VM in its rehearsed-clean state.
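The snapshot step, including the sha256 that the acceptance list asks for, might look like the sketch below. It assumes the demo state lives in a single directory that can be archived while services are quiet; which directories actually make up the herodemo state is not specified here, and `snapshot` is a hypothetical helper name.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Tuple


def snapshot(source_dir: Path, dest_dir: Path, name: str) -> Tuple[Path, str]:
    """Archive source_dir into dest_dir/<name>.tar.gz and record its sha256.

    dest_dir should live outside source_dir so the archive does not include itself.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    tarball = dest_dir / f"{name}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)

    # Hash in chunks so large snapshots do not have to fit in memory.
    digest = hashlib.sha256()
    with tarball.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    checksum = digest.hexdigest()

    # Same "<hash>  <file>" layout that `sha256sum -c` accepts.
    (dest_dir / f"{name}.tar.gz.sha256").write_text(f"{checksum}  {tarball.name}\n")
    return tarball, checksum
```

Storing the checksum next to the tarball lets a restore verify the archive before unpacking it.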

The demo (the 7-minute flow we rehearse)

1. Switch to Geomind context (visible context picker, top-left).
2. Open Books → click "OurWorld" library → "Ask the Librarian":
   "What is OurWorld and what are its goals?"
   → AI summary lands cleanly, multi-section, grounded on library content.

3. Open AI Assistant island.
   "What is Hero OS?"
   → Coherent answer, lists capabilities, with citation paths from docs_hero.

4. Same conversation:
   "Who are my contacts in Geomind?"
   → Clean answer, no scratchpad: "You have 29 contacts in Geomind. Here are
   the most recent 5: [list]." (post-curated-MCP-tools fix)

5. Same conversation:
   "Add Alice Smith from Acme as a contact."
   → "Done. Alice Smith is now in your Geomind contacts."
   → User checks Biz dashboard: count went 29 → 30. Alice is there.

6. Switch to ThreeFold context (visible context switch).
   Open Biz: 6 Persons / 6 Companies. Different data set.
   → multi-tenancy works cleanly across the same UI.

7. Open Photos: per-context library shows. Open Videos: Sintel plays.
8. Open Office: open a seeded .docx (read-only mode for now is acceptable).

[Optional polish for Hour 18+ work:]
9. Speak: "Hey hero, who are my contacts?" → wake word fires, conversation
   mode kicks in, voice answer. (only if STT path is verified Hour 12-18)

Acceptance

  • Hour 0–4 docs + biz MCP tools shipped and visible in AI Assistant behaviour.
  • Hour 4–8 AI Assistant UX polish merged.
  • Hour 8–12 snapshot taken; smoke loop v1 running every 5 min.
  • Hour 12–18 STT path verified or explicitly deferred with reason logged.
  • Hour 12–18 biz.add_contact MCP tool works end-to-end.
  • Hour 18–24 demo rehearsed clean twice consecutively.
  • Final snapshot tarball stored in ~/heronu-backups/ with sha256.
  • No env-var workarounds visible during demo. Everything works from a fresh deploy + the documented runbook.

What this issue is NOT

  • Not the long-term vision — that's hero_demo#52.
  • Not the reliability story — that's lhumina_code/hero_proc#86 META + smoke loop home#201.
  • Not the architectural story — that's hero_osis#43, home#202/#204, etc.

This issue is what to ship in the next 24 hours so the demo actually impresses someone tomorrow.

Cross-references

  • Vision: hero_demo#52
  • Existing demo gaps: https://forge.ourworld.tf/lhumina_code/hero_demo/issues/51
  • Reliability META: https://forge.ourworld.tf/lhumina_code/hero_proc/issues/86
  • Smoke loop: https://forge.ourworld.tf/lhumina_code/home/issues/201
  • Snapshot step (already on the list): https://forge.ourworld.tf/lhumina_code/hero_demo/issues/46

Signed-off-by: mik-tf
mik-tf self-assigned this 2026-05-01 02:05:20 +00:00
Author
Owner

Hour-by-Hour reconciliation against source (session 52)

Source-grounded re-read of hero_agent + hero_voice + hero_router suggests several Hour-N items need re-mapping vs current reality. None of this is conclusive — runtime verification on herodemo is the next step — but the design-level signal is strong.

Hour 4-8 — "fix scratchpad / chain-of-thought leakage in transcripts"

Likely already fixed at design level. ExecutionContract::to_prompt_section() builds an objective + constraints + success-criteria block but is never called. The contract only emits a ContractDerived event for the admin SSE dashboard — it's not injected into the LLM prompt or saved to OSIS conversation storage. If a leak is still observable on herodemo, the source is somewhere else (system prompt? streaming token formatting? raw provider response?), not the contract. Action: runtime verification before redoing the work.

Hour items: "wire MCP tools" / "expose biz writes via curated tools"

The auto-MCP gateway already exists. POST /mcp/{service_name} is live in hero_router and tools are auto-derived from OpenRPC. agent_run ships per service for free. hero_agent already consumes these tools via the ~/hero/var/agent/mcp.json registry. Remaining work is curation, ranking, and write-side gating, not building MCP. See hero_agent#15 comment for the four reshaped subtasks. Action: replace these Hour items with the curation/gating tasks.
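To make the remaining curation and gating work concrete, here is a small sketch. It assumes the gateway accepts standard MCP JSON-RPC 2.0 `tools/call` requests; the tool names and the `curate` helper are hypothetical, since the names auto-derived from OpenRPC are not listed in this thread.

```python
import json
from typing import Any, Dict, Iterable, List, Set


def mcp_path(service_name: str) -> str:
    """Gateway path for one service, following POST /mcp/{service_name}."""
    return f"/mcp/{service_name}"


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP `tools/call` request body (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def curate(
    available: Set[str],
    ranked_allow_list: Iterable[str],
    write_tools: Set[str],
    allow_writes: bool,
) -> List[str]:
    """Keep only allow-listed tools, in ranked order, hiding writes unless enabled."""
    curated = []
    for name in ranked_allow_list:
        if name not in available:
            continue  # allow-listed but not exposed by the service
        if name in write_tools and not allow_writes:
            continue  # write-side gating
        curated.append(name)
    return curated


# Example body for a read call; the tool name is an assumption.
body = json.dumps(build_tool_call(1, "list_contacts", {"context": "geomind"}))
```

The allow-list order doubles as the ranking, so the agent sees the preferred tool first and never sees auto-derived tools that were not explicitly curated.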

Hour 12-18 — STT verification

STT works in source: whisper.cpp local (HERO_VOICE_STT_LOCAL=true) with Groq whisper-large-v3-turbo cloud fallback via herolib-ai. Verification is still valuable; expectation is it should pass without surprises. No change needed.
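The local-first, cloud-fallback pattern can be sketched as follows. This is an illustration of the pattern only: whether hero_voice falls back on an error, on empty text, or on both is an assumption, and `local_stt_enabled` and `transcribe` are hypothetical names. Only the `HERO_VOICE_STT_LOCAL` variable comes from the source.

```python
import os
from typing import Callable, Mapping, Optional, Tuple

Transcriber = Callable[[bytes], str]


def local_stt_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when HERO_VOICE_STT_LOCAL is set to "true" (parsing rule assumed)."""
    env = os.environ if env is None else env
    return env.get("HERO_VOICE_STT_LOCAL", "").strip().lower() == "true"


def transcribe(
    audio: bytes, local: Transcriber, cloud: Transcriber, use_local: bool
) -> Tuple[str, str]:
    """Return (backend, text), trying local STT first when enabled."""
    if use_local:
        try:
            text = local(audio)
            if text.strip():
                return "local", text
        except Exception:
            pass  # assumed behaviour: any local failure falls through to cloud
    return "cloud", cloud(audio)
```

Returning the backend name alongside the text makes it easy to log which path served each request while documenting the known-good config.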

Wake-word integration (if planned)

Rustpotter detector is hard-disabled (candle-core 0.2.2 conflict, wakeword.rs). Only working wake path is WebSocket Listen + VAD + substring-match on Whisper output (ws.rs:389). See hero_agent#16 comment. Action: any Hour item depending on wake-word should depend on the Rustpotter unblock.
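For readers unfamiliar with the substring-match wake path, a simplified sketch is below. The real logic lives in Rust (`ws.rs`); this Python version only illustrates the idea, and the normalization rules, the `detect_wake` name, and the "hey hero" phrase (borrowed from the demo script) are assumptions.

```python
import re
from typing import Optional


def detect_wake(transcript: str, wake_phrase: str = "hey hero") -> Optional[str]:
    """Return the text following the wake phrase, or None if it is absent.

    The transcript is lowercased and stripped of punctuation first, so
    "Hey, Hero!" still matches "hey hero".
    """
    normalized = " ".join(re.sub(r"[^a-z0-9\s]", " ", transcript.lower()).split())
    index = normalized.find(wake_phrase)
    if index < 0:
        return None
    return normalized[index + len(wake_phrase):].strip()
```

Because this matches on transcribed text, it only fires after Whisper has already processed the audio, which is why it depends on the Listen + VAD path rather than a dedicated low-power detector.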

TTS expectation

If any Hour item assumes "Kokoro + Groq fallback" for TTS, note that the Groq fallback is STT-only. TTS is Kokoro-only.

Suggested next step: update Hour 4-8 to a runtime-verification task; replace "wire MCP" items with the curation work; flag any wake-word dependency as blocked on the Rustpotter dep upgrade.

Reconciliation memo: memory/investigation_roadmap_reconciliation.md (private workspace, session 52).

Reference
lhumina_code/hero_demo#53