[feature] Ambient AI — top-right widget + wake word + conversation mode + OS-wide voice routing #16

Open
opened 2026-05-01 02:05:21 +00:00 by mik-tf · 1 comment
Owner

Vision

The AI Assistant is not an island — it's a layer. A small widget pinned top-right of every Hero OS page. Wake-word triggers. Conversation flows. The whole OS becomes voice-addressable.

You're scrolling photos and you say:

"Hey hero, switch to ThreeFold."
The widget wakes, hears, switches contexts. No clicks, no typing.

You're in a meeting and you say:

"Hey hero, what's on my calendar tomorrow?"
The widget answers in voice. Continues to listen. You say:
"Move the 2 p.m. to 3."
It does it. Confirms.

That's the target. The OS is reachable by voice from any island, any time.

Why this is achievable

The pieces are mostly built — what's needed is wiring + UX + reliability:

| Piece | State |
|---|---|
| TTS (Kokoro + Groq fallback) | ✅ working — verified live |
| STT (Whisper-style via hero_voice / kokoro-micro) | ⚠️ infrastructure exists; end-to-end demo path needs verification |
| Wake-word detection | ⚠️ built; needs integration with the OS-wide widget |
| Conversation mode (back-and-forth) | ⚠️ partial; needs UX flow + wake-window management |
| AI Assistant tool calling | ✅ working (see hero_agent issue on MCP discovery) |
| OS routing via postMessage (hero_route) | ✅ working — the dock and links already use it |

What this issue covers

Three components, ship in order:

1. OS-wide AI Widget

A small floating widget pinned top-right of the OS shell, on every page, every island.

  • Closed-by-default state: small icon, breathing dot to indicate "listening".
  • Open state: chat panel slides in from the right. Same conversation history as the AI Assistant island (not a separate session — same agent, same context).
  • "Listening" / "Thinking" / "Speaking" states with clear visual cues.
  • Click-to-mute (visual cue: greyed icon, no breathing dot).

Implementation:

  • Add to hero_os_app/src/shell/ as a top-level component, mounted alongside the dock.
  • Subscribe to the same agent session the AI Assistant island uses (so opening the island shows the same history that the widget participated in).
  • Stays alive across island switches — it's part of the shell, not the island content.
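The widget's states and mute persistence can be sketched as a small state machine. Everything here is illustrative: the state names, event names, and storage key are assumptions, not the actual hero_os_app API.

```typescript
// Sketch of the shell widget's state handling (names are hypothetical).
type WidgetState = "idle" | "listening" | "thinking" | "speaking" | "muted";

function nextState(current: WidgetState, event: string): WidgetState {
  // A muted widget ignores everything except an explicit unmute.
  if (current === "muted" && event !== "unmute") return "muted";
  switch (event) {
    case "wake":       return "listening"; // wake word detected
    case "transcript": return "thinking";  // STT done, agent working
    case "answer":     return "speaking";  // TTS playing
    case "tts_done":   return "idle";
    case "mute":       return "muted";
    case "unmute":     return "idle";
    default:           return current;
  }
}

// Mute persistence across reloads; the key name is an assumption.
const MUTE_KEY = "hero.widget.muted";
function persistMute(store: { setItem(k: string, v: string): void }, muted: boolean): void {
  store.setItem(MUTE_KEY, muted ? "1" : "0");
}
```

In the shell, `persistMute(localStorage, true)` would record the muted state so it survives reload, as the UX section below requires.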

2. Wake-word activation

Default wake word: hey hero. User-customisable in Settings → Voice.

Implementation:

  • Wake-word detection runs continuously when the widget is unmuted.
  • On detection: emit a "wake" event → widget transitions to "Listening" state → starts capturing audio for STT.
  • VAD (voice activity detection) bounds the listening window: when speech ends + 1s silence, transcribe and send to agent.
  • Settings page lets user pick from a small set of wake words: hey hero, hey ai, hello hero, hi hero, custom.

Existing partial work: kokoro-micro / hero_voice has wake-word detection built. This issue integrates it with the widget.
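The VAD-bounded listening window above ("speech ends + 1s silence") can be sketched as a pure function over per-frame VAD flags. The frame representation is an assumption for illustration; the real integration would consume the hero_voice / kokoro-micro VAD stream.

```typescript
// Illustrative sketch of the VAD-bounded capture window.
const SILENCE_MS = 1000; // per the issue: speech end + 1s silence

// frames: per-frame VAD flags (true = speech) at a fixed frame interval.
// Returns the frame index at which the window should close (transcribe and
// send to the agent), or -1 if the window is still open.
function windowCloseIndex(frames: boolean[], frameMs: number): number {
  let silentRun = 0;
  let sawSpeech = false;
  for (let i = 0; i < frames.length; i++) {
    if (frames[i]) {
      sawSpeech = true;
      silentRun = 0; // any speech resets the silence counter
    } else if (sawSpeech) {
      silentRun += frameMs;
      if (silentRun >= SILENCE_MS) return i;
    }
  }
  return -1; // still capturing
}
```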

3. Conversation mode

After the agent answers, the widget stays listening for 5 seconds (configurable). User can speak again without re-triggering the wake word.

  • Visual cue during the open conversation window (e.g. pulsing border).
  • Window resets every time the user speaks (so back-and-forth feels natural).
  • Window closes after 5s silence → must re-wake to talk again.
  • Convo button on the AI Assistant island also activates this mode for the chat panel itself (already partially built — labelled "Convo" in screenshots).
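The conversation-window bookkeeping above reduces to a timestamp that resets on every utterance. A minimal sketch, assuming a millisecond clock:

```typescript
// Sketch of the conversation window: open for CONVO_MS after the last
// utterance, closed (re-wake required) once that elapses.
const CONVO_MS = 5000; // configurable per the issue

interface ConvoWindow { openUntil: number }

// Called on every user utterance — resets the window each time,
// so back-and-forth feels natural.
function onUtterance(now: number): ConvoWindow {
  return { openUntil: now + CONVO_MS };
}

// While open, speech goes straight to STT without the wake word.
function isOpen(w: ConvoWindow, now: number): boolean {
  return now < w.openUntil;
}
```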

Voice-to-OS-action: the routing layer

When the agent receives a voice prompt with intent to operate the OS, it calls one of the os.* MCP tools (defined in the curated tool issue):

  • os.open_island(island_id)
  • os.switch_context(context_name)
  • os.set_theme(mode)

These tools issue hero_route postMessage commands to the OS shell, which already knows how to route URLs, change contexts, etc. So:

"Hey hero, open contacts."

→ STT → agent → os.open_island('biz') + (implicit) biz.list_contacts() → island opens, contacts populate, agent says "Here are your 29 contacts."

The intent layer is just the LLM picking the right os.* tool; no new routing infra is needed. hero_route already supports all the navigation we need.
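One way the os.* tool calls could be translated into shell messages is sketched below. The message shape is an assumption for illustration — the actual hero_route postMessage protocol is defined in the OS shell, not here.

```typescript
// Hypothetical mapping from os.* MCP tool calls to hero_route messages.
type OsToolCall =
  | { tool: "os.open_island"; island_id: string }
  | { tool: "os.switch_context"; context_name: string }
  | { tool: "os.set_theme"; mode: "light" | "dark" };

function toRouteMessage(call: OsToolCall): { type: string; payload: Record<string, string> } {
  switch (call.tool) {
    case "os.open_island":
      return { type: "hero_route", payload: { action: "open", island: call.island_id } };
    case "os.switch_context":
      return { type: "hero_route", payload: { action: "context", name: call.context_name } };
    case "os.set_theme":
      return { type: "hero_route", payload: { action: "theme", mode: call.mode } };
  }
}
// Shell side: window.postMessage(toRouteMessage(call), targetOrigin)
```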

UX details that matter

  • Mute is the default while typing in any text field. Otherwise the wake word fires while users are typing emails, notes, etc.
  • Visual privacy indicator: when the mic is hot, a small red dot in a fixed location (e.g. the browser tab icon) so users always know.
  • Mute persists across reload in localStorage.
  • Per-context wake: in shared/multi-user contexts (future), wake words can be context-scoped (only fire in your own context).
  • Latency budget: wake → "listening" cue should be <200ms; speech-end → response start <2s on local model paths, <5s with provider round-trip.
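The mute-while-typing rule above amounts to checking whether focus sits in an editable element. A minimal sketch with the DOM access stubbed out so the rule itself is testable; the element-type list is an assumption:

```typescript
// Suppress the wake word whenever focus is in an editable element,
// so typing "hey hero" in an email doesn't trigger the widget.
function shouldSuppressWake(tagName: string, isContentEditable: boolean): boolean {
  const editable = ["INPUT", "TEXTAREA", "SELECT"];
  return isContentEditable || editable.includes(tagName.toUpperCase());
}
// In the shell, roughly:
//   const el = document.activeElement;
//   shouldSuppressWake(el?.tagName ?? "", (el as HTMLElement)?.isContentEditable ?? false)
```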

Acceptance

Phase 1 — Widget (visual only, no audio yet)

  • Top-right floating widget on every page of the OS shell.
  • Open/close states; chat panel slides in.
  • Shares conversation history with the AI Assistant island.
  • Click-to-mute state, persisted in localStorage.

Phase 2 — Wake word

  • "hey hero" detection working from any page.
  • Settings panel lets user pick wake word.
  • Visual cue during "Listening".
  • STT path verified: speak → transcript appears in widget.

Phase 3 — Conversation mode

  • After agent answers, listening window stays open for 5s.
  • User can speak again without re-wake.
  • Window resets on each user utterance.
  • Window closes on silence; re-wake required.

Phase 4 — OS-wide voice routing

  • "Hey hero, open contacts" → opens Biz island.
  • "Hey hero, switch to ThreeFold" → context switches.
  • "Hey hero, dark mode" → theme toggles.
  • Each routing intent maps to an os.* MCP tool, and the agent picks it correctly.

What this issue is NOT

  • Not a re-architecture of hero_voice or kokoro-micro — they're working, this issue uses them.
  • Not the curated MCP tool surface — that's its own issue (see cross-references).
  • Not a multi-user voice system — single-user, single-context for v1.

Cross-references

  • Vision: lhumina_code/hero_demo#52
  • 24-hour demo plan: lhumina_code/hero_demo#53 (Phase 1 widget likely fits in Hour 12-18 if STT is verified)
  • Curated MCP tools (os.* tools live there): filed alongside (hero_agent)
  • Existing voice issues: home#173 (ONNX 1.24 — affects STT path)
  • TTS provider half-close: #3

Signed-off-by: mik-tf

mik-tf self-assigned this 2026-05-01 02:05:21 +00:00

mik-tf (Author, Owner) commented:

Update from source-grounded read (session 52)

After reading hero_voice + kokoro-micro, the wake-word path is more fragmented than the issue body assumes.

Wake-word state today:

  • The "real" detector (Rustpotter) is a hard-disabled stub in hero_voice/.../wakeword.rs due to a candle-core 0.2.2 dependency conflict.
  • The only working wake path is the WebSocket Listen mode that VAD-segments microphone input and substring-matches "hey hero" on Whisper STT output (ws.rs:389). Coarse, false-positive-prone, not exposed via OpenRPC/MCP.
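The substring-match wake path described above is easy to see as false-positive-prone. A sketch of the equivalent logic (in TypeScript for consistency with the other examples here; the real code is Rust in ws.rs):

```typescript
// Naive wake detection: fires on ANY transcript containing the phrase,
// which is why it produces false positives.
function naiveWakeMatch(transcript: string, wakeWord = "hey hero"): boolean {
  return transcript.toLowerCase().includes(wakeWord);
}
```

Merely mentioning the phrase ("she said hey hero earlier") triggers it, which is the gap a real detector like Rustpotter closes.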

STT/TTS state today:

  • STT works: default whisper_local via whisper.cpp (HERO_VOICE_STT_LOCAL=true), falling back to Groq whisper-large-v3-turbo via herolib-ai.
  • TTS is Kokoro-only through embedded kokoro-micro (POST /tts on ui.sock; models auto-downloaded to ~/.cache/k/). There is no Groq TTS path in code despite earlier "Groq fallback" framing — Groq fallback applies to STT only.
  • Neither STT nor TTS is on the OpenRPC surface (which is purely Topic/Folder CRUD today).

OS-wide widget / conversation mode:

Not present in hero_voice repo. hero_voice_app ships only the per-island Dioxus voice editor. The widget + conversation mode this issue describes don't exist yet.

Suggested split:

  1. Unblock Rustpotter — upgrade candle-core to a version compatible with the rest of the workspace, or pick an alternative detector (porcupine, snowboy successor, etc.). This is the keystone — without it, wake-word stays substring-matching on Whisper output.
  2. OS-wide widget + conversation mode — separate UI work in hero_os (probably a native island) that talks to hero_voice over a streaming socket.
  3. Expose voice via OpenRPC/MCP so hero_agent can drive synth/transcribe as tools.

Reconciliation memo for session 52: memory/investigation_roadmap_reconciliation.md (private workspace).
