[feature] Ambient AI — top-right widget + wake word + conversation mode + OS-wide voice routing #16

Open
opened 2026-05-01 02:05:21 +00:00 by mik-tf · 1 comment
Owner

Vision

The AI Assistant is not an island — it's a layer. A small widget pinned top-right of every Hero OS page. Wake-word triggers. Conversation flows. The whole OS becomes voice-addressable.

You're scrolling photos and you say:

"Hey hero, switch to ThreeFold."
The widget wakes, hears, switches contexts. No clicks, no typing.

You're in a meeting and you say:

"Hey hero, what's on my calendar tomorrow?"
The widget answers in voice. Continues to listen. You say:
"Move the 2 p.m. to 3."
It does it. Confirms.

That's the target. The OS is reachable by voice from any island, any time.

Why this is achievable

The pieces are mostly built — what's needed is wiring + UX + reliability:

| Piece | State |
|---|---|
| TTS (Kokoro + Groq fallback) | ✅ working — verified live |
| STT (Whisper-style via hero_voice / kokoro-micro) | ⚠️ infrastructure exists; end-to-end demo path needs verification |
| Wake-word detection | ⚠️ built; needs integration with the OS-wide widget |
| Conversation mode (back-and-forth) | ⚠️ partial; needs UX flow + wake-window management |
| AI Assistant tool calling | ✅ working (see hero_agent issue on MCP discovery) |
| OS routing via postMessage (hero_route) | ✅ working — the dock and links already use it |

What this issue covers

Three components, ship in order:

1. OS-wide AI Widget

A small floating widget pinned top-right of the OS shell, on every page, every island.

  • Closed-by-default state: small icon, breathing dot to indicate "listening".
  • Open state: chat panel slides in from the right. Same conversation history as the AI Assistant island (not a separate session — same agent, same context).
  • "Listening" / "Thinking" / "Speaking" states with clear visual cues.
  • Click-to-mute (visual cue: greyed icon, no breathing dot).

Implementation:

  • Add to hero_os_app/src/shell/ as a top-level component, mounted alongside the dock.
  • Subscribe to the same agent session the AI Assistant island uses (so opening the island shows the same history that the widget participated in).
  • Stays alive across island switches — it's part of the shell, not the island content.
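The widget's states and mute persistence can be sketched as a small state machine. Everything here is illustrative: the state names, event names, and storage key are assumptions, not the actual hero_os_app API.

```typescript
// Sketch of the shell widget's state handling (names are hypothetical).
type WidgetState = "idle" | "listening" | "thinking" | "speaking" | "muted";

function nextState(current: WidgetState, event: string): WidgetState {
  // A muted widget ignores everything except an explicit unmute.
  if (current === "muted" && event !== "unmute") return "muted";
  switch (event) {
    case "wake":       return "listening"; // wake word detected
    case "transcript": return "thinking";  // STT done, agent working
    case "answer":     return "speaking";  // TTS playing
    case "tts_done":   return "idle";
    case "mute":       return "muted";
    case "unmute":     return "idle";
    default:           return current;
  }
}

// Mute persistence across reloads; the key name is an assumption.
const MUTE_KEY = "hero.widget.muted";
function persistMute(store: { setItem(k: string, v: string): void }, muted: boolean): void {
  store.setItem(MUTE_KEY, muted ? "1" : "0");
}
```

In the shell, `persistMute(localStorage, true)` would record the muted state so it survives reload, as the UX section below requires.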

2. Wake-word activation

Default wake word: hey hero. User-customisable in Settings → Voice.

Implementation:

  • Wake-word detection runs continuously when the widget is unmuted.
  • On detection: emit a "wake" event → widget transitions to "Listening" state → starts capturing audio for STT.
  • VAD (voice activity detection) bounds the listening window: when speech ends + 1s silence, transcribe and send to agent.
  • Settings page lets user pick from a small set of wake words: hey hero, hey ai, hello hero, hi hero, custom.

Existing partial work: kokoro-micro / hero_voice has wake-word detection built. This issue integrates it with the widget.
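The VAD-bounded listening window above ("speech ends + 1s silence") can be sketched as a pure function over per-frame VAD flags. The frame representation is an assumption for illustration; the real integration would consume the hero_voice / kokoro-micro VAD stream.

```typescript
// Illustrative sketch of the VAD-bounded capture window.
const SILENCE_MS = 1000; // per the issue: speech end + 1s silence

// frames: per-frame VAD flags (true = speech) at a fixed frame interval.
// Returns the frame index at which the window should close (transcribe and
// send to the agent), or -1 if the window is still open.
function windowCloseIndex(frames: boolean[], frameMs: number): number {
  let silentRun = 0;
  let sawSpeech = false;
  for (let i = 0; i < frames.length; i++) {
    if (frames[i]) {
      sawSpeech = true;
      silentRun = 0; // any speech resets the silence counter
    } else if (sawSpeech) {
      silentRun += frameMs;
      if (silentRun >= SILENCE_MS) return i;
    }
  }
  return -1; // still capturing
}
```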

3. Conversation mode

After the agent answers, the widget stays listening for 5 seconds (configurable). User can speak again without re-triggering the wake word.

  • Visual cue during the open conversation window (e.g. pulsing border).
  • Window resets every time the user speaks (so back-and-forth feels natural).
  • Window closes after 5s silence → must re-wake to talk again.
  • Convo button on the AI Assistant island also activates this mode for the chat panel itself (already partially built — labelled "Convo" in screenshots).
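The conversation-window bookkeeping above reduces to a timestamp that resets on every utterance. A minimal sketch, assuming a millisecond clock:

```typescript
// Sketch of the conversation window: open for CONVO_MS after the last
// utterance, closed (re-wake required) once that elapses.
const CONVO_MS = 5000; // configurable per the issue

interface ConvoWindow { openUntil: number }

// Called on every user utterance — resets the window each time,
// so back-and-forth feels natural.
function onUtterance(now: number): ConvoWindow {
  return { openUntil: now + CONVO_MS };
}

// While open, speech goes straight to STT without the wake word.
function isOpen(w: ConvoWindow, now: number): boolean {
  return now < w.openUntil;
}
```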

Voice-to-OS-action: the routing layer

When the agent receives a voice prompt with intent to operate the OS, it calls one of the os.* MCP tools (defined in the curated tool issue):

  • os.open_island(island_id)
  • os.switch_context(context_name)
  • os.set_theme(mode)

These tools issue hero_route postMessage commands to the OS shell, which already knows how to route URLs, change contexts, etc. So:

"Hey hero, open contacts."

→ STT → agent → os.open_island('biz') + (implicit) biz.list_contacts() → island opens, contacts populate, agent says "Here are your 29 contacts."

The intent layer is just the LLM picking the right os.* tool; no new routing infra is needed. hero_route already supports all the navigation we need.
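One way the os.* tool calls could be translated into shell messages is sketched below. The message shape is an assumption for illustration — the actual hero_route postMessage protocol is defined in the OS shell, not here.

```typescript
// Hypothetical mapping from os.* MCP tool calls to hero_route messages.
type OsToolCall =
  | { tool: "os.open_island"; island_id: string }
  | { tool: "os.switch_context"; context_name: string }
  | { tool: "os.set_theme"; mode: "light" | "dark" };

function toRouteMessage(call: OsToolCall): { type: string; payload: Record<string, string> } {
  switch (call.tool) {
    case "os.open_island":
      return { type: "hero_route", payload: { action: "open", island: call.island_id } };
    case "os.switch_context":
      return { type: "hero_route", payload: { action: "context", name: call.context_name } };
    case "os.set_theme":
      return { type: "hero_route", payload: { action: "theme", mode: call.mode } };
  }
}
// Shell side: window.postMessage(toRouteMessage(call), targetOrigin)
```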

UX details that matter

  • Mute is the default while typing in any text field. Otherwise the wake word fires while users are typing emails, notes, etc.
  • Visual privacy indicator: when the mic is hot, a small red dot in a fixed location (e.g. the browser tab icon) so users always know.
  • Mute persists across reload in localStorage.
  • Per-context wake: in shared/multi-user contexts (future), wake words can be context-scoped (only fire in your own context).
  • Latency budget: wake → "listening" cue should be <200ms; speech-end → response start <2s on local model paths, <5s with provider round-trip.
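The mute-while-typing rule above amounts to checking whether focus sits in an editable element. A minimal sketch with the DOM access stubbed out so the rule itself is testable; the element-type list is an assumption:

```typescript
// Suppress the wake word whenever focus is in an editable element,
// so typing "hey hero" in an email doesn't trigger the widget.
function shouldSuppressWake(tagName: string, isContentEditable: boolean): boolean {
  const editable = ["INPUT", "TEXTAREA", "SELECT"];
  return isContentEditable || editable.includes(tagName.toUpperCase());
}
// In the shell, roughly:
//   const el = document.activeElement;
//   shouldSuppressWake(el?.tagName ?? "", (el as HTMLElement)?.isContentEditable ?? false)
```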

Acceptance

Phase 1 — Widget (visual only, no audio yet)

  • Top-right floating widget on every page of the OS shell.
  • Open/close states; chat panel slides in.
  • Shares conversation history with the AI Assistant island.
  • Click-to-mute state, persisted in localStorage.

Phase 2 — Wake word

  • "hey hero" detection working from any page.
  • Settings panel lets user pick wake word.
  • Visual cue during "Listening".
  • STT path verified: speak → transcript appears in widget.

Phase 3 — Conversation mode

  • After agent answers, listening window stays open for 5s.
  • User can speak again without re-wake.
  • Window resets on each user utterance.
  • Window closes on silence; re-wake required.

Phase 4 — OS-wide voice routing

  • "Hey hero, open contacts" → opens Biz island.
  • "Hey hero, switch to ThreeFold" → context switches.
  • "Hey hero, dark mode" → theme toggles.
  • Each routing intent maps to an os.* MCP tool, and the agent picks it correctly.

What this issue is NOT

  • Not a re-architecture of hero_voice or kokoro-micro — they're working, this issue uses them.
  • Not the curated MCP tool surface — that's its own issue (see cross-references).
  • Not a multi-user voice system — single-user, single-context for v1.

Cross-references

  • Vision: lhumina_code/hero_demo#52
  • 24-hour demo plan: lhumina_code/hero_demo#53 (Phase 1 widget likely fits in Hour 12-18 if STT is verified)
  • Curated MCP tools (os.* tools live there): filed alongside (hero_agent)
  • Existing voice issues: home#173 (ONNX 1.24 — affects STT path)
  • TTS provider half-close: #3

Signed-off-by: mik-tf

mik-tf self-assigned this 2026-05-01 02:05:21 +00:00

mik-tf (Author, Owner) commented:

Update from source-grounded read (session 52)

After reading hero_voice + kokoro-micro, the wake-word path is more fragmented than the issue body assumes.

Wake-word state today:

  • The "real" detector (Rustpotter) is a hard-disabled stub in hero_voice/.../wakeword.rs due to a candle-core 0.2.2 dependency conflict.
  • The only working wake path is the WebSocket Listen mode that VAD-segments microphone input and substring-matches "hey hero" on Whisper STT output (ws.rs:389). Coarse, false-positive-prone, not exposed via OpenRPC/MCP.
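The substring-match wake path described above is easy to see as false-positive-prone. A sketch of the equivalent logic (in TypeScript for consistency with the other examples here; the real code is Rust in ws.rs):

```typescript
// Naive wake detection: fires on ANY transcript containing the phrase,
// which is why it produces false positives.
function naiveWakeMatch(transcript: string, wakeWord = "hey hero"): boolean {
  return transcript.toLowerCase().includes(wakeWord);
}
```

Merely mentioning the phrase ("she said hey hero earlier") triggers it, which is the gap a real detector like Rustpotter closes.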

STT/TTS state today:

  • STT works: default whisper_local via whisper.cpp (HERO_VOICE_STT_LOCAL=true), falling back to Groq whisper-large-v3-turbo via herolib-ai.
  • TTS is Kokoro-only through embedded kokoro-micro (POST /tts on ui.sock; models auto-downloaded to ~/.cache/k/). There is no Groq TTS path in code despite earlier "Groq fallback" framing — Groq fallback applies to STT only.
  • Neither STT nor TTS is on the OpenRPC surface (which is purely Topic/Folder CRUD today).

OS-wide widget / conversation mode:

Not present in hero_voice repo. hero_voice_app ships only the per-island Dioxus voice editor. The widget + conversation mode this issue describes don't exist yet.

Suggested split:

  1. Unblock Rustpotter — upgrade candle-core to a version compatible with the rest of the workspace, or pick an alternative detector (porcupine, snowboy successor, etc.). This is the keystone — without it, wake-word stays substring-matching on Whisper output.
  2. OS-wide widget + conversation mode — separate UI work in hero_os (probably a native island) that talks to hero_voice over a streaming socket.
  3. Expose voice via OpenRPC/MCP so hero_agent can drive synth/transcribe as tools.

Reconciliation memo for session 52: memory/investigation_roadmap_reconciliation.md (private workspace).
