[bug] hero_voice Rustpotter wake-word detector hard-disabled by candle-core 0.2.2 conflict #23

Open
opened 2026-05-01 04:09:30 +00:00 by mik-tf · 0 comments

Summary

The Rustpotter wake-word detector is a hard-disabled stub in hero_voice due to a candle-core 0.2.2 dependency conflict with the rest of the workspace. The only working wake path is a fragile fallback: WebSocket Listen mode that VAD-segments microphone input and substring-matches "hey hero" on Whisper STT output. This blocks the Ambient AI roadmap (hero_agent#16).

Source

  • hero_voice/.../wakeword.rs — Rustpotter integration is conditionally compiled out / always returns the stub.
  • hero_voice/.../ws.rs:389 — substring-match "hey hero" on Whisper transcription is the only live wake path.
  • None of STT, TTS, or wake is exposed via OpenRPC (the OpenRPC surface is purely Topic/Folder CRUD today).
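
For readers unfamiliar with the "compiled out / always returns the stub" pattern, here is a minimal sketch of what such a gate can look like. The feature name `rustpotter`, the `WakeResult` type, and the `detect` function are all hypothetical; the actual contents of wakeword.rs are not shown in this issue and may differ.

```rust
/// Outcome of feeding one audio frame to a wake-word detector.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum WakeResult {
    Detected,
    NotDetected,
    /// The detector was compiled out, so no detection is possible.
    Disabled,
}

// Hypothetical feature name; the real gate in wakeword.rs may differ.
#[cfg(feature = "rustpotter")]
pub fn detect(samples: &[f32]) -> WakeResult {
    // The real Rustpotter call would go here; it is omitted because
    // this sketch does not depend on the rustpotter crate.
    let _ = samples;
    WakeResult::NotDetected
}

// With the feature off (the situation this issue describes), every call
// takes this path and callers only ever see `Disabled`.
#[cfg(not(feature = "rustpotter"))]
pub fn detect(_samples: &[f32]) -> WakeResult {
    WakeResult::Disabled
}
```

With a gate like this, callers compile and run normally but never receive a detection, which is why the Whisper substring path ends up as the only live wake path.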

Why the fallback is fragile

  • Whisper has to fully transcribe before the substring match runs → high latency, ~500-1500ms more than a dedicated detector.
  • False positives on any phrase that includes "hey" + "hero"-rhyming words.
  • Needs full microphone audio + STT pipeline running constantly → CPU + power cost vs a tiny dedicated detector.
  • Not exposed as a tool the agent or other services can subscribe to.
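
The false-positive problem is easy to reproduce with a small sketch of a substring-based check. This is not the code at ws.rs:389, which is not shown here; `normalize` and `transcript_has_wake_phrase` are hypothetical names, and the normalization step is an assumption about how such a check might be written.

```rust
/// Lowercase the transcript, replace punctuation with spaces, and
/// collapse runs of whitespace into single spaces.
fn normalize(transcript: &str) -> String {
    transcript
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns true when the normalized transcript contains "hey hero"
/// anywhere, including inside or across longer words.
pub fn transcript_has_wake_phrase(transcript: &str) -> bool {
    normalize(transcript).contains("hey hero")
}
```

Under these assumptions, `transcript_has_wake_phrase("Hey, Hero!")` is true as intended, but so is `transcript_has_wake_phrase("they heroically left")`, because "they heroically" contains the characters "hey hero". Matching on word boundaries would remove that particular case, but not the latency or CPU cost, since the check still runs only after Whisper has produced a full transcript.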

Proposed fix (pick one)

Option A: upgrade candle-core. Find the candle-core version range that's compatible with the rest of the workspace and unstub Rustpotter. Cleanest but may cascade into other dep upgrades.

Option B: alternative detector. Pick a different wake-word library that doesn't depend on candle-core. Candidates worth evaluating: porcupine (Picovoice; commercial license), openWakeWord (ONNX-based), or simple keyword spotting with whisper-base on a short audio window.

Option C: live with the substring fallback but expose it cleanly via OpenRPC + MCP so other services can subscribe. Doesn't fix the latency / false-positive cost.
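
To make "other services can subscribe" concrete, here is a rough in-process sketch of a wake-event fan-out built on standard-library channels. `WakeEvent`, `WakeBus`, and their fields are hypothetical; an actual OpenRPC + MCP surface would be a network API with its own schema, which this sketch does not attempt to model.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// A wake notification. Hypothetical shape, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct WakeEvent {
    /// The phrase that triggered the wake, e.g. "hey hero".
    pub phrase: String,
    /// Which detector produced the event, e.g. "whisper-substring".
    pub source: String,
}

/// Fans each wake event out to every current subscriber.
#[derive(Default)]
pub struct WakeBus {
    subscribers: Vec<Sender<WakeEvent>>,
}

impl WakeBus {
    /// Register a new subscriber and hand back its receiving end.
    pub fn subscribe(&mut self) -> Receiver<WakeEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send the event to all subscribers, dropping any whose receiver
    /// has gone away. Returns the number of subscribers still registered.
    pub fn publish(&mut self, event: WakeEvent) -> usize {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
        self.subscribers.len()
    }
}
```

Keeping the `source` field on the event would let subscribers stay unchanged if the substring fallback is later replaced by a dedicated detector under Option A or B.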

Severity

Medium. Not a deploy blocker (substring fallback works), but the Ambient AI vision in hero_agent#16 and hero_demo#52 leans on responsive wake — the substring path doesn't get there.

Cross-refs

Spotted during docs_hero Phase 1 source-grounded read (session 52). Reconciliation memo: memory/investigation_roadmap_reconciliation.md.

  • hero_agent#16 (Ambient AI; depends on this): https://forge.ourworld.tf/lhumina_code/hero_agent/issues/16
  • hero_demo#52 (vision): https://forge.ourworld.tf/lhumina_code/hero_demo/issues/52
  • TTS expectation note: TTS is Kokoro-only; the "Groq fallback" applies to STT only (relevant background for Ambient AI scoping).
Reference
lhumina_code/hero_voice#23