Hero Voice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

Real-time voice transcription - Stream audio from browser to server via WebSocket
Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
AI transcription - Uses Groq WhisperLargeV3Turbo with automatic failover
Live markdown preview - Split-screen editor with real-time rendered HTML
Text transformations - 14 built-in AI transformation styles:
- spellcheck - Grammar and spelling correction
- specs - Technical specifications
- code - Software architecture documentation
- docs - User-friendly documentation
- legal - Legal document formatting
- story - Creative narrative
- summary - Bullet-point summary
- technical - Technical documentation
- business - Business analysis
- meeting - Meeting minutes
- email - Professional email
- Language translations: Dutch, French, Arabic
Topic organization - Hierarchical folder structure for transcriptions
Audio archival - Saves recordings as WAV and compressed OGG

Requirements

Rust 1.92+
Groq API key (required for transcription)
Modern browser with Web Audio API and microphone support

Configuration

```bash
# Required
export GROQ_API_KEY=your-groq-api-key

# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key

# Server configuration (optional)
export RUST_LOG=hero_voice=info  # Log level
```
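The key requirements above can be checked before starting a session. The following is a minimal Rust sketch, not Hero Voice's actual code: `configured_providers` and `providers_from_env` are hypothetical names, and trying OpenRouter before SambaNova is an assumed order, since this README does not say how failover is sequenced.

```rust
use std::env;

// Provider name and the environment variable holding its key. The order of
// the two fallbacks is an assumption made for this sketch.
const PROVIDERS: [(&str, &str); 3] = [
    ("groq", "GROQ_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("sambanova", "SAMBANOVA_API_KEY"),
];

/// Hypothetical helper: returns the providers whose keys are set, in try
/// order, or an error if the required Groq key is missing or empty.
/// `lookup` abstracts over the environment so the logic is testable.
pub fn configured_providers<F>(lookup: F) -> Result<Vec<&'static str>, String>
where
    F: Fn(&str) -> Option<String>,
{
    let is_set = |var: &str| lookup(var).map_or(false, |v| !v.trim().is_empty());
    if !is_set("GROQ_API_KEY") {
        return Err("GROQ_API_KEY is required for transcription".to_string());
    }
    Ok(PROVIDERS
        .iter()
        .filter(|entry| is_set(entry.1))
        .map(|entry| entry.0)
        .collect())
}

/// Same check against the real process environment.
pub fn providers_from_env() -> Result<Vec<&'static str>, String> {
    configured_providers(|var| env::var(var).ok())
}
```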

Usage

```bash
service voice start --update --reset
```

The hero_voice server and UI listen on Unix sockets only (no TCP); use hero_proxy for external access. The hero_voiced daemon is the exception, as it binds a loopback TCP port.

Sockets

| Service | Socket Path |
|---|---|
| Server (OpenRPC) | `~/hero/var/sockets/hero_voice/rpc.sock` |
| UI (HTTP + /rpc proxy + WebSocket) | `~/hero/var/sockets/hero_voice/web.sock` |
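Because the UI socket speaks plain HTTP, a client needs nothing beyond the Rust standard library to reach the /rpc proxy. The sketch below is illustrative only: `web_socket_path`, `rpc_http_request`, and `call_rpc` are hypothetical names, and it assumes the proxy accepts an ordinary HTTP/1.1 POST to `/rpc` with a JSON body.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};

/// UI socket path from the table above, resolved against a home directory.
pub fn web_socket_path(home: &Path) -> PathBuf {
    home.join("hero/var/sockets/hero_voice/web.sock")
}

/// Frame a JSON-RPC body as an HTTP/1.1 POST to /rpc. `Connection: close`
/// asks the server to close the socket after responding, so the caller can
/// simply read until EOF.
pub fn rpc_http_request(json_body: &str) -> String {
    format!(
        "POST /rpc HTTP/1.1\r\n\
         Host: localhost\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n\
         {}",
        json_body.len(), // byte length, as HTTP requires
        json_body
    )
}

/// Send one request over the UI socket and return the raw HTTP response,
/// headers included.
pub fn call_rpc(socket: &Path, json_body: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(rpc_http_request(json_body).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

For example, passing `r#"{"jsonrpc":"2.0","id":1,"method":"rpc.health"}"#` as the body should return a raw HTTP response carrying the health result.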

hero_voiced — local OpenAI-compatible STT/TTS daemon

hero_voiced is a stateless TCP daemon that loads the sherpa-onnx Parakeet (STT) and Kokoro (TTS) models once and exposes them over an OpenAI-compatible API. It is designed to be auto-discovered and registered as a priority-0 backend by hero_aibroker, so any consumer that uses herolib_ai against the broker gets local inference with no extra setup, while cloud fallback (Groq, etc.) is handled by the broker rather than the daemon.

Endpoints:

POST /v1/audio/transcriptions — multipart form (file, model, language, prompt, response_format). Default response {"text": "..."}.
POST /v1/audio/speech — JSON {model, input, voice, response_format, speed}. Supports response_format of wav (default) and pcm.
GET /v1/models — local engine identifiers.
GET /health — {status, service, version, models_ready}.
GET /.well-known/heroservice.json — discovery manifest.
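The transcription endpoint takes a multipart form, which is easy to get subtly wrong by hand. This Rust sketch builds such a body with only the `model` and `file` fields; the boundary string and `transcription_form` are hypothetical, and the model value should be one of the identifiers reported by `GET /v1/models`.

```rust
/// Hypothetical boundary; any string that does not occur in the payload works.
const BOUNDARY: &str = "hero-voice-sketch-boundary";

/// Build a multipart/form-data body with a `model` field and a WAV `file` part.
/// Returns (Content-Type header value, body bytes).
pub fn transcription_form(model: &str, filename: &str, wav: &[u8]) -> (String, Vec<u8>) {
    let mut body = Vec::new();
    // Plain text field naming the engine.
    body.extend_from_slice(
        format!(
            "--{b}\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\n{model}\r\n",
            b = BOUNDARY
        )
        .as_bytes(),
    );
    // File field header, followed by the raw WAV bytes.
    body.extend_from_slice(
        format!(
            "--{b}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\n\
             Content-Type: audio/wav\r\n\r\n",
            b = BOUNDARY
        )
        .as_bytes(),
    );
    body.extend_from_slice(wav);
    // Closing delimiter ends the form.
    body.extend_from_slice(format!("\r\n--{}--\r\n", BOUNDARY).as_bytes());
    (format!("multipart/form-data; boundary={}", BOUNDARY), body)
}
```

Send the returned bytes as the body of `POST /v1/audio/transcriptions` to `127.0.0.1:8094` (the default port), with the returned string as the `Content-Type` header; per the endpoint description, the default response is `{"text": "..."}`.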

Environment:

| Var | Default | Purpose |
|---|---|---|
| `HERO_VOICED_PORT` | `8094` | Loopback TCP port |
| `HERO_VOICED_ADDRESS` | (unset) | Optional second bind (e.g. mycelium IPv6) |
| `HERO_VOICE_STT_SHERPA_DIR` | `~/hero/share/hero_voice/stt/parakeet` | Parakeet bundle dir |
| `HERO_VOICE_TTS_KOKORO_DIR` | `~/hero/share/hero_voice/kokoro-en-v0_19` | Kokoro bundle dir |
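The table maps naturally onto a small resolution step. This is a sketch under stated assumptions: `VoicedConfig` and `resolve_config` are hypothetical, `home` stands in for `~`, and rejecting a malformed port is this sketch's choice rather than documented daemon behavior.

```rust
use std::path::PathBuf;

/// Settings mirroring the environment table above.
#[derive(Debug, PartialEq)]
pub struct VoicedConfig {
    pub port: u16,
    pub extra_address: Option<String>,
    pub stt_dir: PathBuf,
    pub tts_dir: PathBuf,
}

/// Resolve settings via `lookup`, falling back to the documented defaults.
pub fn resolve_config<F>(home: &str, lookup: F) -> Result<VoicedConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an unparsable port is an error rather than silently ignored.
    let port = match lookup("HERO_VOICED_PORT") {
        Some(raw) => raw
            .parse::<u16>()
            .map_err(|e| format!("invalid HERO_VOICED_PORT {raw:?}: {e}"))?,
        None => 8094,
    };
    // An explicit directory wins; otherwise join the default under `home`.
    let dir = |var: &str, default_rel: &str| {
        lookup(var)
            .map(PathBuf::from)
            .unwrap_or_else(|| PathBuf::from(home).join(default_rel))
    };
    Ok(VoicedConfig {
        port,
        extra_address: lookup("HERO_VOICED_ADDRESS"),
        stt_dir: dir("HERO_VOICE_STT_SHERPA_DIR", "hero/share/hero_voice/stt/parakeet"),
        tts_dir: dir("HERO_VOICE_TTS_KOKORO_DIR", "hero/share/hero_voice/kokoro-en-v0_19"),
    })
}
```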

Both bundle directories are populated automatically the first time hero_voiced starts (about 770 MB combined, downloaded from the sherpa-onnx GitHub releases). The `make parakeet-deps`, `make tts-deps`, and `make model-deps` targets remain available for pre-baking the models offline into images and CI.

Run standalone:

```bash
make voiced
# or
cargo run -p hero_voiced
```

Architecture

Hero Voice follows the standard Hero three-crate model:

```text
hero_voice/
├── crates/
│   ├── hero_voice/              # Core library (types, domain logic, audio, transcription)
│   ├── hero_voice_server/       # JSON-RPC 2.0 server over Unix socket
│   ├── hero_voice_sdk/          # Generated client SDK
│   ├── hero_voice_ui/           # Admin UI (Axum HTTP + /rpc proxy + WebSocket)
│   └── hero_voice_examples/     # Example programs using the SDK
├── schemas/voice/voice.oschema  # Domain schema (source of truth)
├── data/                        # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh
```

Data flow

```text
Browser (WebSocket)
    │
    ▼
hero_voice_ui (Unix socket: web.sock)
    ├── /rpc endpoint → proxies JSON-RPC to rpc.sock (hero_voice_server)
    ├── /mcp endpoint → MCP-to-OpenRPC translation
    ├── /ws  endpoint → WebSocket audio streaming
    └── /*   fallback → embedded static assets

hero_voice_server (Unix socket: rpc.sock)
    ├── rpc.health   → {"status":"ok"}
    ├── rpc.discover → OpenRPC spec
    └── domain methods (folder.*, topic.*, voiceservice.*)
```

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 via the /rpc proxy on the UI socket.

Auto-generated CRUD (Topic and Folder root objects):

topic.new, topic.get, topic.set, topic.delete, topic.list
folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

voiceservice.create_topic / voiceservice.create_folder
voiceservice.rename_topic / voiceservice.rename_folder
voiceservice.move_topic / voiceservice.move_folder
voiceservice.delete_topic / voiceservice.delete_folder
voiceservice.save_content / voiceservice.transform_content
voiceservice.register_audio / voiceservice.delete_audio
voiceservice.reset_topic / voiceservice.get_audio_path
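Every call through /rpc is a JSON-RPC 2.0 envelope around one of the method names above. The helper below is a hypothetical, dependency-free sketch that assembles such an envelope; it does not know any method's parameter shape, which should be taken from the OpenRPC document returned by `rpc.discover`.

```rust
/// Minimal JSON string escaping for the method name.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 request. `params_json` must already be valid JSON
/// (an object or array); it is inserted verbatim.
pub fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"{}","params":{params_json}}}"#,
        json_escape(method)
    )
}
```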

WebSocket

GET /ws - Audio streaming endpoint

Client to Server:

{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }

In addition to these text messages, the client sends the audio itself as binary frames (16-bit PCM, 16 kHz, mono, little-endian).

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }
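A client typically starts from 32-bit float samples (the Web Audio API's native format), so it has to convert them before sending binary frames. The Rust function below sketches that conversion, assuming the samples are already resampled to 16 kHz mono; the function name is hypothetical, and the scaling (multiply by `i16::MAX`, clamp out-of-range input) is one common convention, not necessarily the one the bundled UI uses.

```rust
/// Convert normalized f32 samples (nominally in [-1.0, 1.0]) into
/// 16-bit little-endian PCM bytes for a binary WebSocket frame.
/// Input is assumed to be 16 kHz mono already; no resampling happens here.
pub fn f32_to_pcm16le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp first so out-of-range input saturates instead of distorting.
        let clamped = s.clamp(-1.0, 1.0);
        let v = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

A native client would send `{"type":"start"}` as a text frame, then the output of this function as binary frames, then `{"type":"stop"}`.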

Static Files

GET /files/audio/{filename} - Audio file downloads
GET /files/transforms/{filename} - Transform file downloads
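Each filename fills a single path segment, so characters outside the URL-safe set should be percent-encoded. A small sketch with hypothetical names (`FileKind`, `download_path`) that encodes every byte outside the RFC 3986 unreserved set:

```rust
/// Percent-encode a filename as one URL path segment, keeping only
/// RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) as-is.
fn encode_segment(name: &str) -> String {
    let mut out = String::new();
    for b in name.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else, including '/' and UTF-8 bytes, becomes %XX.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

#[derive(Clone, Copy)]
pub enum FileKind {
    Audio,
    Transform,
}

/// Build the download path for a stored audio or transform file.
pub fn download_path(kind: FileKind, filename: &str) -> String {
    let dir = match kind {
        FileKind::Audio => "audio",
        FileKind::Transform => "transforms",
    };
    format!("/files/{dir}/{}", encode_segment(filename))
}
```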

Audio Processing

Sample rate: 16kHz (required for Silero VAD)
Chunk size: 512 samples for VAD analysis
Silence threshold: 350ms triggers transcription
Speech threshold: 0.20 probability
Maximum buffer: 30 seconds before forced transcription
Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
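The thresholds above combine into a simple segmentation rule. The Rust sketch below models it as a state machine fed one Silero VAD probability per 512-sample chunk; running the VAD model itself is out of scope. `Segmenter` and `Flush` are hypothetical names, and two details are assumptions: a chunk counts as speech when its probability is at least 0.20, and leading silence before any speech is discarded rather than buffered.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK: usize = 512; // samples per VAD window (32 ms at 16 kHz)
const SPEECH_THRESHOLD: f32 = 0.20;
const SILENCE_MS: usize = 350;
const MAX_BUFFER_SAMPLES: usize = 30 * SAMPLE_RATE;

/// Why a segment was handed to transcription.
#[derive(Debug, PartialEq)]
pub enum Flush {
    Pause,
    MaxBuffer,
}

#[derive(Default)]
pub struct Segmenter {
    buffered: usize,    // samples held since the segment started
    silence: usize,     // consecutive trailing silent samples
    heard_speech: bool, // a pause only counts after some speech
}

impl Segmenter {
    /// Feed the VAD probability of the next chunk; returns Some when the
    /// buffered segment should be transcribed, then starts a new segment.
    pub fn push_chunk(&mut self, speech_prob: f32) -> Option<Flush> {
        let is_speech = speech_prob >= SPEECH_THRESHOLD;
        if !self.heard_speech && !is_speech {
            return None; // leading silence is dropped, not buffered
        }
        self.heard_speech = true;
        self.buffered += CHUNK;
        self.silence = if is_speech { 0 } else { self.silence + CHUNK };

        let flush = if self.buffered >= MAX_BUFFER_SAMPLES {
            Some(Flush::MaxBuffer)
        } else if self.silence * 1000 >= SILENCE_MS * SAMPLE_RATE {
            // Integer form of silence / SAMPLE_RATE >= 0.350 s.
            Some(Flush::Pause)
        } else {
            None
        };
        if flush.is_some() {
            *self = Segmenter::default();
        }
        flush
    }
}
```

With these numbers, 350 ms is 5,600 samples, so a pause is detected on the 11th consecutive silent chunk (11 × 512 = 5,632), and uninterrupted speech is force-flushed on the 938th chunk (938 × 512 = 480,256, the first multiple of 512 at or above 480,000).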

Browser Support

Chrome 120+
Firefox 120+
Safari 17+
Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0