
Hero Voice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

  • Real-time voice transcription - Stream audio from browser to server via WebSocket
  • Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
  • Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
  • AI transcription - Uses Groq WhisperLargeV3Turbo with automatic failover
  • Live markdown preview - Split-screen editor with real-time rendered HTML
  • Text transformations - 14 built-in AI transformation styles:
    • spellcheck - Grammar and spelling correction
    • specs - Technical specifications
    • code - Software architecture documentation
    • docs - User-friendly documentation
    • legal - Legal document formatting
    • story - Creative narrative
    • summary - Bullet-point summary
    • technical - Technical documentation
    • business - Business analysis
    • meeting - Meeting minutes
    • email - Professional email
    • Language translations: Dutch, French, Arabic
  • Topic organization - Hierarchical folder structure for transcriptions
  • Audio archival - Saves recordings as WAV and compressed OGG

Requirements

  • Rust 1.92+
  • Groq API key (required for transcription)
  • Modern browser with Web Audio API and microphone support

Configuration

# Required
export GROQ_API_KEY=your-groq-api-key

# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key

# Server configuration (optional)
export RUST_LOG=hero_voice=info  # Log level
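This is a minimal sketch of how the keys above could be turned into an ordered provider list. It assumes the failover order Groq, then OpenRouter, then SambaNova, and that a fallback is used only when its key is set. The README lists the keys but does not state the order. `Provider` and `provider_chain` are hypothetical names, not part of the hero_voice API.

```rust
use std::env;

/// Providers named in the configuration section (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Provider {
    Groq(String),
    OpenRouter(String),
    SambaNova(String),
}

/// Builds the provider list in an ASSUMED failover order.
/// `lookup` abstracts environment access so the logic is testable.
pub fn provider_chain<F>(lookup: F) -> Result<Vec<Provider>, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat empty or whitespace-only values as unset.
    let get = |k: &str| lookup(k).filter(|v| !v.trim().is_empty());

    // GROQ_API_KEY is the only required key.
    let groq = get("GROQ_API_KEY").ok_or_else(|| "GROQ_API_KEY is required".to_string())?;
    let mut chain = vec![Provider::Groq(groq)];

    // Optional fallbacks, appended only when configured.
    if let Some(k) = get("OPENROUTER_API_KEY") {
        chain.push(Provider::OpenRouter(k));
    }
    if let Some(k) = get("SAMBANOVA_API_KEY") {
        chain.push(Provider::SambaNova(k));
    }
    Ok(chain)
}

fn main() {
    match provider_chain(|k| env::var(k).ok()) {
        Ok(chain) => println!("{} provider(s) configured", chain.len()),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```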

Usage

service voice start --update --reset

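The hero_voice server and UI listen on Unix sockets only (no TCP). Use hero_proxy for external access. The hero_voiced daemon described below is the exception: it binds a loopback TCP port.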

Sockets

Service                              Socket path
Server (OpenRPC)                     ~/hero/var/sockets/hero_voice/rpc.sock
UI (HTTP + /rpc proxy + WebSocket)   ~/hero/var/sockets/hero_voice/web.sock

hero_voiced — local OpenAI-compatible STT/TTS daemon

hero_voiced is a stateless TCP daemon that loads sherpa-onnx Parakeet (STT) and Kokoro (TTS) once and exposes them over an OpenAI-compatible API. It's designed to be auto-discovered and registered as a priority-0 backend by hero_aibroker, so that any consumer using herolib_ai against the broker gets local inference for free, with cloud fallback (Groq, etc.) handled by the broker — not the daemon.

Endpoints:

  • POST /v1/audio/transcriptions — multipart form (file, model, language, prompt, response_format). Default response {"text": "..."}.
  • POST /v1/audio/speech — JSON {model, input, voice, response_format, speed}. Supports response_format of wav (default) and pcm.
  • GET /v1/models — local engine identifiers.
  • GET /health: returns {status, service, version, models_ready}.
  • GET /.well-known/heroservice.json — discovery manifest.
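This dependency-free sketch calls the transcription endpoint. It hand-builds a multipart/form-data body and sends it over a plain TCP connection to the default loopback port. Several values are placeholders: the boundary string, the `clip.wav` input, and the `local-stt` model value. Query `GET /v1/models` for the real identifiers. `multipart_body` and `transcribe` are hypothetical helpers. In practice an HTTP client crate would handle framing and chunked responses.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds a multipart/form-data body: text fields first, then one file part named "file".
pub fn multipart_body(boundary: &str, fields: &[(&str, &str)], file_name: &str, file: &[u8]) -> Vec<u8> {
    let mut body = Vec::new();
    for (name, value) in fields {
        body.extend_from_slice(
            format!("--{boundary}\r\nContent-Disposition: form-data; name=\"{name}\"\r\n\r\n{value}\r\n").as_bytes(),
        );
    }
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{file_name}\"\r\nContent-Type: audio/wav\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(file);
    // Closing delimiter ends the multipart body.
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}

/// Sends one transcription request to a locally running hero_voiced and returns the raw HTTP response.
pub fn transcribe(port: u16, wav: &[u8]) -> std::io::Result<String> {
    let boundary = "hero-voice-example-boundary";
    // "local-stt" is a PLACEHOLDER model name, not a documented identifier.
    let body = multipart_body(boundary, &[("model", "local-stt")], "clip.wav", wav);
    let head = format!(
        "POST /v1/audio/transcriptions HTTP/1.1\r\nHost: 127.0.0.1:{port}\r\nContent-Type: multipart/form-data; boundary={boundary}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
        body.len()
    );
    let mut stream = TcpStream::connect(("127.0.0.1", port))?;
    stream.write_all(head.as_bytes())?;
    stream.write_all(&body)?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let wav = std::fs::read("clip.wav").unwrap_or_default();
    match transcribe(8094, &wav) {
        Ok(resp) => println!("{resp}"),
        Err(e) => eprintln!("hero_voiced not reachable: {e}"),
    }
}
```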

Environment:

Variable                     Default                                    Purpose
HERO_VOICED_PORT             8094                                       Loopback TCP port
HERO_VOICED_ADDRESS          (unset)                                    Optional second bind address (e.g. mycelium IPv6)
HERO_VOICE_STT_SHERPA_DIR    ~/hero/share/hero_voice/stt/parakeet       Parakeet bundle dir
HERO_VOICE_TTS_KOKORO_DIR    ~/hero/share/hero_voice/kokoro-en-v0_19    Kokoro bundle dir
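The defaults above could be resolved as in this sketch. It assumes `~/` means the user's home directory and that an unparsable port should be a startup error. `VoicedConfig` and `resolve` are illustrative names only, not the daemon's actual code.

```rust
use std::path::PathBuf;

/// Resolved daemon settings (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct VoicedConfig {
    pub port: u16,
    pub extra_address: Option<String>,
    pub stt_dir: PathBuf,
    pub tts_dir: PathBuf,
}

/// Expands a leading "~/" against `home` (assumed semantics).
fn expand_home(path: &str, home: &str) -> PathBuf {
    match path.strip_prefix("~/") {
        Some(rest) => PathBuf::from(home).join(rest),
        None => PathBuf::from(path),
    }
}

/// Applies the documented defaults; `lookup` abstracts environment access.
pub fn resolve<F: Fn(&str) -> Option<String>>(lookup: F, home: &str) -> Result<VoicedConfig, String> {
    let port = match lookup("HERO_VOICED_PORT") {
        Some(v) => v
            .parse::<u16>()
            .map_err(|e| format!("invalid HERO_VOICED_PORT {v:?}: {e}"))?,
        None => 8094,
    };
    let stt = lookup("HERO_VOICE_STT_SHERPA_DIR")
        .unwrap_or_else(|| "~/hero/share/hero_voice/stt/parakeet".into());
    let tts = lookup("HERO_VOICE_TTS_KOKORO_DIR")
        .unwrap_or_else(|| "~/hero/share/hero_voice/kokoro-en-v0_19".into());
    Ok(VoicedConfig {
        port,
        extra_address: lookup("HERO_VOICED_ADDRESS"),
        stt_dir: expand_home(&stt, home),
        tts_dir: expand_home(&tts, home),
    })
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    match resolve(|k| std::env::var(k).ok(), &home) {
        Ok(cfg) => println!("{cfg:#?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```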

Both bundle dirs are populated automatically on the first hero_voiced start (~770 MB combined download from the sherpa-onnx GitHub releases). The make parakeet-deps, make tts-deps, and make model-deps targets remain available for pre-baking the models into images and CI environments that run offline.

Run standalone:

make voiced
# or
cargo run -p hero_voiced

Architecture

Hero Voice follows the standard Hero multi-crate workspace layout:

hero_voice/
├── crates/
│   ├── hero_voice/              # Core library (types, domain logic, audio, transcription)
│   ├── hero_voice_server/       # JSON-RPC 2.0 server over Unix socket
│   ├── hero_voice_sdk/          # Generated client SDK
│   ├── hero_voice_ui/           # Admin UI (Axum HTTP + /rpc proxy + WebSocket)
│   └── hero_voice_examples/     # Example programs using the SDK
├── schemas/voice/voice.oschema  # Domain schema (source of truth)
├── data/                        # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh

Data flow

Browser (WebSocket)
    │
    ▼
hero_voice_ui (Unix socket)
    ├── /rpc endpoint → proxies JSON-RPC to hero_voice_server (rpc.sock)
    ├── /mcp endpoint → MCP-to-OpenRPC translation
    ├── /ws  endpoint → WebSocket audio streaming
    └── /*   fallback → embedded static assets

hero_voice_server (Unix socket)
    ├── rpc.health   → {"status":"ok"}
    ├── rpc.discover → OpenRPC spec
    └── domain methods (folder.*, topic.*, voiceservice.*)
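The `hero_voice_ui` dispatch in the diagram can be pictured as a plain path match. This sketch is plain Rust, not the actual Axum router. It also includes the `/files/audio/` and `/files/transforms/` download routes described under Static Files. `Route` and `route` are hypothetical names.

```rust
/// Where a UI request path is sent (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    Rpc,                                   // proxied to hero_voice_server
    Mcp,                                   // MCP-to-OpenRPC translation
    WebSocket,                             // audio streaming
    File { kind: &'a str, name: &'a str }, // audio or transform downloads
    StaticAsset(&'a str),                  // embedded static assets (fallback)
}

pub fn route(path: &str) -> Route<'_> {
    match path {
        "/rpc" => Route::Rpc,
        "/mcp" => Route::Mcp,
        "/ws" => Route::WebSocket,
        _ => {
            // /files/{audio|transforms}/{filename}
            if let Some(rest) = path.strip_prefix("/files/") {
                if let Some((kind, name)) = rest.split_once('/') {
                    if (kind == "audio" || kind == "transforms") && !name.is_empty() {
                        return Route::File { kind, name };
                    }
                }
            }
            // Anything else falls through to the embedded assets.
            Route::StaticAsset(path.trim_start_matches('/'))
        }
    }
}

fn main() {
    for p in ["/rpc", "/mcp", "/ws", "/files/audio/take1.ogg", "/index.html"] {
        println!("{p} -> {:?}", route(p));
    }
}
```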

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 via the /rpc proxy on the UI socket.

Auto-generated CRUD (Topic and Folder root objects):

  • topic.new, topic.get, topic.set, topic.delete, topic.list
  • folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

  • voiceservice.create_topic / voiceservice.create_folder
  • voiceservice.rename_topic / voiceservice.rename_folder
  • voiceservice.move_topic / voiceservice.move_folder
  • voiceservice.delete_topic / voiceservice.delete_folder
  • voiceservice.save_content / voiceservice.transform_content
  • voiceservice.register_audio / voiceservice.delete_audio
  • voiceservice.reset_topic / voiceservice.get_audio_path
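This minimal sketch calls a JSON-RPC method through the UI socket without an HTTP client crate. It assumes the `/rpc` proxy accepts an HTTP POST with a JSON-RPC 2.0 body, and it uses the socket path from the Sockets table. `ui_socket_path`, `rpc_http_request`, and `call` are hypothetical helpers. Methods that take parameters, such as `voiceservice.create_topic`, would also need a `params` member. Its shape comes from `voice.oschema` and is not shown here.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};

/// Documented UI socket location, resolved against a home directory.
pub fn ui_socket_path(home: &str) -> PathBuf {
    PathBuf::from(home).join("hero/var/sockets/hero_voice/web.sock")
}

/// Builds a raw HTTP/1.1 POST to /rpc carrying a JSON-RPC 2.0 request.
/// `params` is omitted (allowed by JSON-RPC 2.0), so this only suits
/// parameterless methods such as rpc.health. `method` is inserted verbatim.
pub fn rpc_http_request(method: &str, id: u64) -> String {
    let body = format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}"}}"#);
    format!(
        "POST /rpc HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Sends one request and returns the raw HTTP response (headers + body).
pub fn call(socket: &Path, method: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(rpc_http_request(method, 1).as_bytes())?;
    let mut response = String::new();
    // "Connection: close" lets us read until the server closes the socket.
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    match call(&ui_socket_path(&home), "rpc.health") {
        Ok(resp) => println!("{resp}"),
        Err(e) => eprintln!("hero_voice_ui not reachable: {e}"),
    }
}
```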

WebSocket

GET /ws - Audio streaming endpoint

Client to Server:

{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }

Audio itself is sent as binary frames: 16-bit PCM, 16 kHz, mono, little-endian.

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }
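The browser client is not Rust, but a native client would need the same framing. This sketch converts normalized float samples, the form the Web Audio API produces, into 16-bit little-endian PCM for the binary frames. It also builds the two text control messages. Browsers commonly capture at 44.1 or 48 kHz, so audio would first need resampling to 16 kHz; that step is not shown. The function names are hypothetical, and `start_message` assumes the topic id needs no JSON escaping.

```rust
/// Converts normalized samples (-1.0..=1.0) into 16-bit little-endian PCM bytes.
/// Out-of-range input is clamped before scaling.
pub fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Text frame that opens a session. `topic` is inserted verbatim (no escaping).
pub fn start_message(topic: Option<&str>) -> String {
    match topic {
        Some(t) => format!(r#"{{"type":"start","topic":"{t}"}}"#),
        None => r#"{"type":"start"}"#.to_string(),
    }
}

/// Text frame that ends a session.
pub const STOP_MESSAGE: &str = r#"{"type":"stop"}"#;

fn main() {
    // 10 ms of a 440 Hz tone at 16 kHz, standing in for captured audio.
    let samples: Vec<f32> = (0..160)
        .map(|n| (2.0 * std::f32::consts::PI * 440.0 * n as f32 / 16_000.0).sin() * 0.5)
        .collect();
    let frame = f32_to_pcm16_le(&samples);
    println!("{}", start_message(None));
    println!("binary frame: {} bytes", frame.len());
    println!("{STOP_MESSAGE}");
}
```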

Static Files

  • GET /files/audio/{filename} - Audio file downloads
  • GET /files/transforms/{filename} - Transform file downloads

Audio Processing

  • Sample rate: 16kHz (required for Silero VAD)
  • Chunk size: 512 samples for VAD analysis
  • Silence threshold: 350ms triggers transcription
  • Speech threshold: 0.20 probability
  • Maximum buffer: 30 seconds before forced transcription
  • Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
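The pause and buffer rules above can be sketched as a small state machine. It receives one speech probability per 512-sample chunk (32 ms at 16 kHz). At that chunk size, the 350 ms threshold is reached after 11 silent chunks (5,632 samples ≥ 5,600), and the 30 s cap after 938 chunks. The sketch makes three assumptions: a probability of exactly 0.20 counts as speech, leading silence before the first speech chunk is not buffered, and the counters reset after each transcription. `Segmenter` is a hypothetical name; in the real pipeline, Silero VAD supplies the probabilities.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK: usize = 512;
const SPEECH_THRESHOLD: f32 = 0.20;
const SILENCE_MS: usize = 350;
const MAX_BUFFER_SECS: usize = 30;

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Continue,
    Transcribe,
}

#[derive(Default)]
pub struct Segmenter {
    buffered: usize,       // samples buffered since speech started
    silent_samples: usize, // consecutive non-speech samples after speech
    heard_speech: bool,
}

impl Segmenter {
    /// Feeds the VAD probability for one 512-sample chunk.
    pub fn push_chunk(&mut self, speech_prob: f32) -> Action {
        if speech_prob >= SPEECH_THRESHOLD {
            self.heard_speech = true;
            self.silent_samples = 0;
        } else if self.heard_speech {
            self.silent_samples += CHUNK;
        }
        if !self.heard_speech {
            return Action::Continue; // ASSUMPTION: leading silence is dropped
        }
        self.buffered += CHUNK;

        let pause = self.silent_samples >= SILENCE_MS * SAMPLE_RATE / 1000; // 5,600 samples
        let full = self.buffered >= MAX_BUFFER_SECS * SAMPLE_RATE; // 480,000 samples
        if pause || full {
            *self = Segmenter::default();
            Action::Transcribe
        } else {
            Action::Continue
        }
    }
}

fn main() {
    let mut seg = Segmenter::default();
    // Hypothetical probabilities: 5 speech chunks, then 12 silent ones.
    let probs = std::iter::repeat(0.8_f32).take(5).chain(std::iter::repeat(0.05_f32).take(12));
    for (i, p) in probs.enumerate() {
        if seg.push_chunk(p) == Action::Transcribe {
            println!("transcribe after chunk {i}");
        }
    }
}
```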

Browser Support

  • Chrome 120+
  • Firefox 120+
  • Safari 17+
  • Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0