Live translation feature

scott commented

2026-06-24 01:06:16 +00:00

I've implemented a backend for live translation, built on Hero Voice and the Hero Voice provider. It's ready to integrate into the app, with the following notes:

1. Deploy both hero_voice_provider and hero_voice from their development branches.
2. Set the voice provider to use Parakeet v3 for multilingual support.
3. Use the hero_voice WebSocket endpoint and request translations into the target languages.

The voice provider auto-detects the input language, and Hero Voice then requests translations into the other target languages via LLM calls through OpenRouter.
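As a rough illustration of "the other target languages", here is a minimal sketch of how the set of languages to translate into could be derived once the source language is known. The function name and the normalisation rules are assumptions made for illustration; they are not taken from the hero_voice code.

```typescript
// Hypothetical helper (not from hero_voice): given the detected source
// language and the requested target languages, return the languages that
// still need a translation. Codes are lower-cased and de-duplicated.
function translationTargets(detected: string, translateTo: string[]): string[] {
  const source = detected.trim().toLowerCase();
  const seen = new Set<string>();
  const targets: string[] = [];
  for (const raw of translateTo) {
    const lang = raw.trim().toLowerCase();
    // Skip empty codes, the source language itself, and repeats.
    if (lang === "" || lang === source || seen.has(lang)) continue;
    seen.add(lang);
    targets.push(lang);
  }
  return targets;
}

// A Spanish speaker with targets en/es/fr only needs en and fr.
const exampleTargets = translationTargets("es", ["en", "es", "fr"]);
```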

For our target languages, I found that Mistral v4 small performs well on both speed and accuracy. It's also an open model made in Europe, which fits our sovereignty narrative, and it's a model we can host on our own infrastructure once we have the capacity.

One caveat for now: segmentation depends on the speaker pausing. This feels reasonably live as long as the speaker leaves adequate pauses. If we decide it's not good enough, I have some ideas for making it more responsive.
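To make the caveat concrete, here is a minimal sketch of pause-based segmentation over per-frame energy values. This is not the voice provider's actual algorithm, which isn't described here; the energy threshold and the minimum pause length are both hypothetical parameters.

```typescript
// Assumption: each entry is the RMS energy of one fixed-length audio frame.
// A segment closes once `minPauseFrames` consecutive quiet frames follow speech.
// Returns [firstSpeechFrame, lastSpeechFrame] index pairs.
function segmentByPauses(
  energies: number[],
  silenceThreshold: number,
  minPauseFrames: number,
): Array<[number, number]> {
  const segments: Array<[number, number]> = [];
  let start = -1; // first speech frame of the open segment, -1 if none is open
  let lastSpeech = -1; // most recent speech frame in the open segment
  let quiet = 0; // consecutive quiet frames since lastSpeech

  energies.forEach((energy, i) => {
    if (energy >= silenceThreshold) {
      if (start === -1) start = i;
      lastSpeech = i;
      quiet = 0;
    } else if (start !== -1) {
      quiet += 1;
      if (quiet >= minPauseFrames) {
        segments.push([start, lastSpeech]);
        start = -1;
        quiet = 0;
      }
    }
  });

  // Flush a segment that is still open when the audio ends.
  if (start !== -1) segments.push([start, lastSpeech]);
  return segments;
}
```

With this scheme, a speaker who never pauses for `minPauseFrames` frames produces one long segment, so nothing is emitted until they stop. That is the responsiveness limit the caveat describes.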

Added scope — live translation for presentations (on-stage)

Beyond the meetings/in-person recorder above, we also need live translation for presentations:

- LiveKit to stream the speaker's voice/audio (from stage mics) into the app along with translations.
- Stream audio into the app from mics. Jaques will provide info on this.
- Show English on the big screen + each attendee's own language on their phone.
- Depends on the Hero Voice stack above working.

(Folded in from the conference-organiser action items, 2026-06-24. P0.)
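The "English on the big screen, own language on each phone" requirement can be sketched as one lookup per display. The segment shape below is an assumption made for illustration, as is the fallback to the source text when a translation is missing.

```typescript
// Hypothetical segment shape: source text plus a map of language code to translation.
interface TranslatedSegment {
  text: string;
  source_language: string;
  translations: Record<string, string>;
}

// Pick the text a given display should show. If the display language is the
// source language, or no translation exists yet, fall back to the source text.
function textFor(segment: TranslatedSegment, language: string): string {
  if (language === segment.source_language) return segment.text;
  return segment.translations[language] ?? segment.text;
}

const exampleSegment: TranslatedSegment = {
  text: "Hola a todos",
  source_language: "es",
  translations: { en: "Hello everyone", fr: "Bonjour à tous" },
};

const bigScreenText = textFor(exampleSegment, "en"); // big screen always asks for English
const attendeeText = textFor(exampleSegment, "fr"); // a French-speaking attendee's phone
```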



timur commented

2026-06-24 09:55:29 +00:00

Owner

Wired live translation into the cm50 Meetings recorder

The live-translation backend from this issue is now integrated into the Meetings → In-person recorder (development branch), as a hybrid live + offline-durable recorder.

Live translation (thin client, browser → hero_voice directly)

- The recorder opens the streaming WS straight to the deployed hero_voice_web and sends a config frame with translate_to = the app's UI languages (en, es, fr); the server skips the detected source language automatically. Default translation_model = mistralai/mistral-small-2603.
- Source transcript segments + per-language translations stream into the recorder UI live as the speaker pauses.
- The WS URL is configurable via window.CM50_VOICE_WS_URL; the default is the ebook's canonical path /hero_voice/rpc/api/voice/transcribe/ws (note: the shipped widget's hard-coded /hero_voice/rest/... path is stale).
- We did not reuse the served components.js widget for this, because its createStreamingRecorder does not yet send translate_to (and has no offline support). cm50 uses its own small client that speaks the documented protocol.
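A minimal sketch of the client-side setup described above. The `translate_to` and `translation_model` field names, the default path, and the default model come from this comment; the `type: "config"` discriminator and both helper names are assumptions, so check the documented protocol for the real frame layout.

```typescript
const DEFAULT_WS_PATH = "/hero_voice/rpc/api/voice/transcribe/ws";
const DEFAULT_TRANSLATION_MODEL = "mistralai/mistral-small-2603";

// Hypothetical helper: use the override (e.g. window.CM50_VOICE_WS_URL) when
// set, otherwise derive a ws:// or wss:// URL from the page origin.
function resolveWsUrl(override: string | undefined, pageOrigin: string): string {
  if (override !== undefined && override.trim() !== "") return override;
  // "http://..." becomes "ws://...", "https://..." becomes "wss://...".
  const wsOrigin = pageOrigin.replace(/^http/, "ws").replace(/\/+$/, "");
  return wsOrigin + DEFAULT_WS_PATH;
}

// Hypothetical helper: serialise the config frame sent right after the socket
// opens. The "type" field is an assumed discriminator, not a confirmed one.
function buildConfigFrame(
  translateTo: string[],
  translationModel: string = DEFAULT_TRANSLATION_MODEL,
): string {
  return JSON.stringify({
    type: "config",
    translate_to: translateTo,
    translation_model: translationModel,
  });
}

// In the browser this would be wired up roughly as:
//   const ws = new WebSocket(resolveWsUrl(window.CM50_VOICE_WS_URL, location.origin));
//   ws.onopen = () => ws.send(buildConfigFrame(["en", "es", "fr"]));
```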

Offline durability (survives an internet cut)

- Audio is captured to IndexedDB in ~2 s chunks as it records, so a reload, crash, or dropped connection never loses the recording.
- On stop while online, the authoritative transcript + translations are persisted via a new meeting_record endpoint (the server only summarises; no redundant re-transcription).
- If offline, the audio is queued locally and auto-uploaded to the batch meeting_transcribe endpoint when connectivity returns.
- MeetingRecording gained source_language + translations[].
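The stop-time branching above can be sketched as a pure decision function. The endpoint names and the `source_language` / `translations` fields come from this comment; everything else (the element shape of `translations[]`, the `transcript` field, the `StopAction` type, the function name) is assumed for illustration.

```typescript
// Assumed element shape for translations[]; the real schema may differ.
interface Translation {
  language: string;
  text: string;
}

// Live result accumulated during recording (field set partly assumed).
interface LiveResult {
  transcript: string;
  source_language: string;
  translations: Translation[];
}

type StopAction =
  | { endpoint: "meeting_record"; body: LiveResult }
  | { endpoint: "meeting_transcribe"; queued: true; chunkCount: number };

// Online: persist the live transcript + translations, so the server only
// has to summarise. Offline: keep the locally stored audio chunks queued
// for the batch endpoint until connectivity returns.
function planOnStop(online: boolean, live: LiveResult, chunkCount: number): StopAction {
  if (online) {
    return { endpoint: "meeting_record", body: live };
  }
  return { endpoint: "meeting_transcribe", queued: true, chunkCount };
}
```

Keeping the decision separate from the IndexedDB and network code makes the online/offline branching easy to test without a browser.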

Deployment prerequisites (per this issue)

1. Deploy hero_voice + hero_voice_provider from their development branches.
2. Parakeet v3 caveat: the hero_voice code currently uses parakeet-v2; parakeet-v3 is a provider/model-config item, not yet referenced in code. The feature works today on v2 (source-lang auto-detect + Mistral translation); v3 will improve source ASR.

Proposed follow-up (reusable): add the IndexedDB offline-durability layer (and translate_to passthrough) into the hero_voice widget's components.js, so other apps get offline recording + translation for free. Happy to open that PR next.
