auto-live-tl

A translation backend that listens to an audio sink via PCM and produces translations in real time.
- Whisper + optional Ollama cleanup
  - Runs locally on CUDA or CPU. Follow the "Local Setup" instructions.
- OpenAI Realtime translation using gpt-realtime-translate (requires an OpenAI API key; usage is billed)
  - This mode bypasses faster-whisper and Ollama. auto-live-tl only encodes the PCM data into the format gpt-realtime-translate expects.
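How the backend buffers audio before transcription is not documented here. As a rough illustration only, the sketch below accumulates raw PCM bytes from a sink into fixed-length windows that could each be handed to a speech model. It assumes 16 kHz mono signed 16-bit little-endian input (Whisper's native sample rate). `PcmWindower` and its parameters are hypothetical, and simple fixed windows stand in for whatever segmentation (for example voice activity detection or overlapping windows) the real implementation may use.

```python
import struct

SAMPLE_RATE = 16_000  # assumed: Whisper models operate on 16 kHz audio
BYTES_PER_SAMPLE = 2  # assumed: signed 16-bit little-endian mono PCM


class PcmWindower:
    """Hypothetical helper: buffer raw PCM bytes and emit fixed-length windows."""

    def __init__(self, window_seconds: float = 5.0) -> None:
        self.window_bytes = int(window_seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[list[float]]:
        """Add a chunk from the audio sink and return any complete windows."""
        self._buffer.extend(chunk)
        windows = []
        while len(self._buffer) >= self.window_bytes:
            raw = bytes(self._buffer[: self.window_bytes])
            del self._buffer[: self.window_bytes]
            count = len(raw) // BYTES_PER_SAMPLE
            samples = struct.unpack(f"<{count}h", raw)
            # Normalise int16 samples to floats in [-1.0, 1.0).
            windows.append([s / 32768.0 for s in samples])
        return windows
```

Partial chunks stay in the buffer until enough audio arrives, so the sink can deliver data in whatever chunk size it likes.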

Transcriptions and translations are produced by transformer-based models, so inaccuracies and hallucinations will occur.

General Setup

```sh
uv sync
uv run server.py
```

`server.py` runs the backend that translates incoming audio. A separate client is expected to connect to the `/events` endpoint to receive the translated subtitles. A GUI is available for configuration.

Whisper + Ollama (Local Setup)

Running on a GPU is highly recommended. CPU inference is possible but very slow unless you use the tiny models, which compromise accuracy.
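If Whisper is loaded through faster-whisper (the library named in the Demo section), the GPU/CPU choice comes down to the `device` and `compute_type` arguments of `faster_whisper.WhisperModel`. The sketch below is an assumption about how one might pick them, not auto-live-tl's actual logic: it asks CTranslate2 (the inference engine faster-whisper runs on) how many CUDA devices it can see via `ctranslate2.get_cuda_device_count()`, and falls back to CPU with int8 quantization. `pick_device` and `detect_cuda_devices` are hypothetical helpers.

```python
def pick_device(cuda_devices: int) -> tuple[str, str]:
    """Hypothetical helper: choose (device, compute_type) for faster-whisper."""
    if cuda_devices > 0:
        # float16 is a common choice on NVIDIA GPUs.
        return "cuda", "float16"
    # int8 quantization speeds up CPU inference.
    return "cpu", "int8"


def detect_cuda_devices() -> int:
    """Return the number of CUDA devices CTranslate2 can see, or 0 if it is not installed."""
    try:
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()


# Example usage (requires faster-whisper):
# from faster_whisper import WhisperModel
# device, compute_type = pick_device(detect_cuda_devices())
# model = WhisperModel("medium", device=device, compute_type=compute_type)
```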

For GPU inference you will need to install the NVIDIA CUDA 12 Toolkit (I am running CUDA Toolkit 12.9).
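The optional Ollama cleanup step is not described in detail. Assuming it talks to Ollama's local REST API, a cleanup pass could send each raw Whisper transcript to `POST /api/generate` on the default port 11434 with `"stream": false` and read the cleaned text from the `response` field. The prompt, the `build_cleanup_request`/`clean_up` helpers, and the model tag (chosen to mirror the Demo's qwen2.5-7B-instruct) are all assumptions; use whichever model you have pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen2.5:7b-instruct"  # assumed tag; replace with the model you have pulled

# Hypothetical prompt; the cleanup prompt auto-live-tl actually uses is not documented.
PROMPT_TEMPLATE = (
    "Fix punctuation and obvious transcription errors in this subtitle. "
    "Reply with the corrected subtitle only.\n\n{text}"
)


def build_cleanup_request(text: str, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request body for one transcript."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "stream": False,  # return a single JSON object instead of streamed chunks
    }


def clean_up(text: str, timeout: float = 30.0) -> str:
    """Send a transcript to Ollama and return the cleaned text."""
    body = json.dumps(build_cleanup_request(text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Disabling streaming keeps the client simple at the cost of waiting for the whole reply, which is usually acceptable for short subtitle lines.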

OpenAI gpt-realtime-translate (Setup)

Enabling this option in the GUI skips pulling the Whisper and Ollama models and ignores most other settings. All data will be sent to OpenAI.
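The exact wire format gpt-realtime-translate expects is not documented here. Assuming it follows OpenAI's Realtime API conventions, where audio is appended with `input_audio_buffer.append` client events carrying base64-encoded 16-bit little-endian mono PCM (24 kHz for the `pcm16` format), the encoding step might look like the sketch below. The helper names are hypothetical, and resampling to the expected rate is left out.

```python
import base64
import json
import struct


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM, clipping out-of-range values."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def append_event(pcm16: bytes) -> str:
    """Wrap a PCM16 chunk in an input_audio_buffer.append event (assumed Realtime API shape)."""
    return json.dumps(
        {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm16).decode("ascii"),
        }
    )
```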

Clients

`youtube-subtitle.user.js` is an example client that reads the `/events` stream and renders the subtitles beneath a YouTube video. You can install it as a userscript.

SSE Subtitle Server API

Generated subtitles are broadcast as Server-Sent Events (SSE). The API is described below.

`GET /health`

Simple liveness check.

Response: `200 OK` with the plain-text body `ok`.
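A client or launcher script can poll `/health` before opening the event stream. A minimal standard-library sketch, assuming the `http://127.0.0.1:5000` address used in the connection examples; `is_healthy` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.request


def is_healthy(base_url: str = "http://127.0.0.1:5000", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers 200 with the body 'ok'."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "ok"
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    base_url: str = "http://127.0.0.1:5000", attempts: int = 10, delay: float = 1.0
) -> bool:
    """Poll /health until the server responds or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(base_url):
            return True
        time.sleep(delay)
    return False
```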

`GET /events`

The main subtitle stream. It uses Server-Sent Events (SSE): keep the connection open and read events as they arrive.

Response headers:
```http
Content-Type: text/event-stream
Cache-Control: no-cache
Access-Control-Allow-Origin: *
```

**Event types you'll receive:**

| Type | When |
|---|---|
| `subtitle` | A new subtitle is ready |
| *(keep-alive comment)* | Every 15 s of silence, to prevent connection drops |

**`subtitle` event payload** (JSON in the `data` field):
```json
{
  "text": "The cleaned subtitle string."
}
```

Keep-alive lines look like this:

```text
: keep-alive
```

These carry no data and should be ignored.
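For clients that cannot use an SSE library, this framing is simple to parse by hand. The sketch below follows the general SSE rules (lines starting with `:` are comments, `event:` sets the type, `data:` lines are joined with newlines, a blank line ends an event) and keeps only `subtitle` payloads. It handles only the fields this API uses (no `id:` or `retry:`), and `parse_sse`/`subtitle_texts` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from decoded SSE lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the pending event, if it carries data.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, e.g. ": keep-alive"
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space from values
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


def subtitle_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each subtitle event, ignoring other event types."""
    for event_type, data in parse_sse(lines):
        if event_type == "subtitle":
            yield json.loads(data)["text"]
```

With `requests`, the parser could be fed as `subtitle_texts(line.decode("utf-8") for line in resp.iter_lines())`, where `resp = requests.get(url, stream=True)`; decoding explicitly as UTF-8 avoids relying on the response's inferred encoding.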

How to connect (examples)

JavaScript:

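```javascript
// Connect to the subtitle stream (adjust host/port if the server runs elsewhere).
const source = new EventSource("http://127.0.0.1:5000/events");

// Listen for named "subtitle" events; keep-alive comments never reach this handler.
source.addEventListener("subtitle", (event) => {
  const { text } = JSON.parse(event.data);
  console.log(text);
});
```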

Python:

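```python
import json

import requests
import sseclient  # provided by the sseclient-py package

# stream=True lets requests read the response body incrementally.
resp = requests.get("http://127.0.0.1:5000/events", stream=True)
client = sseclient.SSEClient(resp)

for event in client.events():
    # Only subtitle events carry a JSON payload; ignore anything else.
    if event.event == "subtitle":
        print(json.loads(event.data)["text"])
```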

Demo

Run with faster-whisper medium and qwen2.5-7B-instruct on an RTX 3060 Mobile GPU (CUDA).

Example 1:

https://github.com/user-attachments/assets/db602a11-2d13-4e58-a5e8-1d4a71c1be0e

Example 2:

https://github.com/user-attachments/assets/a480809e-77f7-4b66-9686-aa2ffea8333d