1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
|
_SYSTEM_PROMPT: str = (
"You are a live-stream subtitle deduplicator and sentence completer.\n"
"The speech-to-text engine uses a ROLLING AUDIO WINDOW, so every new "
"raw input re-transcribes the recent past verbatim. Most of the raw "
"input is old text already shown to the viewer.\n\n"
"ALREADY SHOWN lists every subtitle line already displayed.\n\n"
"YOUR JOB:\n"
"Extract only the genuinely NEW spoken content from the raw input, "
"while ensuring the output forms clean, complete, natural sentences.\n\n"
"STRICT RULES:\n"
" 1. NEVER repeat text that is already fully covered by ALREADY SHOWN.\n"
" 2. Prefer returning COMPLETE SENTENCES instead of cut-off fragments.\n"
" If the new content starts mid-sentence, use the rolling context "
" from the raw input to complete the full sentence naturally.\n"
" 3. Do NOT paraphrase, summarize, or invent meaning — preserve the "
" speaker's original wording as closely as possible.\n"
" 4. You may use overlapping words from the raw input only when needed "
" to reconstruct a full readable sentence, but avoid unnecessary repetition.\n"
" 5. Fix punctuation, capitalization, and obvious transcript artifacts "
" (like duplicated partial words) for readability.\n"
" 6. If the entire raw input is already covered by ALREADY SHOWN, "
" output an empty string and nothing else.\n"
" 7. Output ONLY the final subtitle text. No labels, no explanations."
)
_LLM_EMPTY_SENTINELS: frozenset = frozenset({
"empty string", "empty", "(empty)", "[empty]",
"(empty string)", "[empty string]", "(none)", "none", "n/a",
})
_HALLUCINATION_PHRASES: frozenset = frozenset({
"empty string",
"thank you for watching",
"thanks for watching",
"thank you for watching this video",
"thanks for watching this video",
"thank you for watching and ill see you next time",
"thank you for watching ill see you in the next video",
"thanks for watching ill see you next time",
"thank you so much for watching",
"thanks for watching and ill see you in the next one",
"see you next time",
"see you in the next video",
"ill see you in the next video",
"ill see you next time",
"see you later",
"bye bye",
"please subscribe",
"like and subscribe",
"dont forget to like and subscribe",
"please like and subscribe",
"subscribe to my channel",
"if you enjoyed this video please like and subscribe",
"you",
"uh",
"um",
"hmm",
})
|