pborisov1981@gmail.com

838073fe · docs: Cascade AO mode guide + README updates for v2.0.9 · May 22, 2026

v2.0.9 — Cascade AO router + meeting diarization

ASR backends:
- New 'cascade' backend (Stage AO): Whisper-AI primary with a per-call
  GigaAM fallback when avg_logprob ≤ -0.20. Eliminates Whisper
  subtitle-credit hallucinations. Bench: 9.43% WER vs 10.16% for
  Whisper alone (-0.73 pp).
- Restore 'large-v3-turbo-russian' (Stage AI fine-tune).
- GigaAM model selector restored.
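
The cascade rule above can be sketched as follows. This is a minimal
illustration, not the shipped router: `AsrResult`, `cascade_transcribe`
and the two callables are hypothetical names, and only the -0.20
threshold comes from these notes.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Threshold from the release note: fall back when avg_logprob <= -0.20.
FALLBACK_LOGPROB = -0.20


@dataclass
class AsrResult:
    text: str
    avg_logprob: float  # mean token log-probability reported by the primary


def cascade_transcribe(
    audio: object,
    primary: Callable[[object], AsrResult],   # stands in for the Whisper backend
    fallback: Callable[[object], str],        # stands in for the GigaAM backend
) -> Tuple[str, str]:
    """Return (text, backend_used) for a single call."""
    result = primary(audio)
    if result.avg_logprob <= FALLBACK_LOGPROB:
        # Low confidence: discard the primary text and re-run on the fallback.
        return fallback(audio), "fallback"
    return result.text, "primary"
```

The fallback only runs on low-confidence calls, so confident primary
output costs one decode.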

Progressive transcription:
- VAD-aligned 10s window with 1.5s update cadence + honest gating.
- Eliminates mid-word window cuts and silence hallucinations.
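
One way to align a window to VAD boundaries is sketched below. It
assumes the VAD yields sorted (start, end) speech intervals in seconds.
`pick_window` is a hypothetical helper, the 1.5s update cadence is not
modelled, and the actual gating logic may differ.

```python
from typing import List, Optional, Tuple

MAX_WINDOW_S = 10.0  # window length from the release note


def pick_window(
    speech: List[Tuple[float, float]], now: float, max_len: float = MAX_WINDOW_S
) -> Optional[Tuple[float, float]]:
    """Pick a decode window that ends at `now` and starts on a speech onset.

    Returns None when the last `max_len` seconds hold no speech, so the
    caller can skip decoding silence altogether.
    """
    lo = now - max_len
    # Speech intervals that overlap the candidate window [lo, now].
    active = [(s, e) for s, e in speech if e > lo and s < now]
    if not active:
        return None
    first_start = active[0][0]
    if first_start >= lo:
        # The first interval begins inside the window: start exactly there.
        return (first_start, now)
    if len(active) > 1:
        # The first interval straddles the window edge, so starting at `lo`
        # would cut it mid-word. Start at the next speech onset instead.
        return (active[1][0], now)
    # A single interval longer than the window: no boundary to align to.
    return (lo, now)
```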

Streaming dedup:
- Levenshtein + partial-match (≥80% words + first-word anchor) +
  substring search for shifted overlaps. Catches Whisper's
  morphological vacillation (упало↔упала) and one-word substitutions
  (рак↔рот) in overlap regions.
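
The three checks can be sketched as below. Only the 80% word ratio and
the first-word anchor come from these notes. The function names, the
20% edit-distance ratio and the equal-length requirement in the partial
match are assumptions made for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def partial_match(a: List[str], b: List[str], min_ratio: float = 0.8) -> bool:
    """Same first word and at least `min_ratio` of words equal position-wise."""
    if not a or len(a) != len(b) or a[0] != b[0]:
        return False
    same = sum(x == y for x, y in zip(a, b))
    return same / len(a) >= min_ratio


def is_overlap_duplicate(prev: str, new: str, max_edit_ratio: float = 0.2) -> bool:
    """True if `new` repeats (part of) `prev` up to small recognition noise."""
    p, n = prev.lower().split(), new.lower().split()
    if not p or not n:
        return False
    ps, ns = " ".join(p), " ".join(n)
    # 1. Near-identical strings: catches упало/упала style vacillation.
    if levenshtein(ps, ns) <= max_edit_ratio * max(len(ps), len(ns)):
        return True
    # 2. Anchored partial match: tolerates one-word substitutions.
    if partial_match(p, n):
        return True
    # 3. Shifted overlap: `new` occurs as a contiguous word run inside `prev`.
    return any(p[i:i + len(n)] == n for i in range(len(p) - len(n) + 1))
```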

Stage N retranscribe:
- Re-transcribes each pyannote turn from pre-amped audio for clean
  punctuation/capitalisation. Pre-turn gates drop wrong-language and
  low-confidence turns.
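
A rough sketch of the per-turn loop with gates is shown below. The
`Turn` fields, the `transcribe` callable and the `min_logprob`
threshold are hypothetical. These notes do not say which signals or
values the gates actually use.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple


@dataclass
class Turn:
    speaker: str
    start: float        # seconds
    end: float          # seconds
    lang: str           # language detected for this turn (assumed available)
    avg_logprob: float  # confidence proxy (assumed available)


def retranscribe_turns(
    samples: Sequence[float],
    sample_rate: int,
    turns: List[Turn],
    transcribe: Callable[[Sequence[float]], str],
    allowed_langs: Set[str],
    min_logprob: float,
) -> List[Tuple[str, str]]:
    """Return (speaker, text) for each turn that passes both gates."""
    out = []
    for t in turns:
        # Gate 1: drop wrong-language turns. Gate 2: drop low-confidence turns.
        if t.lang not in allowed_langs or t.avg_logprob < min_logprob:
            continue
        # Slice this turn out of the (pre-amplified) audio and decode it alone.
        clip = samples[int(t.start * sample_rate): int(t.end * sample_rate)]
        out.append((t.speaker, transcribe(clip)))
    return out
```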

Language correction:
- Majority vote at finalize now re-flips only low-confidence
  (lang_prob < 0.5) outliers, preserving high-confidence EN
  code-switching segments.
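
The rule reads roughly as follows in code. `Segment` and
`correct_languages` are illustrative names, and the 0.5 cutoff is the
one quoted above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    lang: str
    lang_prob: float  # language-detection confidence for this segment


def correct_languages(segments: List[Segment], min_conf: float = 0.5) -> List[Segment]:
    """Flip low-confidence outliers to the majority language.

    Segments that disagree with the majority but were detected with
    lang_prob >= min_conf are kept, so deliberate code-switching survives.
    """
    if not segments:
        return []
    majority = Counter(s.lang for s in segments).most_common(1)[0][0]
    out = []
    for s in segments:
        if s.lang != majority and s.lang_prob < min_conf:
            out.append(Segment(s.text, majority, s.lang_prob))
        else:
            out.append(s)
    return out
```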

Hot-reload safety:
- Backend/model swap now silently respawns the process, working around
  the CTranslate2 CUDA-context abort on hot-reload.
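
A respawn of this kind could look like the sketch below, assuming the
app is restarted via `os.execv` with its new settings on the command
line. The `--backend` and `--model` flags and both helper names are
hypothetical, and the real mechanism may differ.

```python
import os
import sys
from typing import List, Optional


def respawn_argv(backend: str, model: str, base_argv: Optional[List[str]] = None) -> List[str]:
    """Build the argv for the replacement process.

    Earlier values of the (hypothetical) --backend/--model flags are
    dropped and the new ones are appended.
    """
    argv = list(sys.argv if base_argv is None else base_argv)
    cleaned, skip_next = [], False
    for arg in argv:
        if skip_next:
            skip_next = False
            continue
        if arg in ("--backend", "--model"):
            skip_next = True  # also drop the value that follows the flag
            continue
        cleaned.append(arg)
    return [sys.executable] + cleaned + ["--backend", backend, "--model", model]


def respawn(backend: str, model: str) -> None:
    """Replace the current process so the old CUDA context dies with it."""
    os.execv(sys.executable, respawn_argv(backend, model))
```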

Docs:
- New docs/guides/CASCADE_AO_MODE.md with meeting/interview use cases.
- README + README_ru updated for 5 backends + diarization.