[#77] Select multiple expected languages for mixed-language meetings #81
Merged
Replace the language checkbox group with a searchable Tom Select multi-select (vendored locally, Apache-2.0) and make the auto-detect behavior explicit with a live mode indicator: empty = auto-detect a single language, one = single language, two or more = multilingual per-passage transcription.
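The selection-to-mode rule in the commit message can be sketched as a small pure function. This is only an illustration: the names `Mode` and `transcription_mode` are hypothetical and not taken from the PR, and normalizing and de-duplicating the codes before counting is an assumption.

```python
from enum import Enum
from typing import Iterable


class Mode(str, Enum):
    """Hypothetical names for the three modes shown by the live indicator."""

    AUTO = "auto"                  # nothing selected: auto-detect a single language
    SINGLE = "single"              # exactly one language selected
    MULTILINGUAL = "multilingual"  # two or more: per-passage transcription


def transcription_mode(expected_languages: Iterable[str]) -> Mode:
    """Map the selected language codes to a mode.

    Assumption: codes are trimmed, lower-cased and de-duplicated first,
    so ["en", "EN"] counts as one language.
    """
    unique = {
        lang.strip().lower()
        for lang in expected_languages
        if lang and lang.strip()
    }
    if not unique:
        return Mode.AUTO
    if len(unique) == 1:
        return Mode.SINGLE
    return Mode.MULTILINGUAL
```

Keeping the rule in one function would let the UI indicator and the backend routing agree on what a given selection means.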
Closes #77
Summary
Adds multilingual transcription support so a mixed English/Thai (or any 2+ language) meeting is transcribed passage by passage in the correct language, instead of being forced into one.
POST /api/meetings accepts a set of expected_languages; the meeting metadata records it.
Out of scope for this slice (later stories): per-language word alignment, the per-segment language badge UI, and per-segment emotion analysis. Multilingual timestamps stay at chunk level and speakers are left UNKNOWN until the alignment story.
Approach
backend/services/multilingual_transcriber.py reproduces WhisperX's VAD chunking and uses faster-whisper's detect_language (argmax over the selected set, renormalized) plus per-chunk transcribe(..., vad_filter=False). The classification policy (duration ≥ 1.5 s, renormalized confidence ≥ 0.70, raw floor ≥ 0.5) is heuristic and tunable. The single- and multilingual pipelines were extracted into helper functions in transcriber.py to keep routing small; the single-language extraction is behavior-preserving. All ML imports stay lazy so the pure classification helpers (and the module) import without whisperx/torch/numpy.
Approach and scope decisions were reviewed by the Argus architect at plan stage (the speaker/diarization scope was deliberately deferred to the alignment story) and the diff passed architect code review. See docs/plans/77-multilingual-language-selection.md.
Verification
uv run pytest -q → 269 passed, 2 skipped. New coverage: tests/unit/test_multilingual_transcriber.py (constrained detection never returns an unselected language, confidence/duration gating, duration-weighted dominant fallback, orchestrator per-segment tagging + timestamp offsetting + failure handling, VAD glue), routing matrix in tests/unit/test_transcriber.py (0/1 force the single path and selected language; 2+ run multilingual with no align/diarize and UNKNOWN speakers; audio analysis receives the dominant language and no diarize turns), schema defaults, and expected_languages storage/sanitization in tests/integration/test_meetings.py.
uv run ruff check . and uv run ruff format --check . → clean.
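For reviewers, the classification policy described under Approach can be sketched in pure Python, with no ML imports. This is a minimal illustration, not the code in multilingual_transcriber.py. All names here (`Detection`, `constrained_detect`, `is_confident`, `dominant_language`) are hypothetical. It also assumes that the "raw floor" applies to the winning language's un-renormalized probability, and that the dominant fallback sums chunk durations per language.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Thresholds quoted in the PR description (heuristic and tunable).
MIN_DURATION_S = 1.5
MIN_RENORM_CONF = 0.70
MIN_RAW_PROB = 0.5  # assumption: floor on the winner's raw probability


@dataclass(frozen=True)
class Detection:
    language: str
    raw_prob: float     # probability as reported over all languages
    renorm_conf: float  # probability renormalized over the selected set


def constrained_detect(
    probs: Mapping[str, float], selected: Sequence[str]
) -> Detection:
    """Argmax over the selected languages only, renormalized to sum to 1.

    The result is always a member of `selected`, even when an unselected
    language has the highest raw probability.
    """
    if not selected:
        raise ValueError("selected must contain at least one language")
    raw = {lang: probs.get(lang, 0.0) for lang in selected}
    total = sum(raw.values())
    best = max(selected, key=lambda lang: raw[lang])
    renorm = raw[best] / total if total > 0 else 0.0
    return Detection(best, raw[best], renorm)


def is_confident(det: Detection, duration_s: float) -> bool:
    """Gate a chunk on duration, renormalized confidence and raw floor."""
    return (
        duration_s >= MIN_DURATION_S
        and det.renorm_conf >= MIN_RENORM_CONF
        and det.raw_prob >= MIN_RAW_PROB
    )


def dominant_language(chunks: Sequence[tuple[str, float]]) -> Optional[str]:
    """Language with the largest total duration over (language, seconds) pairs."""
    totals: dict[str, float] = {}
    for lang, duration_s in chunks:
        totals[lang] = totals.get(lang, 0.0) + duration_s
    return max(totals, key=lambda lang: totals[lang]) if totals else None
```

Under these assumptions, a chunk whose raw probabilities are mostly on an unselected language still gets a selected label from `constrained_detect`, but `is_confident` rejects it, so the caller can fall back to `dominant_language`.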