[#80] Get emotion results for English portions of mixed-language meetings by julien731 · Pull Request #88 · nimblehq/audio-transcriber

julien731 · 2026-06-10T08:46:14Z

Closes #80

Summary

Emotion analysis previously gated on the meeting's single detected language, skipping the entire meeting whenever detected_language != "en". For a bilingual (English/Thai) meeting this silently discarded all emotional insight, including the English portions.

This change replaces the whole-meeting gate with per-segment gating in _run_emotion_analysis:

English segments are sent to the SER model and receive emotion annotations.
Segments in a language the emotion model does not support are recorded as EmotionUnavailable markers instead of being silently dropped.
The meeting is no longer skipped wholesale just because it contains some non-English speech (EC-6, BR-10).
Prosody and interaction analysis remain language-agnostic and untouched (BR-11, AC3).

A segment with no per-segment language (single-language path and pre-feature transcripts) falls back to the meeting's detected language, preserving today's behavior exactly (EC-8).
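
As a rough illustration of the per-segment gate and the fallback described above, the partitioning could look like the sketch below. It is not the code from this PR: the Segment class, the partition_by_language helper, and the per-marker reason string are assumptions made for the example; only SUPPORTED_LANGUAGES, EmotionUnavailable(segment_id, reason), and the "segment language or detected language" rule come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# From the PR description: single source of truth for emotion-supported languages.
SUPPORTED_LANGUAGES = frozenset({"en"})


@dataclass
class Segment:
    """Hypothetical stand-in for the real transcript segment model."""
    id: str
    language: Optional[str] = None  # None for single-language and pre-feature transcripts


@dataclass
class EmotionUnavailable:
    segment_id: str
    reason: str


def partition_by_language(
    segments: List[Segment], detected_language: str
) -> Tuple[List[Segment], List[EmotionUnavailable]]:
    """Split segments into SER-eligible ones and unavailable markers."""
    supported: List[Segment] = []
    unavailable: List[EmotionUnavailable] = []
    for seg in segments:
        # Fall back to the meeting-level language when the segment has none.
        effective = seg.language or detected_language
        if effective in SUPPORTED_LANGUAGES:
            supported.append(seg)
        else:
            # Assumed reason format, modelled on the aggregate reason in the QA notes.
            unavailable.append(
                EmotionUnavailable(seg.id, f"language_not_supported:{effective}")
            )
    return supported, unavailable
```

With this shape, a transcript whose segments carry no language behaves exactly like the old whole-meeting gate: every segment inherits detected_language, so the result is either all supported or all unavailable.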

Approach

Added EmotionUnavailable(segment_id, reason) to backend/schemas.py and an emotion_unavailable list on AudioAnalysis, mirroring the existing ProsodyUnavailable precedent (data-only markers; no new frontend pattern introduced).
Introduced SUPPORTED_LANGUAGES = frozenset({"en"}) in emotion_analyzer.py as the single source of truth for emotion-supported languages (a non-English SER model is out of scope per OQ-4).
_run_emotion_analysis partitions segments by their effective language (seg.language or detected_language); when no segment is in a supported language it returns UNAVAILABLE (never FAILED), so prosody and interaction still run (AC2).
Updated the upload disclosure copy to describe the new partial-analysis behavior.
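
For readers unfamiliar with the marker pattern, here is a minimal sketch of what a data-only marker and its list field might look like. It uses plain dataclasses purely for illustration; the real models in backend/schemas.py may use a different base class, and every field of AudioAnalysis other than emotion_unavailable is omitted here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmotionUnavailable:
    """Data-only marker: which segment got no emotion result, and why."""
    segment_id: str
    reason: str


@dataclass
class AudioAnalysis:
    # Illustration only: the real model has other fields that are not shown.
    # default_factory gives each instance its own empty list, so payloads
    # produced before this change still load with an empty marker list.
    emotion_unavailable: List[EmotionUnavailable] = field(default_factory=list)
```

Defaulting the field to an empty list is what keeps the change additive: existing consumers that never read emotion_unavailable see no difference.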

Verification

pytest tests/unit — 250 passed (includes new cases: mixed meeting, no-English meeting reports unavailable-not-failed, sorted multi-language aggregate reason, and unavailable markers surviving an SER exception).
ruff check and ruff format --check — clean.
Argus architect plan review and code review both passed with no Critical/Major findings.

julien731 · 2026-06-10T08:49:00Z

QA Confidence Verdict — Story #80 (PR #88)

Verdict: PASS (high confidence). All 3 truths and all 3 acceptance criteria are implemented and covered by tests. 250 passed in tests/unit. No application code modified during QA (read-only verification).

What Was Verified

Truths

T1 — Emotion runs per segment, gated on the segment's language (BR-10): PASS (code + tests). _run_emotion_analysis partitions segment_models by segment.language or detected_language against SUPPORTED_LANGUAGES = frozenset({"en"}); supported segments go to the SER model, others get EmotionUnavailable markers. Proven by test_mixed_meeting_analyzes_english_and_marks_others_unavailable (only seg_0000 reaches analyze_segments; seg_0001/th marked unavailable).
T2 — A meeting is never skipped for emotion solely because it contains some non-English speech (BR-10): PASS (code + tests). The old if detected_language != "en": return UNAVAILABLE, [], ... early-out is gone; wholesale skip now only happens when the supported list is empty. Same test above confirms emotion_status COMPLETED for a mixed meeting.
T3 — Prosody and interaction outputs are independent of per-segment language (BR-11): PASS (code + tests). _run_prosody_analysis / _run_interaction_analysis receive the full segment_models and take no language argument; unchanged by this PR. Asserted COMPLETED across the mixed, no-English, and English paths.

Acceptance Criteria

AC1 (mixed meeting → English annotated, non-English marked unavailable, not skipped): PASS. test_mixed_meeting_analyzes_english_and_marks_others_unavailable.
AC2 (no English segments → emotion UNAVAILABLE not FAILED; prosody + interaction still run): PASS. test_no_english_segments_reports_unavailable_not_failed (emotion UNAVAILABLE, reason language_not_supported:th, prosody/interaction COMPLETED, overall COMPLETED, mock_prosody.assert_called_once()). _roll_up_status correctly keeps overall COMPLETED when emotion is UNAVAILABLE but other stages produced output.
AC3 (prosody + interaction run identically regardless of per-segment language, BR-11): PASS. Verified across all three new tests plus pre-existing test_unsupported_language_skips_emotion_but_runs_prosody.
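
The roll-up behavior referenced in AC2 can be pictured with the sketch below. The real _roll_up_status was not reproduced in this comment, so the StageStatus enum, the function signature, and the precedence rules are assumptions chosen only to match the one documented case: emotion UNAVAILABLE plus other stages COMPLETED yields overall COMPLETED.

```python
from enum import Enum


class StageStatus(str, Enum):
    # Hypothetical enum; the names mirror the statuses quoted in this verdict.
    COMPLETED = "completed"
    UNAVAILABLE = "unavailable"
    FAILED = "failed"


def roll_up_status(*stage_statuses: StageStatus) -> StageStatus:
    """Assumed precedence: any output wins, then failure, then unavailable."""
    if StageStatus.COMPLETED in stage_statuses:
        # At least one stage produced output, so the analysis as a whole is usable.
        return StageStatus.COMPLETED
    if StageStatus.FAILED in stage_statuses:
        return StageStatus.FAILED
    return StageStatus.UNAVAILABLE


# The AC2 case: emotion unavailable, prosody and interaction completed.
overall = roll_up_status(
    StageStatus.UNAVAILABLE, StageStatus.COMPLETED, StageStatus.COMPLETED
)
```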

Supporting checks

Schema: EmotionUnavailable {segment_id, reason} + AudioAnalysis.emotion_unavailable default []. Covered by test_audio_analysis_default_emotion_unavailable_empty and test_audio_analysis_with_emotion_unavailable.
EC-8 backward-compat (segments with no per-segment language fall back to meeting language): PASS. _segments() helper carries no language key; test_english_runs_both_stages_and_returns_completed (detected_language=en → COMPLETED) and test_unsupported_language_skips_emotion_but_runs_prosody (detected_language=fr → UNAVAILABLE) exercise the or detected_language fallback for both single-language and pre-feature transcripts.
Partial-failure resilience: test_emotion_failure_still_propagates_unavailable_markers confirms markers partitioned before the SER call survive a model crash (status FAILED, markers intact).
Aggregate reason format: test_no_english_aggregate_reason_lists_languages_sorted pins sorted multi-language reason (language_not_supported:fr,th).
Call-site integrity: _run_emotion_analysis (now a 4-tuple) has exactly one caller, _run_audio_analysis, which was updated; the external AudioAnalysis return shape is unchanged, so the line-619 production call site is unaffected.
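
The two behaviors pinned by the last tests above (sorted aggregate reason, markers surviving an SER exception) can be sketched as follows. This is an assumption-laden illustration, not the PR's code: aggregate_reason, run_emotion_stage, the string statuses, and the order of the returned 4-tuple are all hypothetical; only the reason format language_not_supported:fr,th and the "markers are built before the model call" ordering are taken from the notes.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def aggregate_reason(languages: Iterable[str]) -> str:
    """Build a stable reason string: de-duplicated, alphabetically sorted."""
    return "language_not_supported:" + ",".join(sorted(set(languages)))


def run_emotion_stage(
    supported: List[str],
    markers: List[str],
    unsupported_languages: Iterable[str],
    analyze: Callable[[List[str]], list],
) -> Tuple[str, list, List[str], Optional[str]]:
    """Return (status, annotations, markers, reason); tuple shape is assumed."""
    if not supported:
        # Nothing the model can handle: unavailable, not failed.
        return "UNAVAILABLE", [], markers, aggregate_reason(unsupported_languages)
    try:
        annotations = analyze(supported)
    except Exception as exc:
        # Markers were computed before the model call, so they are still returned.
        return "FAILED", [], markers, str(exc)
    return "COMPLETED", annotations, markers, None
```

Sorting the languages makes the reason deterministic regardless of segment order, which is what lets a test pin the exact string.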

Verification was programmatic (code inspection + unit suite). No Playwright UI drive was performed: this is a backend data-pipeline change whose only user-facing surface is a static copy edit (see below). No running app instance was required.

What Needs Human Eyes

Upload disclosure copy (frontend/js/components/upload.js): wording changed to "Emotion analysis currently supports English only. In a mixed-language meeting, the English portions are still analyzed and other-language passages are marked unavailable. Prosody and interaction signals run for every language." Quick PM/copy glance to confirm tone/accuracy. No automated copy assertion.
End-user visibility of emotion_unavailable: the new markers are data-only and are NOT rendered anywhere in the frontend (neither is the existing prosody_unavailable — grep found zero consumers). This matches the stated "mirroring prosody_unavailable" design, but if PM expects mixed-meeting users to see which segments were skipped for emotion, that UI does not exist yet. Flag for product confirmation that data-only is the intended scope for this story.

Risk Areas

SER model behavior is mocked in all tests (analyze_segments patched). The partitioning/gating logic is fully verified, but real per-segment emotion quality on English portions of a genuinely mixed meeting is not exercised here — depends on the real firdhokk/...whisper-large-v3 model and per-segment language detection upstream (multilingual pipeline). Recommend one real-audio smoke run.
Per-segment language provenance: correctness depends on the multilingual pipeline stories populating TranscriptSegment.language accurately; mislabeled segments would route emotion incorrectly. The stories in question are #77 (selecting more than one expected language so that a mixed English/Thai recording is transcribed in the correct language passage by passage), #78 (timestamps and speaker assignment staying accurate across language switches), and #79 (showing which language each segment was detected as). Out of scope for #80 but worth confirming in an integration run.

Suggested QA Focus

Quick glance (≈2 min): upload disclosure copy.
Thorough (≈10–15 min): one real mixed English/non-English audio file through the full pipeline with emotion enabled — confirm English segments get emotion annotations, non-English segments appear in emotion_unavailable, and the meeting reaches READY (not ERROR). Confirm with PM whether per-segment unavailability needs a visible UI affordance or remains data-only.

julien731 added 5 commits June 10, 2026 15:39

[#80] Add implementation plan for per-segment emotion gating

45209a5

[#80] Add EmotionUnavailable schema and SUPPORTED_LANGUAGES constant

4bd5747

[#80] Gate emotion analysis per segment by detected language

08b7264

[#80] Clarify emotion analysis copy for mixed-language meetings

6b0ee37

[#80] Record plan deviations (none)

a1e19f9

julien731 added the "feature" label (New feature or enhancement) Jun 10, 2026

julien731 self-assigned this Jun 10, 2026

julien731 merged commit 04e6236 into main Jun 10, 2026
3 checks passed

julien731 deleted the feature/80-multilingual-emotion-per-segment branch June 10, 2026 10:47

julien731 mentioned this pull request Jun 10, 2026

Assess FunASR #89

Open
