**Status:** Blocked until Screenalytics `RunManifest` + artifact references are stable.

**Goal:** Implement Stage 6 to ingest Screenalytics results using the manifest contract and persist product-ready summaries.

## Dependencies

- [ ] Screenalytics RunManifest schema finalized (fields + artifact refs + versioning)
- [ ] Decide ingestion method:
  - [ ] poll S3 for `{prefix}/runs/{run_id}/manifest.json`, OR
  - [ ] consume Screenalytics outbox events (preferred)

## TODO (once unblocked)

- [ ] Implement `trr_backend/pipeline/stages/sync_screenalytics.py`
- [ ] Add config:
  - [ ] bucket + prefix + auth strategy
- [ ] **Version gate**: refuse to ingest unknown/unsupported `schema_version` in Screenalytics RunManifest; fail with clear error
- [ ] Define storage in TRR (placeholder schema, can change):
  - [ ] new schema `screenalytics` with:
    - [ ] `screenalytics.runs` (run_id, manifest_key, schema_version, status, timestamps)
    - [ ] `screenalytics.person_metrics` (run_id, person_id, onscreen_ms, speaking_ms?, confessional_ms?, etc.)
  - [ ] OR attach to episode/person summary tables (decide based on final Screenalytics output shape)
- [ ] Validate manifest + download summary artifacts
- [ ] Upsert results with traceability (`run_id`, `manifest_key`, `artifact_keys`, schema versions)
- [ ] **Idempotency**: re-ingest safe via `(run_id, artifact_key)` or `(run_id, schema_version, artifact_key)` constraints
- [ ] Add tests for:
  - [ ] manifest parsing
  - [ ] idempotent re-ingest (no duplicate rows on replay)
  - [ ] missing/invalid artifacts → clear error in `pipeline.run_stages.error_details`
  - [ ] unknown schema_version rejection

## Acceptance

- [ ] Given a completed manifest in S3, Stage 6 ingests and produces deterministic DB rows
- [ ] Re-running Stage 6 does not duplicate rows
- [ ] Invalid manifests or unknown schema versions fail with clear error captured in `pipeline.run_stages.error_details`
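
## Sketch: version gate (illustrative only)

A rough sketch of what the **Version gate** and manifest-parsing items could look like. The RunManifest schema is not finalized, so everything here is an assumption: the field names `run_id`, `schema_version`, and `artifact_keys` are borrowed from the traceability item above, and `SUPPORTED_SCHEMA_VERSIONS`, `ManifestError`, `RunManifest`, and `parse_manifest` are hypothetical names, not existing TRR code.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Assumption: placeholder value; the real supported versions depend on the
# finalized Screenalytics RunManifest schema.
SUPPORTED_SCHEMA_VERSIONS = frozenset({"1"})


class ManifestError(ValueError):
    """Hypothetical error type for manifests that must not be ingested."""


@dataclass(frozen=True)
class RunManifest:
    # Assumption: minimal field set, taken from the traceability TODO item.
    run_id: str
    schema_version: str
    artifact_keys: tuple[str, ...]


def parse_manifest(raw: str) -> RunManifest:
    """Parse manifest JSON and refuse unknown or unsupported schema versions."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ManifestError(f"manifest is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ManifestError("manifest must be a JSON object")

    # Version gate: check the version before reading anything else.
    version = data.get("schema_version")
    if not isinstance(version, str) or version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ManifestError(
            f"unsupported schema_version {version!r}; "
            f"supported: {sorted(SUPPORTED_SCHEMA_VERSIONS)}"
        )

    run_id = data.get("run_id")
    if not isinstance(run_id, str) or not run_id:
        raise ManifestError("manifest is missing a non-empty string run_id")

    keys = data.get("artifact_keys")
    if not isinstance(keys, list) or not all(isinstance(k, str) for k in keys):
        raise ManifestError("artifact_keys must be a list of strings")

    return RunManifest(run_id=run_id, schema_version=version, artifact_keys=tuple(keys))
```

The message carried by `ManifestError` is the kind of text that could be written to `pipeline.run_stages.error_details`; how the stage runner records it is out of scope for this sketch.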
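
## Sketch: idempotent re-ingest (illustrative only)

A rough sketch of the **Idempotency** item, using an in-memory SQLite database as a stand-in for the real TRR database. Assumptions: the table is a cut-down `person_metrics` (no `screenalytics` schema prefix, only `onscreen_ms`), and the unique key extends the proposed `(run_id, artifact_key)` with `person_id`, since one artifact would presumably hold rows for many people. `ingest_metrics` and `count_rows` are hypothetical helpers. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Assumption: simplified placeholder table; the real storage layout is undecided.
DDL = """
CREATE TABLE IF NOT EXISTS person_metrics (
    run_id       TEXT NOT NULL,
    artifact_key TEXT NOT NULL,
    person_id    TEXT NOT NULL,
    onscreen_ms  INTEGER NOT NULL,
    UNIQUE (run_id, artifact_key, person_id)
)
"""

# Replaying the same row hits the unique constraint and updates in place
# instead of inserting a duplicate.
UPSERT = """
INSERT INTO person_metrics (run_id, artifact_key, person_id, onscreen_ms)
VALUES (?, ?, ?, ?)
ON CONFLICT (run_id, artifact_key, person_id)
DO UPDATE SET onscreen_ms = excluded.onscreen_ms
"""


def ingest_metrics(conn, run_id, artifact_key, rows):
    """Upsert (person_id, onscreen_ms) pairs for one artifact in one transaction."""
    params = [(run_id, artifact_key, person_id, ms) for person_id, ms in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, params)


def count_rows(conn):
    """Return the number of rows currently stored in person_metrics."""
    return conn.execute("SELECT COUNT(*) FROM person_metrics").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    rows = [("person-1", 1200), ("person-2", 3400)]
    ingest_metrics(conn, "run-1", "summaries/people.json", rows)
    ingest_metrics(conn, "run-1", "summaries/people.json", rows)  # replay
    print(count_rows(conn))
```

The same constraint doubles as the assertion target for the "no duplicate rows on replay" test: ingest twice, then check the row count is unchanged.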