TRR BACKEND: Stage 6 Screenalytics ingestion (manifest-driven) #31

Description

@therealityreport

Status: Blocked until Screenalytics RunManifest + artifact references are stable.

Goal: Implement Stage 6 to ingest Screenalytics results using the manifest contract and persist product-ready summaries.

Dependencies

  • Screenalytics RunManifest schema finalized (fields + artifact refs + versioning)
  • Decide ingestion method:
    • poll S3 for {prefix}/runs/{run_id}/manifest.json, OR
    • consume Screenalytics outbox events (preferred)
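
If the polling option is chosen, the key layout above could be handled roughly as follows. This is a minimal sketch, not the agreed design: `manifest_key`, `poll_manifest`, and the injected `fetch` callable are hypothetical names, and `fetch` stands in for whatever S3 client call the backend ends up using (it is assumed to return the object bytes, or `None` when the key does not exist yet).

```python
import json
from typing import Callable, Optional


def manifest_key(prefix: str, run_id: str) -> str:
    """Build the key {prefix}/runs/{run_id}/manifest.json, tolerating stray slashes."""
    parts = [prefix.strip("/"), "runs", run_id, "manifest.json"]
    return "/".join(p for p in parts if p)


def poll_manifest(
    fetch: Callable[[str], Optional[bytes]],  # hypothetical: wraps the real S3 read
    prefix: str,
    run_id: str,
) -> Optional[dict]:
    """Return the parsed manifest, or None if it has not been written yet."""
    body = fetch(manifest_key(prefix, run_id))
    if body is None:
        return None
    return json.loads(body)
```

Injecting `fetch` keeps the sketch testable without S3 access; a dict's `.get` method is enough to exercise it in unit tests.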

TODO (once unblocked)

  • Implement trr_backend/pipeline/stages/sync_screenalytics.py
  • Add config:
    • bucket + prefix + auth strategy
  • Version gate: refuse to ingest an unknown or unsupported schema_version in the Screenalytics RunManifest; fail with a clear error
  • Define storage in TRR (placeholder schema, can change):
    • new schema screenalytics with:
      • screenalytics.runs (run_id, manifest_key, schema_version, status, timestamps)
      • screenalytics.person_metrics (run_id, person_id, onscreen_ms, speaking_ms?, confessional_ms?, etc.)
    • OR attach to episode/person summary tables (decide based on final Screenalytics output shape)
  • Validate manifest + download summary artifacts
  • Upsert results with traceability (run_id, manifest_key, artifact_keys, schema versions)
  • Idempotency: make re-ingest safe via a unique constraint on (run_id, artifact_key) or (run_id, schema_version, artifact_key)
  • Add tests for:
    • manifest parsing
    • idempotent re-ingest (no duplicate rows on replay)
    • missing/invalid artifacts → clear error in pipeline.run_stages.error_details
    • unknown schema_version rejection
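
The manifest validation and version gate from the TODO list might look something like the sketch below. Because the RunManifest schema is not finalized, every field name here (`run_id`, `schema_version`, `artifacts`) is an assumption, as are `SUPPORTED_SCHEMA_VERSIONS`, `ManifestError`, and `parse_manifest`.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the set of versions Stage 6 knows how to ingest.
SUPPORTED_SCHEMA_VERSIONS = frozenset({"1"})


class ManifestError(ValueError):
    """Raised when a manifest is malformed or has an unsupported version."""


@dataclass(frozen=True)
class RunManifest:
    run_id: str
    schema_version: str
    artifact_keys: Tuple[str, ...]


def parse_manifest(raw: dict) -> RunManifest:
    """Validate a raw manifest dict and refuse unknown schema versions."""
    required = ("run_id", "schema_version", "artifacts")  # assumed field names
    missing = [name for name in required if name not in raw]
    if missing:
        raise ManifestError("manifest missing required fields: " + ", ".join(missing))

    version = str(raw["schema_version"])
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ManifestError(
            f"unsupported schema_version {version!r}; "
            f"supported: {sorted(SUPPORTED_SCHEMA_VERSIONS)}"
        )

    artifacts = raw["artifacts"]
    if not isinstance(artifacts, list) or not all(isinstance(a, str) for a in artifacts):
        raise ManifestError("manifest field 'artifacts' must be a list of keys")

    return RunManifest(str(raw["run_id"]), version, tuple(artifacts))
```

The stage could catch `ManifestError` and write its message into pipeline.run_stages.error_details, which would cover both the invalid-manifest and unknown-version test cases.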

Acceptance

  • Given a completed manifest in S3, Stage 6 ingests it and produces the same DB rows on every run
  • Re-running Stage 6 does not duplicate rows
  • Invalid manifests or unknown schema versions fail with clear error captured in pipeline.run_stages.error_details
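
The no-duplicates criterion can be illustrated with an upsert keyed on the traceability columns. This is only a sketch under stated assumptions: it uses an in-memory SQLite database as a stand-in for the TRR database, the table follows the placeholder `person_metrics` shape with a reduced column set, and `person_id` is added to the key on the assumption that one artifact carries rows for several people.

```python
import sqlite3

# Placeholder table; the real schema is still to be decided.
DDL = """
CREATE TABLE person_metrics (
    run_id       TEXT NOT NULL,
    artifact_key TEXT NOT NULL,
    person_id    TEXT NOT NULL,
    onscreen_ms  INTEGER NOT NULL,
    PRIMARY KEY (run_id, artifact_key, person_id)
)
"""

# On replay, the conflict clause overwrites the existing row instead of adding one.
UPSERT = """
INSERT INTO person_metrics (run_id, artifact_key, person_id, onscreen_ms)
VALUES (?, ?, ?, ?)
ON CONFLICT (run_id, artifact_key, person_id)
DO UPDATE SET onscreen_ms = excluded.onscreen_ms
"""


def ingest(conn: sqlite3.Connection, run_id: str, artifact_key: str, rows) -> None:
    """Upsert (person_id, onscreen_ms) pairs for one artifact in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            UPSERT,
            [(run_id, artifact_key, person_id, ms) for person_id, ms in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
rows = [("p1", 1200), ("p2", 300)]
ingest(conn, "run-1", "summaries/person_metrics.json", rows)
ingest(conn, "run-1", "summaries/person_metrics.json", rows)  # replay
row_count = conn.execute("SELECT COUNT(*) FROM person_metrics").fetchone()[0]
```

After the replay, `row_count` is still 2. PostgreSQL accepts a very similar `INSERT ... ON CONFLICT ... DO UPDATE` form, so the same pattern should carry over if that is the target database.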
