Skip to content

TRR BACKEND: Validate cast/image views for duplicates + performance #33

Description

@therealityreport

Goal: Confirm whether core.v_episode_cast, core.v_season_cast, and core.v_person_images produce duplicates or need performance tuning.

Note: Only pursue if duplicates are actually observed.

TODO

  • Run data reality check SQL snippets:
    -- Episode cast duplicates by (episode_id, person_id, credit_category)
    SELECT episode_id, person_id, credit_category, COUNT(*)
    FROM core.v_episode_cast
    GROUP BY episode_id, person_id, credit_category
    HAVING COUNT(*) > 1;
    
    -- Season cast duplicates by (season_id, person_id)
    SELECT season_id, person_id, COUNT(*)
    FROM core.v_season_cast
    GROUP BY season_id, person_id
    HAVING COUNT(*) > 1;
    
    -- Person images duplicates by (person_id, media_asset_id, kind)
    SELECT person_id, media_asset_id, kind, COUNT(*)
    FROM core.v_person_images
    GROUP BY person_id, media_asset_id, kind
    HAVING COUNT(*) > 1;
  • If duplicates exist and are problematic:
    • update views with SELECT DISTINCT and document why
    • or add constraints/indexes where appropriate
  • If duplicates exist but are harmless:
    • document that consumer must dedupe and close the issue
  • Add lightweight EXPLAIN plans + indexes if queries are slow

Acceptance

  • Views are correct for Screenalytics candidate selection and perform acceptably on production data
  • Duplicate behavior is documented (either fixed or consumer-responsibility)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions