This guide explains the full pipeline, what each script does, inputs/outputs, key arguments, environment variables, and how to run everything end-to-end. Use this as the source when unifying the pipeline into a single workflow.
- Summarize OCR pages → Chunk summaries
- Script:
summarize_chunks.py - Reads OCR Markdown pages, tokenizes into overlapping chunks, produces per-chunk JSON summaries and a chunk index.
- Script:
- Create governance card
- Script:
make_paper_card.py - Builds a “paper card” that encodes TL;DR, contributions, section order, etc., guiding planning.
- Script:
- Plan slide sections
- Script:
plan_slides.py - Sequentially plans sections (Overview → Method → Results → …), with references to chunks and figure IDs filtered to those references.
- Script:
- Generate slides
- Script:
generate_slides.py - Produces slide-level JSON with Title, Content, Audio, Bridges, and Figures. Optionally prepends a cover slide extracted from OCR via LLM.
- Script:
- Render PNG slides (Marp)
- Script:
generate_slide_pngs.py - Converts
presentation.jsonintodeck.mdanddeck.XXX.pngimages.
- Script:
- Generate audio (TTS)
- Script:
generate_audio.py - Generates per-slide narration WAV/MP3 files (per-sentence TTS composition to avoid API issues).
- Script:
- Stitch video
- Script:
stitch_video.py - Combines PNGs and audio into a single MP4.
- Script:
- OCR input directory (default):
mistral_responses/<paper_name>/markdown/*.md - Images (if referenced):
mistral_responses/<paper_name>/images/* - Artifacts directory (per paper recommended):
artifacts/<paper_name>/chunk_summaries.jsonlchunk_index.jsonlpaper_card.jsonslide_plan.jsonpresentation.jsondeck.mdpngs/(rendered PNG slides)audio/(narration WAV/MP3s)video.mp4(final stitched output)
OPENROUTER_API_KEY— required for all OpenRouter LLM calls.- Optional model overrides (comma-separated for fallback):
OPENROUTER_MODEL(global fallback)OPENROUTER_SUMMARIZE_MODEL,OPENROUTER_EXTRACTOR_MODELOPENROUTER_CARD_MODELOPENROUTER_PLANNER_MODELOPENROUTER_GENERATOR_MODELOPENROUTER_COVER_MODEL
SARVAM_API_KEY— required bygenerate_audio.pyfor TTS.
Notes:
- Defaults prefer non-
:freemodel slugs for reliability. - Token limits are uncapped by default; pass
--max-tokensonly when you need to constrain cost.
- Purpose:
- Split OCR Markdown into token chunks and generate structured JSON per chunk using OpenRouter.
- Inputs:
--pages-dir(required): path to markdown pages.
- Outputs (to
--outdir, defaultartifacts/):chunk_summaries.jsonl— one JSON object per chunk with fields:id,page,gist,claims,figs,eqs,key_terms,anchors.chunk_index.jsonl— maps chunk IDs to original page spans for later text retrieval.
- Key args:
--outdir(default:artifacts)--chunk-size(default: 1024),--overlap(default: 128)--model(default:meta-llama/llama-3.2-3b-instruct,google/gemma-2-9b-it)--max-tokens(default: unlimited)--force,--verbose
- Notable behavior:
- JSON extractor prompt includes the original
chunk_textso the model can detect explicitFigure/Table Nmentions. - Validates figure mentions by number against page-local allowed figures, populating
figsreliably. - Internal
OpenRouterClientonly attachesmax_tokensif non-None.
- JSON extractor prompt includes the original
Example:
python summarize_chunks.py \
--pages-dir mistral_responses/test_paper/markdown \
--outdir artifacts/test_paper \
--verbose --force- Purpose:
- Create
paper_card.jsonto guide planning with TL;DR, contributions, section order, etc.
- Create
- Inputs:
--artifacts-dircontainingchunk_summaries.jsonl.
- Outputs:
paper_card.jsonin--outdir.
- Key args:
--model(default viaOPENROUTER_CARD_MODEL)--max-tokens(default: unlimited)--force,--verbose
- Notable behavior:
- Hardened JSON parsing, retries with strict JSON instruction if initial parse fails.
Example:
python make_paper_card.py \
--artifacts-dir artifacts/test_paper \
--outdir artifacts/test_paper \
--verbose --force- Purpose:
- Sequentially plan canonical sections and per-section topics with references and allowed figure IDs.
- Inputs:
--summaries-dircontainingchunk_summaries.jsonl,chunk_index.jsonl,paper_card.json.
- Outputs:
slide_plan.jsonin--outdir.
- Key args:
--model(default viaOPENROUTER_PLANNER_MODEL:qwen/qwen-2.5-7b-instruct,mistralai/mixtral-8x7b-instruct)--max-tokens(default: unlimited)--force,--verbose
- Notable behavior:
- Robust JSON parsing with repairs (unquoted keys,
True/False/Nonenormalization) and a retry with explicit strict-JSON instructions. - Enforces exact
section_title, ≥2references, and filtersfigure_idsto figures only present in referenced chunks. - Loop continues on exceptions instead of aborting the whole plan.
- Robust JSON parsing with repairs (unquoted keys,
Example:
python plan_slides.py \
--summaries-dir artifacts/test_paper \
--outdir artifacts/test_paper \
--verbose --force- Purpose:
- Generate per-slide JSON from
slide_plan.jsonand source chunks. Optionally prepend a cover slide.
- Generate per-slide JSON from
- Inputs:
--artifacts-dircontainingslide_plan.json,chunk_summaries.jsonl,chunk_index.jsonl,paper_card.json- OCR under
--ocr-dir/<pdf-name>/markdown/*
- Outputs:
presentation.jsonin--outdir.
- Key args:
--model(default viaOPENROUTER_GENERATOR_MODEL)--max-tokens(default: unlimited)--figure-reuse-limit(default: -1 unlimited)--add-cover(extract title/authors via LLM and create cover slide)--cover-model(default viaOPENROUTER_COVER_MODEL)--force,--verbose
- Notable behavior:
- JSON-hardening with fallback from
response_format=json_objectto plain text + light repairs; tool_calls fallback. - Guards against mentioning figures when none are attached.
- Enforces unique slide titles by appending “(Slide N)” to duplicates.
- Prefers NOVEL claims; maintains light context (last 2 slides + checkpoints) to avoid drift.
- JSON-hardening with fallback from
Example:
python generate_slides.py \
--ocr-dir mistral_responses \
--pdf-name test_paper \
--artifacts-dir artifacts/test_paper \
--outdir artifacts/test_paper \
--add-cover \
--verbose --force- Purpose:
- Render
presentation.jsonto Marpdeck.mdanddeck.XXX.pngfiles.
- Render
- Inputs:
--presentation-file(required),--output-dir,--paper-name- Optional
--no-coverto avoid adding an extra cover (use when generator already added one) - Optional
--ocr-dir+--cover-modelto build a cover from OCR when not using generator cover
- Outputs:
deck.mdandpngs/deck.XXX.pngin--output-dir.
- Notable behavior:
- Cover extraction via OpenRouter now uncapped (no fixed
max_tokens). - Heuristic fallback to extract title/authors if LLM parsing fails.
- Cover extraction via OpenRouter now uncapped (no fixed
Example:
python generate_slide_pngs.py \
--presentation-file artifacts/test_paper/presentation.json \
--output-dir artifacts/test_paper \
--paper-name test_paper \
--no-cover- Purpose:
- Generate per-slide audio using Sarvam TTS, splitting narration per sentence to avoid API glitches.
- Inputs:
--presentation-file,--output-dir,--paper-name
- Outputs:
audio/slide_XXX.wavor.mp3
Example:
python generate_audio.py \
--presentation-file artifacts/test_paper/presentation.json \
--output-dir artifacts/test_paper/audio \
--paper-name test_paper- Purpose:
- Combine PNGs and WAVs to MP4 via ffmpeg.
- Inputs:
--paper-nameor explicit--png-dirand--audio-dir
- Outputs:
artifacts/<paper>/video.mp4
Example:
python stitch_video.py --paper-name test_paper- Sequence & dependencies: summarizer → paper card → planner → generator → renderer → audio → stitch.
- Idempotency: most scripts support
--forceto overwrite outputs; otherwise they may exit early if outputs exist. - Artifacts co-location: prefer
--outdir artifacts/<paper>for isolation. Pass the same--artifacts-dirto downstream steps. - Cover handling: if generator created a cover (
--add-cover), pass--no-coverto renderer to avoid duplicates. - Figures: generator respects planner figures and drops any not referenced; reuse can be capped with
--figure-reuse-limit. - Token policy: leave
--max-tokensunset for unlimited outputs; set only to constrain costs. - Error handling: planner/generator/card include robust JSON parsing and retries with strict-JSON instructions; planner loop continues on exceptions.
- Models: prefer reliable non-
:freeslugs; you may provide multiple slugs comma-separated for automatic fallback.
# 1) Summaries
python summarize_chunks.py \
--pages-dir mistral_responses/test_paper/markdown \
--outdir artifacts/test_paper \
--force
# 2) Paper card
python make_paper_card.py \
--artifacts-dir artifacts/test_paper \
--outdir artifacts/test_paper \
--force
# 3) Plan
python plan_slides.py \
--summaries-dir artifacts/test_paper \
--outdir artifacts/test_paper \
--force
# 4) Generate slides with cover
python generate_slides.py \
--ocr-dir mistral_responses \
--pdf-name test_paper \
--artifacts-dir artifacts/test_paper \
--outdir artifacts/test_paper \
--add-cover \
--force
# 5) Render PNGs (avoid duplicate cover)
python generate_slide_pngs.py \
--presentation-file artifacts/test_paper/presentation.json \
--output-dir artifacts/test_paper \
--paper-name test_paper \
--no-cover
# 6) Generate audio WAVs/MP3s
python generate_audio.py \
--presentation-file artifacts/test_paper/presentation.json \
--output-dir artifacts/test_paper/audio \
--paper-name test_paper
# 7) Stitch to MP4
python stitch_video.py --paper-name test_paper- Summaries:
wc -l artifacts/<paper>/chunk_summaries.jsonlmatches expectations. - Plan: verify each section has ≥2
references; figure IDs exist in referenced chunks. - Slides: ensure Titles are unique; Content mentions figures only when attached; Bridges are present.
- Renderer: confirm number of PNGs matches slides (plus cover if applicable).
- Audio: number of WAV/MP3s matches slides; lengths reasonable.
- Video: play output MP4; check sync and transitions.