feat(data_parsing): L2D 1 Hz sequential multi-view windows for the World Model (#16) #95

Merged
m-zain-khawaja merged 1 commit into autowarefoundation:main from gcordova10:feat/world-model-dataloader
Jun 30, 2026

Conversation

@gcordova10

Contributor

What this adds

L2DDataset currently loads only the current frame and leaves visual_history
as zeros, with no future frames — so the World Model / JEPA (#13, #85) has neither
history nor targets. This PR adds sequential 1 Hz multi-view windows (the
"Sequential frame access (needed for #13)" checklist item in #16).

Design (testable without lerobot)

  • data_parsing/l2d/world_model_windows.py (new, pure / dataset-agnostic):
    • stride_for_hz(source_hz, wm_hz) — L2D 10 Hz → 1 Hz = stride 10.
    • window_offsets(N, stride) — past [-(N-1)s..0] (oldest→newest, current last)
      and future [s..N*s].
    • required_margins(N, stride) = ((N-1)*s, N*s).
    • build_windows(load_frame, row, ep_start, ep_end, num_frames=4, stride=10) →
      (history_frames, future_frames), each [N, V, 3, H, W]; raises IndexError
      if a window would cross the episode boundary. Takes a load_frame(row)->[V,3,H,W]
      callable, so it has no dataset/lerobot dependency and is unit-tested with a
      synthetic loader.
  • data_parsing/l2d/dataset.py (opt-in, default OFF → byte-identical to before):
    • new args include_world_model_windows=False, wm_num_frames=4, wm_hz=1.0, source_hz=10.0;
    • _load_multiview_frame(row) refactor, reused for the current frame and every
      window frame;
    • _build_sample_index margins take the max of the egomotion window (64/64)
      and the World Model windows, so a valid frame always has a complete window;
    • __getitem__ emits history_frames / future_frames when enabled;
      L2DSample gains the two optional keys (NotRequired, version-guarded import
      for py3.10/3.12).
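
A minimal sketch of the windowing helpers described above, for illustration only. It is not the merged code: the function names and the build_windows signature follow this description, but the bodies are assumptions. In particular, ep_end is treated as exclusive here, and frames are returned as plain lists rather than stacked into [N, V, 3, H, W] tensors.

```python
from typing import Callable, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def stride_for_hz(source_hz: float, wm_hz: float) -> int:
    """Source frames between consecutive World Model frames (10 Hz -> 1 Hz = 10)."""
    if source_hz <= 0 or wm_hz <= 0:
        raise ValueError("rates must be positive")
    stride = round(source_hz / wm_hz)
    if stride < 1:
        raise ValueError("wm_hz must not exceed source_hz")
    return stride


def window_offsets(num_frames: int, stride: int) -> Tuple[List[int], List[int]]:
    """Past offsets (oldest first, current frame last) and future offsets."""
    past = [-(num_frames - 1 - i) * stride for i in range(num_frames)]
    future = [(i + 1) * stride for i in range(num_frames)]
    return past, future


def required_margins(num_frames: int, stride: int) -> Tuple[int, int]:
    """Frames needed before and after the current row for a complete window."""
    return (num_frames - 1) * stride, num_frames * stride


def build_windows(
    load_frame: Callable[[int], Frame],
    row: int,
    ep_start: int,
    ep_end: int,  # assumption: exclusive end index of the episode
    num_frames: int = 4,
    stride: int = 10,
) -> Tuple[List[Frame], List[Frame]]:
    """Load history and future frames, refusing to cross the episode boundary."""
    past, future = window_offsets(num_frames, stride)
    if row + past[0] < ep_start or row + future[-1] >= ep_end:
        raise IndexError(f"window around row {row} crosses the episode boundary")
    history = [load_frame(row + off) for off in past]
    upcoming = [load_frame(row + off) for off in future]
    return history, upcoming
```

Because load_frame is injected, the logic can be exercised with a synthetic loader such as `lambda r: r`, with no dataset download.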

How it fits

1 Hz / 10 Hz

L2D is 10 Hz (egomotion.py, _DT=0.1), so source_hz=10.0 (stride 10).
Parametrized: a 30 FPS source → source_hz=30.
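
Since the stride scales with source_hz, so do the margins that _build_sample_index has to reserve. The sketch below works through that arithmetic under stated assumptions: valid_rows is a hypothetical helper (not a function in the PR), the egomotion window (64/64) is assumed to be expressed in frames, and ep_end is assumed exclusive.

```python
def valid_rows(
    ep_start: int,
    ep_end: int,  # assumption: exclusive end index of the episode
    num_frames: int,
    stride: int,
    ego_past: int = 64,    # assumption: egomotion margins are in frames
    ego_future: int = 64,
) -> range:
    """Rows whose egomotion window and World Model window both fit in the episode."""
    wm_past = (num_frames - 1) * stride
    wm_future = num_frames * stride
    past = max(ego_past, wm_past)
    future = max(ego_future, wm_future)
    return range(ep_start + past, ep_end - future)


# 10 Hz source, N=4: World Model margins are (30, 40), so egomotion (64/64) dominates.
rows_10hz = valid_rows(0, 300, num_frames=4, stride=10)

# 30 Hz source, N=4: World Model margins are (90, 120) and now dominate.
rows_30hz = valid_rows(0, 300, num_frames=4, stride=30)
```

In a 300-frame episode this leaves rows 64..235 valid at 10 Hz and rows 90..179 at 30 Hz, so a higher source rate shrinks the usable index range unless episodes are correspondingly longer in frames.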

Scope / not in this PR (kept honest)

  • Camera calibration (BEV camera_params): #16 (TODO: Dataset Selection and DataLoader Implementation) lists it, but this PR is limited to
    the sequential windows. l2d/camera.py already has make_camera_params_placeholder()
    (identity) with a documented TODO to parse intrinsic @ extrinsic → [3,4] per view
    from L2D's extrinsic_RDF.yaml; BEV fusion runs on its learnable pseudo-projection
    meanwhile (not blocking). Follow-up: the real YAML parser.
  • Performance: decoding N×2 multi-view frames per sample is expensive; for real
    training prefer pre-extraction (data_parsing/pre_extracted.py) or caching. The
    functional correctness here is independent.
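
To illustrate the caching option, here is a rough sketch that memoizes an injected frame loader with functools.lru_cache. It is an assumption-laden example, not what data_parsing/pre_extracted.py does: cached_loader and slow_load are hypothetical names, and the benefit only appears when neighbouring rows are requested close together (with shuffled sampling or per-worker caches the hit rate may be much lower).

```python
from functools import lru_cache
from typing import Callable, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def cached_loader(load_frame: Callable[[int], Frame], maxsize: int = 64) -> Callable[[int], Frame]:
    """Wrap a row -> frame loader so repeated rows are decoded only once."""
    return lru_cache(maxsize=maxsize)(load_frame)


decode_calls: List[int] = []


def slow_load(row: int) -> Tuple[str, int]:
    """Stand-in for an expensive multi-view video decode (hypothetical)."""
    decode_calls.append(row)
    return ("frame", row)


load = cached_loader(slow_load, maxsize=32)

# Two samples one stride apart (rows 100 and 110), N=4, stride=10:
# their history and future windows share 7 of their 8 frames.
offsets = (-30, -20, -10, 0, 10, 20, 30, 40)
for row in (100, 110):
    for off in offsets:
        load(row + off)

print(len(decode_calls))  # decodes actually performed across both samples
```

Here 16 frame requests trigger 9 decodes; the remaining 7 are served from the cache.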

Tests

pytest Model/tests/test_world_model_windows.py: 14 passed. ruff + mypy clean.
Diff: 2 new files + 2 edited (dataset.py, __init__.py).

…rld Model (autowarefoundation#16)

Implements the 'sequential frame access (needed for the feature reconstruction
loss autowarefoundation#13)' item of autowarefoundation#16, on the agreed LeRobot L2D dataset.

Problem: L2DDataset loaded only the current frame and returned visual_history as
zeros, so the World Model had no past context and no future targets for the JEPA
loss.

- world_model_windows.py (new, dataset-agnostic + unit-testable without lerobot):
  stride_for_hz (10 Hz -> 1 Hz = stride 10; parametrised for e.g. 30 fps),
  window_offsets, required_margins, build_windows(load_frame, row, ep_start,
  ep_end, N, stride) -> (history_frames, future_frames), each [N, V, 3, H, W],
  oldest->newest, with episode-boundary checks (no cross-episode leakage).
- L2DDataset: opt-in include_world_model_windows (default OFF -> byte-identical).
  Emits history_frames/future_frames [N,7,3,H,W]; valid-index margins take the
  max of egomotion (64/64) and the World-Model window; _load_multiview_frame
  refactored and reused. Feeds train_il's JEPA term directly.
- 14 tests (pure windowing logic, no dataset download); mypy/ruff clean.

Not done (flagged): camera calibration extraction (BEV works via pseudo_projection;
L2D calib API unverified offline) and pre-extraction/caching for the heavy
multi-view video decode — both follow-ups; functional correctness is independent.

Signed-off-by: GABRIELA CORDOVA <100548769@alumnos.uc3m.es>
@gcordova10

gcordova10 commented Jun 29, 2026

Contributor Author

CI green. This implements the "Sequential frame access (needed for #13)" item from #16: 1 Hz multi-view past/future windows for the World Model.

  • New pure, dataset-agnostic module data_parsing/l2d/world_model_windows.py (stride_for_hz, window_offsets, required_margins, build_windows) — unit-tested without lerobot (14 tests) by injecting a synthetic frame loader.
  • L2DDataset extended opt-in (include_world_model_windows=False by default → unchanged otherwise): emits history_frames / future_frames [N, V, 3, H, W], with episode-boundary checks.

Feeds the JEPA directly: #13's compute_step_loss consumes batch["history_frames"] / ["future_frames"], and #85's World Model uses them as input / targets (they were synthetic placeholders before). Independent of #85 (different area) and doesn't overlap #93.

Kept honest about scope (details in the PR body): BEV camera calibration is left as a documented TODO in l2d/camera.py (BEV runs on the learnable pseudo-projection meanwhile), and pre-extraction/caching for decode throughput is a noted follow-up. Ready for review.

@m-zain-khawaja m-zain-khawaja left a comment

Member

approved - thanks @gcordova10

@m-zain-khawaja m-zain-khawaja merged commit ab3651e into autowarefoundation:main Jun 30, 2026
1 check passed
@gcordova10 gcordova10 deleted the feat/world-model-dataloader branch June 30, 2026 15:30